This study examines the detection of the English glide /w/ in (w)V sequences in the Buckeye Corpus of conversational speech. Neural network classifiers were developed using dynamic spectral cues alongside a compact set of contextual predictors. Sampling F2 at both 20 percent and 50 percent of (w)V duration yields high detection accuracy on its own, with an F1-score around 0.86, and outperforms onset-based or midpoint sampling. Adding contextual predictors (vowel identity, preceding consonant place and manner, and word-internal position) raises the F1-score modestly, to around 0.92. Two parsimonious models were compared: the first combines F2 at these landmarks with vowel identity, preceding consonant place and manner, and word-internal position; the second replaces manner with (w)V duration and performs slightly worse. Permutation-based feature importance analysis confirms that F2 at 20 percent and 50 percent is the most decisive acoustic predictor. Contextual features contribute supplementary improvements, particularly in resolving ambiguous acoustic cases, though their effects remain limited compared with the dominant role of temporal spectral cues.
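As an illustration of the landmark sampling described above, the following minimal sketch reads an F2 track at 20 percent and 50 percent of a segment's duration. It is not the study's implementation: the function name `sample_at_landmarks`, the linear interpolation, and the synthetic F2 values are all assumptions made for the example.

```python
import numpy as np

def sample_at_landmarks(times, f2_track, start, end, proportions=(0.2, 0.5)):
    """Interpolate an F2 track (Hz) at proportional points of [start, end].

    Hypothetical helper: linear interpolation is an assumption, not
    necessarily how the study extracted its formant values.
    """
    times = np.asarray(times, dtype=float)
    f2_track = np.asarray(f2_track, dtype=float)
    # Convert each proportion of the segment duration to an absolute time.
    landmark_times = [start + p * (end - start) for p in proportions]
    return np.interp(landmark_times, times, f2_track)

# Synthetic example: a 100 ms (w)V segment with a linearly rising F2.
times = np.linspace(0.0, 0.1, 11)    # frame times in seconds
f2 = np.linspace(800.0, 1800.0, 11)  # made-up F2 values in Hz
features = sample_at_landmarks(times, f2, start=0.0, end=0.1)
```

Under these assumptions, `features` holds two values (F2 at the 20 percent and 50 percent landmarks) that could serve as the acoustic inputs to a classifier, with contextual predictors appended as further columns.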
1. Introduction
2. Method
3. Analyses and Discussion
4. Conclusion
References