Word boundary detection using landmarks : a survey of consonants
Author(s): Chi, Xuemin, 1979-
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Kenneth Noble Stevens.
This project searches for consistent acoustic attributes in a broad set of American English consonants that would help in identifying their word positions in running speech. A database of sentences containing word pairs (e.g. "lay keys" vs. "lake ease" for /k/) of thirteen consonants (six stops, two affricates, three fricatives, and two nasals), controlled for prosodic boundaries, pitch accents, phonetic contexts, and word positions (initial vs. final), was recorded from six speakers. On the assumption that consonants might be articulated differently at word onsets, several temporal and spectral measurements were made and contrasted as a function of word position. The relatively simple measurement of duration did quite well in distinguishing word-initial (being longer) from word-final positions in our database. For stops and affricates at word onsets, speakers are found to lengthen closure and release durations differently, depending on voicing, suggesting that enhancement of paradigmatic contrast is made for these consonants. The identity of the following vowel (/i/ or /o/) had no consistent effect on the durations of the consonants. Word-initial consonants were found to be less variable than word-final ones, supporting the claim that word onsets are perceptual "islands of reliability" in the lexical access process. Durations of word-onset consonants were relatively constant within each sound class (voicing, stops, affricates, fricatives, nasals), independent of place of articulation. By using acoustic landmarks, from which information about manner as well as durations can be easily extracted, word segmentation and/or lexical access processes can start without the complete identification of all features (such as place) for a particular segment.
Acoustic landmarks can thus be used either singly, in identifying acoustically interesting regions where place features can be identified, or in combinations, from which manner features (Park, 2008) and temporal relations can be derived, to drive higher-level processing (e.g. word segmentation and lexical access) of the speech signal.
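The duration cue described in the abstract can be illustrated with a minimal sketch. It is not the thesis's procedure: the function names, the midpoint-of-means threshold rule, and all numeric values below are hypothetical and chosen only for illustration. The sketch assumes each consonant is bounded by two landmark times (for example, closure and release), takes their difference as the duration, and labels longer tokens as word-initial.

```python
from statistics import mean


def segment_duration(onset_s, offset_s):
    """Duration in seconds between two landmark times (hypothetical helper)."""
    if offset_s < onset_s:
        raise ValueError("offset landmark precedes onset landmark")
    return offset_s - onset_s


def midpoint_threshold(initial_durations, final_durations):
    """Illustrative decision rule: midpoint between the two class means.

    This rule is an assumption made for the sketch, not a method
    reported in the abstract.
    """
    return (mean(initial_durations) + mean(final_durations)) / 2.0


def classify_position(duration_s, threshold_s):
    """Label a consonant by duration: longer tokens are called word-initial."""
    return "word-initial" if duration_s > threshold_s else "word-final"


if __name__ == "__main__":
    # Made-up durations in seconds, for illustration only (not thesis data).
    initial = [0.10, 0.12, 0.11]
    final = [0.06, 0.07, 0.05]
    threshold = midpoint_threshold(initial, final)

    # A hypothetical token bounded by landmarks at 1.20 s and 1.31 s.
    token = segment_duration(1.20, 1.31)
    print(classify_position(token, threshold))
```

Since the abstract reports that onset durations are roughly constant within a sound class, a practical version would presumably fit one threshold per class (stops, affricates, fricatives, nasals) rather than a single global value.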
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. Includes bibliographical references (p. 107-111).
Department: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Massachusetts Institute of Technology
Electrical Engineering and Computer Science.