Word boundary detection using landmarks : a survey of consonants

Author(s)

Chi, Xuemin, 1979-

Other Contributors

Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.

Advisor

Kenneth Noble Stevens.

Terms of use

M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582

Abstract

This project searches for consistent acoustic attributes in a broad set of American English consonants that would help in identifying their word positions in running speech. A database of sentences containing word pairs (e.g. "lay keys" vs. "lake ease" for /k/) of thirteen consonants (six stops, two affricates, three fricatives, and two nasals), controlled for prosodic boundaries, pitch accents, phonetic contexts, and word positions (initial vs. final), was recorded from six speakers. On the assumption that consonants might be articulated differently at word onsets, several temporal and spectral measurements were made and contrasted as a function of word position. The relatively simple measurement of duration did quite well in distinguishing word-initial (being longer) from word-final positions in our database. For stops and affricates at word onsets, speakers are found to lengthen closure and release durations differently, depending on voicing, suggesting that enhancement of paradigmatic contrast is made for these consonants. The identity of the following vowel (/i/ or /o/) had no consistent effect on the durations of the consonants. Word-initial consonants were found to be less variable than word-final ones, supporting the claim that word onsets are perceptual "islands of reliability" in the lexical access process. Durations of word-onset consonants were relatively constant within each sound class (voicing, stops, affricates, fricatives, nasals), independent of place of articulation. By using acoustic landmarks, from which information about manner as well as durations can be easily extracted, word segmentation and/or lexical access processes can start without the complete identification of all features (such as place) for a particular segment.

Acoustic landmarks can thus be used either singly, in identifying acoustically interesting regions where place features can be identified, or in combinations, from which manner features (Park, 2008) and temporal relations can be derived, to drive higher-level processing (e.g. word segmentation and lexical access) of the speech signal.

Description

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.

Includes bibliographical references (p. 107-111).

Date issued

2008

URI

http://hdl.handle.net/1721.1/44402

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Keywords

Electrical Engineering and Computer Science.

Collections

Doctoral Theses