Show simple item record

dc.contributor.advisorKenneth N. Stevens.en_US
dc.contributor.authorPark, Chi-youn, 1981-en_US
dc.contributor.otherMassachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.en_US
dc.date.accessioned2009-03-20T19:30:50Z
dc.date.available2009-03-20T19:30:50Z
dc.date.copyright2008en_US
dc.date.issued2008en_US
dc.identifier.urihttp://hdl.handle.net/1721.1/44905
dc.descriptionThesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.en_US
dc.descriptionThis electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.en_US
dc.descriptionIncludes bibliographical references (p. 191-197).en_US
dc.description.abstractThis thesis focuses on the detection of abrupt acoustic discontinuities in the speech signal, which constitute landmarks for consonant sounds. Because a large amount of phonetic information is concentrated near acoustic discontinuities, more focused speech analysis and recognition can be performed based on the landmarks. Three types of consonant landmarks are defined according to its characteristics -- glottal vibration, turbulence noise, and sonorant consonant -- so that the appropriate analysis method for each landmark point can be determined. A probabilistic knowledge-based algorithm is developed in three steps. First, landmark candidates are detected and their landmark types are classified based on changes in spectral amplitude. Next, a bigram model describing the physiologically-feasible sequences of consonant landmarks is proposed, so that the most likely landmark sequence among the candidates can be found. Finally, it has been observed that certain landmarks are ambiguous in certain sets of phonetic and prosodic contexts, while they can be reliably detected in other contexts. A method to represent the regions where the landmarks are reliably detected versus where they are ambiguous is presented. On TIMIT test set, 91% of all the consonant landmarks and 95% of obstruent landmarks are located as landmark candidates. The bigram-based process for determining the most likely landmark sequences yields 12% deletion and substitution rates and a 15% insertion rate. An alternative representation that distinguishes reliable and ambiguous regions can detect 92% of the landmarks and 40% of the landmarks are judged to be reliable. The deletion rate within reliable regions is as low as 5%.en_US
dc.description.abstract(cont.) The resulting landmark sequences form a basis for a knowledge-based speech recognition system since the landmarks imply broad phonetic classes of the speech signal and indicate the points of focus for estimating detailed phonetic information. In addition, because the reliable regions generally correspond to lexical stresses and word boundaries, it is expected that the landmarks can guide the focus of attention not only at the phoneme-level, but at the phrase-level as well.en_US
dc.description.statementofresponsibilityby Chiyoun Park.en_US
dc.format.extent197 p.en_US
dc.language.isoengen_US
dc.publisherMassachusetts Institute of Technologyen_US
dc.rightsM.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.en_US
dc.rights.urihttp://dspace.mit.edu/handle/1721.1/7582en_US
dc.subjectElectrical Engineering and Computer Science.en_US
dc.titleConsonant landmark detection for speech recognitionen_US
dc.typeThesisen_US
dc.description.degreePh.D.en_US
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc297548228en_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record