
dc.contributor.advisor    Kenneth N. Stevens.    en_US
dc.contributor.author    Choi, Jeung-Yoon, 1999-    en_US
dc.date.accessioned    2005-08-22T18:33:27Z
dc.date.available    2005-08-22T18:33:27Z
dc.date.copyright    1999    en_US
dc.date.issued    1999    en_US
dc.identifier.uri    http://hdl.handle.net/1721.1/9462
dc.description    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999.    en_US
dc.description    Includes bibliographical references (leaves 107-111).    en_US
dc.description.abstract    In this thesis, a method for designing a hierarchical speech recognition system at the phonetic level is presented. The system employs various component modules to detect acoustic cues in the signal. These acoustic cues are used to infer values of features that describe segments. Features are considered to be arranged in a hierarchical structure, where those describing the manner of production are placed at a higher level than features describing articulators and their configurations. The structure of the recognition system follows this feature hierarchy. As an example of designing a component in this system, a module for detecting consonant voicing is described in detail. Consonant production and conditions for phonation are first examined to determine acoustic properties that may be used to infer consonant voicing. The acoustic measurements are then examined in different environments to determine a set of reliable acoustic cues. These acoustic cues include fundamental frequency, the difference in amplitudes of the first two harmonics, cutoff first formant frequency, and residual amplitude of the first harmonic around consonant landmarks. Hand measurements of these acoustic cues result in error rates of around 10% for isolated speech and 20% for continuous speech. Combining closure/release landmarks reduces error rates by about 5%. Comparison with perceived voicing yields similar results. When modifications are discounted, most errors occur adjacent to weak vowels. Automatic measurements increase error rates by about 3%. Training on isolated utterances produces error rates for continuous speech comparable to training on continuous speech. These results show that a small set of acoustic cues based on speech production may provide reliable criteria for determining the values of features. The contexts in which errors occur correspond to those for human speech perception, and expressing acoustic information using features provides a compact method of describing these environments.    en_US
dc.description.statementofresponsibility    by Jeung-Yoon Choi.    en_US
dc.format.extent    111 leaves    en_US
dc.format.extent    9523291 bytes
dc.format.extent    9523049 bytes
dc.format.mimetype    application/pdf
dc.format.mimetype    application/pdf
dc.language.iso    eng    en_US
dc.publisher    Massachusetts Institute of Technology    en_US
dc.rights    M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.    en_US
dc.rights.uri    http://dspace.mit.edu/handle/1721.1/7582
dc.subject    Electrical Engineering and Computer Science    en_US
dc.title    Detection of consonant voicing : a module for a hierarchical speech recognition system    en_US
dc.type    Thesis    en_US
dc.description.degree    Ph.D.    en_US
dc.contributor.department    Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science    en_US
dc.identifier.oclc    43482506    en_US
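
The abstract above names four acoustic cues used to infer consonant voicing. As a point of reference only, the short Python sketch below illustrates how two of them, the fundamental frequency and the amplitude difference between the first two harmonics (H1-H2), might be measured from a single analysis frame. It is a minimal sketch assuming NumPy and a synthetic test signal; the function names, window length, and periodicity threshold are illustrative assumptions and do not reproduce the thesis's actual measurement or landmark-detection procedures.

```python
# Illustrative sketch (not the thesis's algorithm): measure two of the cues
# named in the abstract, F0 and H1-H2, from one analysis frame using NumPy.
import numpy as np

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Crude autocorrelation-based F0 estimate in Hz, or None if aperiodic."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(fs / fmax), min(int(fs / fmin), len(ac) - 1)
    lag = lo + np.argmax(ac[lo:hi + 1])
    # Require a reasonably periodic frame before reporting an F0
    # (0.3 is an arbitrary illustrative threshold).
    if ac[lag] < 0.3 * ac[0]:
        return None
    return fs / lag

def h1_h2_db(frame, fs, f0):
    """Amplitude difference (dB) between the first two harmonics."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), 1.0 / fs)
    h1 = spec[np.argmin(np.abs(freqs - f0))]        # bin nearest F0
    h2 = spec[np.argmin(np.abs(freqs - 2 * f0))]    # bin nearest 2*F0
    return 20.0 * np.log10(h1 / h2)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(int(0.03 * fs)) / fs  # one 30 ms frame
    # Synthetic "voiced" frame: 120 Hz fundamental plus a weaker 2nd harmonic.
    frame = np.sin(2 * np.pi * 120 * t) + 0.4 * np.sin(2 * np.pi * 240 * t)
    f0 = estimate_f0(frame, fs)
    if f0 is None:
        print("frame judged aperiodic (no F0)")
    else:
        print(f"F0 ~ {f0:.1f} Hz, H1-H2 ~ {h1_h2_db(frame, fs, f0):.1f} dB")
```

In the framework described in the abstract, such frame-level measurements would be taken around detected consonant landmarks and combined with the remaining cues (cutoff first formant frequency and residual first-harmonic amplitude) before a voicing decision is made; the sketch only reports the raw measurements.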

