Infrastructure development for integration of lip reading into the SUMMIT Speech Recognizer

La, Chia-Hao, 1980-

dc.contributor.advisor	Timothy J. Hazen.	en_US
dc.contributor.author	La, Chia-Hao, 1980-	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2006-03-24T16:13:35Z
dc.date.available	2006-03-24T16:13:35Z
dc.date.copyright	2003	en_US
dc.date.issued	2003	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/29670
dc.description	Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003.	en_US
dc.description	Includes bibliographical references (p. 51-52).	en_US
dc.description.abstract	This thesis describes a method for augmenting an audio-only speech recognizer with visual lip-reading information, in order to improve the performance and robustness of the recognizer. The speech recognizer's variable-length audio segments are reconciled with the fixed-length video frames using segment-constrained Hidden Markov Modeling. A Viterbi search over the per-segment Hidden Markov Model resolves the variable asynchrony between the audio and video streams. The two streams are combined according to a relative weighting scheme, with the weights determined by optimizing on a held-out data set. Although a full audio-visual system has not yet been implemented, this thesis describes the infrastructure that has been developed to accommodate integration with a visual lip-reading module that will be completed in the near future.	en_US
dc.description.statementofresponsibility	by Chia-Hao La.	en_US
dc.format.extent	52 p.	en_US
dc.format.extent	1862592 bytes
dc.format.extent	1862400 bytes
dc.format.mimetype	application/pdf
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Infrastructure development for integration of lip reading into the SUMMIT Speech Recognizer	en_US
dc.type	Thesis	en_US
dc.description.degree	M.Eng.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	53833510	en_US

Files in this item

Name: 53833510-MIT.pdf
Size: 3.078Mb
Format: PDF
Description: Full printable version

This item appears in the following Collection(s)

Graduate Theses
