| dc.contributor.advisor | Timothy J. Hazen. | en_US |
| dc.contributor.author | La, Chia-Hao, 1980- | en_US |
| dc.contributor.other | Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. | en_US |
| dc.date.accessioned | 2006-03-24T16:13:35Z | |
| dc.date.available | 2006-03-24T16:13:35Z | |
| dc.date.copyright | 2003 | en_US |
| dc.date.issued | 2003 | en_US |
| dc.identifier.uri | http://hdl.handle.net/1721.1/29670 | |
| dc.description | Thesis (M.Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2003. | en_US |
| dc.description | Includes bibliographical references (p. 51-52). | en_US |
| dc.description.abstract | This thesis describes a method for augmenting an audio-only speech recognizer with visual lip-reading information in order to improve the performance and robustness of the recognizer. The recognizer's variable-length audio segments are reconciled with the fixed-length video frames using segment-constrained Hidden Markov Modeling. A Viterbi search over the per-segment Hidden Markov Model resolves the variable asynchrony between the audio and video streams. The two streams are combined according to a relative weighting scheme, which is determined by optimizing on a held-out data set. Although a full audio-visual system has not yet been implemented, this thesis describes the infrastructure that has been developed to accommodate integration with a visual lip-reading module that will be completed in the near future. | en_US |
| dc.description.statementofresponsibility | by Chia-Hao La. | en_US |
| dc.format.extent | 52 p. | en_US |
| dc.format.extent | 1862592 bytes | |
| dc.format.extent | 1862400 bytes | |
| dc.format.mimetype | application/pdf | |
| dc.format.mimetype | application/pdf | |
| dc.language.iso | eng | en_US |
| dc.publisher | Massachusetts Institute of Technology | en_US |
| dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
| dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | |
| dc.subject | Electrical Engineering and Computer Science. | en_US |
| dc.title | Infrastructure development for integration of lip reading into the SUMMIT Speech Recognizer | en_US |
| dc.type | Thesis | en_US |
| dc.description.degree | M.Eng. | en_US |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| dc.identifier.oclc | 53833510 | en_US |