Modelling out-of-vocabulary words for robust speech recognition

Bazzi, Issam

dc.contributor.advisor	James Glass.	en_US
dc.contributor.author	Bazzi, Issam	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2005-10-14T19:23:27Z
dc.date.available	2005-10-14T19:23:27Z
dc.date.copyright	2002	en_US
dc.date.issued	2002	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/29241
dc.description	Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002.	en_US
dc.description	Includes bibliographical references (p. 147-153).	en_US
dc.description.abstract	This thesis concerns the problem of unknown or out-of-vocabulary (OOV) words in continuous speech recognition. Most of today's state-of-the-art speech recognition systems can recognize only words that belong to some predefined finite word vocabulary. When encountering an OOV word, a speech recognizer erroneously substitutes the OOV word with a similar-sounding word from its vocabulary. Furthermore, a recognition error due to an OOV word tends to spread errors into neighboring words, dramatically degrading overall recognition performance. In this thesis we propose a novel approach for handling OOV words within a single-stage recognition framework. To achieve this goal, an explicit and detailed model of OOV words is constructed and then used to augment the closed-vocabulary search space of a standard speech recognizer. This OOV model achieves open-vocabulary recognition through the use of more flexible subword units that can be concatenated during recognition to form new phone sequences corresponding to potential new words. Examples of such subword units are phones, syllables, or some automatically learned multi-phone sequences. Subword units have the attractive property of being a closed set, and thus are able to cover any new words, and can conceivably cover most utterances with partially spoken words as well. The main challenge with such an approach is ensuring that the OOV model does not absorb portions of the speech signal corresponding to in-vocabulary (IV) words. In dealing with this challenge, we explore several research issues related to designing the subword lexicon, language model, and topology of the OOV model. We present a dictionary-based approach for estimating subword language models.	en_US
dc.description.abstract	(cont.) Such language models are utilized within the subword search space to help recognize the underlying phonetic transcription of OOV words. We also propose a data-driven iterative bottom-up procedure for automatically creating a multi-phone subword inventory. Starting with individual phones, this procedure uses the maximum mutual information principle to successively merge phones to obtain longer subword units. The thesis also extends this OOV approach to modelling multiple classes of OOV words. Instead of augmenting the word search space with a single model, we add several models, one for each class of words. We present two approaches for designing the OOV word classes. The first approach relies on using common part-of-speech tags. The second approach is a data-driven two-step clustering procedure, where the first step uses agglomerative clustering to derive an initial class assignment, while the second step uses iterative clustering to move words from one class to another in order to reduce the model perplexity. We present experiments on two recognition tasks: the medium-vocabulary spontaneous speech JUPITER weather information domain and the large-vocabulary broadcast news HUB4 domain. On the JUPITER task, the proposed approach can detect 70% of the OOV words with a false alarm rate of less than 3%. At this operating point, the word error rate (WER) on the IV utterances degrades slightly (from 10.9% to 11.2%) while the overall WER decreases from 17.1% to 16.4% ...	en_US
dc.description.statementofresponsibility	by Issam Bazzi.	en_US
dc.format.extent	153 p.	en_US
dc.format.extent	6234069 bytes
dc.format.extent	6233877 bytes
dc.format.mimetype	application/pdf
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Modelling out-of-vocabulary words for robust speech recognition	en_US
dc.title.alternative	Modelling OOV words for robust speech recognition	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph.D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	51549987	en_US

Files in this item

Name: 51549987-MIT.pdf
Size: 9.237Mb
Format: PDF
Description: Full printable version


This item appears in the following Collection(s)

Doctoral Theses
