DSpace@MIT (MIT Libraries)

Towards a unified framework for sub-lexical and supra-lexical linguistic modeling

Author(s)
Mou, Xiaolong, 1973-
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Victor Zue.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Conversational interfaces have received much attention as a promising natural communication channel between humans and computers. A typical conversational interface consists of three major systems: speech understanding, dialog management, and spoken language generation. In such an interface, speech recognition, as the front end of speech understanding, remains one of the fundamental challenges in establishing robust and effective human/computer communication. On the one hand, the speech recognition component in a conversational interface lives in a rich system environment. Diverse sources of knowledge are available and can potentially benefit its robustness and accuracy. For example, the natural language understanding component can provide syntactic and semantic knowledge that helps constrain the recognition search space. On the other hand, the speech recognition component also faces the challenge of spontaneous speech, and it is important to address the casualness of speech using the available knowledge sources. For example, sub-lexical linguistic information would be very useful in providing linguistic support for previously unseen words, and dynamic reliability modeling may help improve recognition robustness for poorly articulated speech. In this thesis, we focused mainly on the integration of knowledge sources within the speech understanding system of a conversational interface. More specifically, we studied the formalization and integration of hierarchical linguistic knowledge at both the sub-lexical and supra-lexical levels, and proposed a unified framework for integrating hierarchical linguistic knowledge into speech recognition using layered finite-state transducers (FSTs).
 
Within the proposed framework, we developed context-dependent hierarchical linguistic models at both the sub-lexical and supra-lexical levels. FSTs were designed and constructed to encode both the structure and the probability constraints provided by the hierarchical linguistic models. We also studied empirically the feasibility and effectiveness of integrating hierarchical linguistic knowledge into speech recognition using the proposed framework. We found that, at the sub-lexical level, hierarchical linguistic modeling is effective in providing generic sub-word structure and probability constraints. Since such constraints are not restricted to a fixed system vocabulary, they can help the recognizer correctly identify previously unseen words. Combined with the unknown-word support from natural language understanding, a conversational interface would be better able to handle unknown words and could possibly incorporate them into the active recognition vocabulary on the fly. At the supra-lexical level, experimental results showed that the shallow parsing model built within the proposed layered FST framework, with top-level n-gram probabilities and phrase-level context-dependent probabilities, reduced recognition errors compared with a class n-gram model of the same order. However, we also found that its application can be limited by the complexity of the composed FSTs. This suggests that, with a much more complex grammar at the supra-lexical level, a proper tradeoff between tight knowledge integration and system complexity becomes more important ...
 
Description
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2002.
 
Includes bibliographical references (p. 171-178).
 
Date issued
2002
URI
http://hdl.handle.net/1721.1/29227
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Doctoral Theses
