MIT Libraries DSpace@MIT

Syllables and the M Language : improving unknown word guessing

Author(s)
Jacokes, M. Brian (Michael Brian)
Alternative title
Unknown word guessing in a semantic data language
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
David L. Brock.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Despite the huge amount of computer data that exists today, the task of sharing information between organizations is still tackled largely on a case-by-case basis. The M Language is a data language that improves data sharing and interoperability by building a platform on top of XML and a semantic dictionary. Because the M Language is specifically designed for real-world data applications, it gives rise to several unique problems in natural language processing. I approach the problem of understanding unknown words by devising a novel heuristic for word decomposition called "probabilistic chunking," which achieves a 70% success rate in word syllabification and has potential applications in automatically decomposing words into morphemes. I also create algorithms which use probabilistic chunking to syllabify unknown words and thereby guess their parts of speech and semantic relations. This work contributes valuable methods to the areas of natural language processing and automatic data processing.
Description
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.
 
Includes bibliographical references (p. 71-72).
 
Date issued
2008
URI
http://hdl.handle.net/1721.1/46158
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Graduate Theses
