Show simple item record

dc.contributor.advisor: David L. Brock. [en_US]
dc.contributor.author: Jacokes, M. Brian (Michael Brian) [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2009-06-30T17:32:00Z
dc.date.available: 2009-06-30T17:32:00Z
dc.date.copyright: 2008 [en_US]
dc.date.issued: 2008 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/46158
dc.description: Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. [en_US]
dc.description: Includes bibliographical references (p. 71-72). [en_US]
dc.description.abstract: Despite the huge amount of computer data that exists today, the task of sharing information between organizations is still tackled largely on a case-by-case basis. The M Language is a data language that improves data sharing and interoperability by building a platform on top of XML and a semantic dictionary. Because the M Language is specifically designed for real-world data applications, it gives rise to several unique problems in natural language processing. I approach the problem of understanding unknown words by devising a novel heuristic for word decomposition called "probabilistic chunking," which achieves a 70% success rate in word syllabification and has potential applications in automatically decomposing words into morphemes. I also create algorithms which use probabilistic chunking to syllabify unknown words and thereby guess their parts of speech and semantic relations. This work contributes valuable methods to the areas of natural language processing and automatic data processing. [en_US]
dc.description.statementofresponsibility: by M. Brian Jacokes. [en_US]
dc.format.extent: 72 p. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Syllables and the M Language : improving unknown word guessing [en_US]
dc.title.alternative: Unknown word guessing in a semantic data language [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M.Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 399645095 [en_US]
