Using a symbolic language parser to improve Markov language models
Author(s)
Townsend, Duncan Clarke McIntire
Other Contributors
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science.
Advisor
Boris Katz.
Abstract
This thesis presents a hybrid approach to natural language processing that combines an n-gram (Markov) model with a symbolic parser. In concert, these two techniques are applied to the problem of sentence simplification. The n-gram system consists of a relational database backend and a frontend application that presents a homogeneous interface for both direct n-gram lookup and Markov approximation. The query language exposed by the frontend also applies lexical information from the START natural language system to allow queries based on part of speech. Using the START natural language system's parser, English sentences are transformed into a collection of structural, syntactic, and lexical statements that are well suited to the process of simplification. After the parse of the sentence is reduced, the resulting expressions can be processed back into English. The n-gram model then ranks these reduced sentences by likelihood.
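The final ranking step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation. It assumes a bigram (first-order Markov) model with add-one smoothing, in-memory counts in place of the relational database backend, and length normalization so that shorter candidates are not favored automatically. The corpus, the candidate sentences, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram_counts(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_likelihood(sentence, unigrams, bigrams):
    """Bigram log-probability of a sentence with add-one smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    vocab_size = len(unigrams)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from having zero probability.
        prob = (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
        total += math.log(prob)
    return total


def rank_candidates(candidates, unigrams, bigrams):
    """Sort candidates from most to least likely.

    Scores are divided by the number of transitions (an assumption of this
    sketch) so that candidates of different lengths are comparable.
    """
    def score(sentence):
        transitions = len(sentence.split()) + 1
        return log_likelihood(sentence, unigrams, bigrams) / transitions

    return sorted(candidates, key=score, reverse=True)


# Hypothetical toy corpus standing in for a real n-gram database.
TOY_CORPUS = [
    "the dog chased the cat",
    "the cat sat on the mat",
    "the dog sat on the mat",
]

unigrams, bigrams = train_bigram_counts(TOY_CORPUS)
ranked = rank_candidates(
    ["dog the cat chased the", "the dog chased the cat"],
    unigrams,
    bigrams,
)
```

In this sketch the fluent word order ranks above the scrambled one because every one of its bigrams appears in the toy corpus. In practice the candidates would be the reduced sentences generated from the parser's output rather than hand-written strings.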
Description
Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (pages 31-32).
Date issued
2015
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.