Show simple item record

dc.contributor.advisor: Deb K. Roy. [en_US]
dc.contributor.author: Oney, Stephen W [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2009-06-30T16:59:39Z
dc.date.available: 2009-06-30T16:59:39Z
dc.date.copyright: 2008 [en_US]
dc.date.issued: 2008 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/46009
dc.description: Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. [en_US]
dc.description: Includes bibliographical references (leaves 45-47). [en_US]
dc.description.abstract: This thesis focuses on techniques by which natural language can be used to search for specific elements in a structured document, such as an XML file. The goal is to create a system that can be trained to identify features of written English sentences describing part of an XML document that help identify the sections of that document under discussion. In particular, this thesis addresses the problem of searching through XML documents, each of which describes the play-by-play events of a baseball game. These events are collected from Major League Baseball games between 2004 and 2008, and the documents contain information detailing the outcome of every pitch thrown. My techniques are trained and tested on written (newspaper) summaries of these games, which often refer to specific game events and statistics. The choice of these training data makes the task much more complex in two ways. First, these summaries come from multiple authors, each of whom has a distinct writing style and uses language in a unique and often complex way. Second, large portions of these summaries discuss facts outside the context of the play-by-play events of the XML documents. Training the system on these portions of the summaries can create a sparse-data problem, which has the potential to reduce the effectiveness of the system. The end result is a system capable of building classifiers for natural language search of these XML documents. This system is able to overcome the two aforementioned problems, as well as several more subtle challenges. In addition, several limitations of alternative, strictly feature-based classifiers are illustrated, and applications of this research to related problems (outside of baseball and sports) are discussed. [en_US]
dc.description.statementofresponsibility: by Stephen W. Oney. [en_US]
dc.format.extent: 48 leaves [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Natural language search of structured documents [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M.Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 355696468 [en_US]

