Show simple item record

dc.contributor.advisor: David R. Karger. [en_US]
dc.contributor.author: Hogue, Andrew William, 1978- [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2005-09-26T20:16:29Z
dc.date.available: 2005-09-26T20:16:29Z
dc.date.copyright: 2004 [en_US]
dc.date.issued: 2004 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/28406
dc.description: Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. [en_US]
dc.description: Includes bibliographical references (p. 103-106). [en_US]
dc.description.abstract: We develop a method for learning patterns from a set of positive examples to retrieve semantic content from tree-structured data. Specifically, we focus on HTML documents on the World Wide Web, which contain a wealth of semantic information and have a useful underlying tree structure. A user provides examples of relevant data they wish to extract from a web site through a simple user interface in a web browser. To construct patterns, we use the notion of the edit distance between the subtrees represented by these examples to distill them into a more general pattern. This pattern may then be used to retrieve other instances of the selected data from the same page or other similar pages. By linking patterns and their components with semantic labels using RDF, we can create semantic "overlays" for Web information which are useful in such projects as the Semantic Web and the Haystack information management environment. [en_US]
dc.description.statementofresponsibility: by Andrew William Hogue. [en_US]
dc.format.extent: 106 p. [en_US]
dc.format.extent: 5189206 bytes
dc.format.extent: 5201643 bytes
dc.format.mimetype: application/pdf
dc.format.mimetype: application/pdf
dc.language.iso: en_US
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Tree pattern inference and matching for wrapper induction on the World Wide Web [en_US]
dc.type: Thesis [en_US]
dc.description.degree: M.Eng. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 56985415 [en_US]

