dc.contributor.advisor | David R. Karger. | en_US |
dc.contributor.author | Hogue, Andrew William, 1978- | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. | en_US |
dc.date.accessioned | 2005-09-26T20:16:29Z | |
dc.date.available | 2005-09-26T20:16:29Z | |
dc.date.copyright | 2004 | en_US |
dc.date.issued | 2004 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/28406 | |
dc.description | Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. | en_US |
dc.description | Includes bibliographical references (p. 103-106). | en_US |
dc.description.abstract | We develop a method for learning patterns from a set of positive examples to retrieve semantic content from tree-structured data. Specifically, we focus on HTML documents on the World Wide Web, which contain a wealth of semantic information and have a useful underlying tree structure. A user provides examples of relevant data they wish to extract from a web site through a simple user interface in a web browser. To construct patterns, we use the notion of the edit distance between the subtrees represented by these examples to distill them into a more general pattern. This pattern may then be used to retrieve other instances of the selected data from the same page or other similar pages. By linking patterns and their components with semantic labels using RDF, we can create semantic "overlays" for Web information which are useful in such projects as the Semantic Web and the Haystack information management environment. | en_US |
dc.description.statementofresponsibility | by Andrew William Hogue. | en_US |
dc.format.extent | 106 p. | en_US |
dc.format.extent | 5189206 bytes | |
dc.format.extent | 5201643 bytes | |
dc.format.mimetype | application/pdf | |
dc.format.mimetype | application/pdf | |
dc.language.iso | en_US | |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | |
dc.subject | Electrical Engineering and Computer Science. | en_US |
dc.title | Tree pattern inference and matching for wrapper induction on the World Wide Web | en_US |
dc.type | Thesis | en_US |
dc.description.degree | M.Eng. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.identifier.oclc | 56985415 | en_US |