dc.contributor.advisor | Stuart E. Madnick and John R. Williams. | en_US |
dc.contributor.author | Chuang, Shin Wee, 1978- | en_US |
dc.contributor.other | Massachusetts Institute of Technology. Dept. of Civil and Environmental Engineering. | en_US |
dc.date.accessioned | 2005-06-02T19:10:12Z | |
dc.date.available | 2005-06-02T19:10:12Z | |
dc.date.copyright | 2004 | en_US |
dc.date.issued | 2004 | en_US |
dc.identifier.uri | http://hdl.handle.net/1721.1/17915 | |
dc.description | Thesis (S.M.)--Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program; and, (S.M.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2004. | en_US |
dc.description | Includes bibliographical references (p. 76-77). | en_US |
dc.description.abstract | Web wrapping technologies were developed in the 1990s, in the middle of the dot-com boom, to facilitate the extraction of web data. In recent years, the underlying architecture of web wrapping technologies has also been used for other applications, such as information integration between legacy systems in large enterprises. Despite the relatively widespread use of this technology, there is currently no uniform way of characterizing web wrapping toolkits, unlike, say, a digital camera, which can be described in terms of the size of its sensor or its storage capacity. The focus of this thesis is therefore to develop a taxonomy, or classification scheme, that can be used to effectively describe a web wrapping toolkit in terms of its retrieval, extraction and conversion features. For this purpose, some 20 toolkits were studied, and verification tests were performed on 9 of them for which evaluation copies were available. The last part of the thesis discusses two policy Acts that are closely related to data extraction: the EU Database Directive and the HR3261 Database and Collection of Information Misappropriation Act. A comparative analysis of the two Acts was performed, and their respective implications for the database-producing industry were examined. | en_US |
dc.description.statementofresponsibility | by Shin Wee Chuang. | en_US |
dc.format.extent | 77 p. | en_US |
dc.format.extent | 6449867 bytes | |
dc.format.extent | 6449671 bytes | |
dc.format.mimetype | application/pdf | |
dc.format.mimetype | application/pdf | |
dc.language.iso | eng | en_US |
dc.publisher | Massachusetts Institute of Technology | en_US |
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US |
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | |
dc.subject | Technology and Policy Program. | en_US |
dc.subject | Civil and Environmental Engineering. | en_US |
dc.title | A taxonomy and analysis of web wrapping technologies | en_US |
dc.type | Thesis | en_US |
dc.description.degree | S.M. | en_US |
dc.contributor.department | Massachusetts Institute of Technology. Department of Civil and Environmental Engineering | |
dc.contributor.department | Massachusetts Institute of Technology. Engineering Systems Division | |
dc.contributor.department | Technology and Policy Program | |
dc.identifier.oclc | 56728256 | en_US |