DSpace@MIT (MIT Libraries)


A taxonomy and analysis of web wrapping technologies

Author(s)
Chuang, Shin Wee, 1978-
Other Contributors
Massachusetts Institute of Technology. Dept. of Civil and Environmental Engineering.
Advisor
Stuart E. Madnick and John R. Williams.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
Web wrapping technologies were developed in the 1990s, in the middle of the dot-com boom, to facilitate the extraction of web data. In recent years, the underlying architecture of web wrapping technologies has also been used for other applications, such as information integration between legacy systems in large enterprises. Despite the relatively widespread use of this technology, there is currently no uniform way of characterizing web wrapping toolkits, unlike, say, a digital camera, which can be described in terms of its sensor size or storage capacity. The focus of this thesis is therefore to develop a taxonomy, or classification scheme, that can be used to describe a web wrapping toolkit effectively in terms of its retrieval, extraction, and conversion features. For this purpose, some 20 toolkits were studied, and verification tests were performed on the 9 for which evaluation copies were available. The last part of the thesis discusses two policy Acts that are closely related to data extraction: the EU Database Directive and the HR3261 Database and Collection of Information Misappropriation Act. A comparative analysis of the two Acts was performed, and their respective implications for the database-producing industry were examined.
Description
Thesis (S.M.)--Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program; and, (S.M.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2004.
 
Includes bibliographical references (p. 76-77).
 
Date issued
2004
URI
http://hdl.handle.net/1721.1/17915
Department
Massachusetts Institute of Technology. Department of Civil and Environmental Engineering; Massachusetts Institute of Technology. Engineering Systems Division; Technology and Policy Program
Publisher
Massachusetts Institute of Technology
Keywords
Technology and Policy Program; Civil and Environmental Engineering.

Collections
  • Graduate Theses
