A taxonomy and analysis of web wrapping technologies
Author(s): Chuang, Shin Wee, 1978-
Massachusetts Institute of Technology. Dept. of Civil and Environmental Engineering.
Stuart E. Madnick and John R. Williams.
Web wrapping technologies were developed in the 1990s, in the middle of the dot-com boom, to facilitate the extraction of web data. In recent years, the underlying architecture of web wrapping technologies has also been used for other applications, such as information integration between legacy systems in large enterprises. Despite the relatively widespread use of this technology, there is currently no uniform way of characterizing web wrapping toolkits, unlike, say, a digital camera, which can be described in terms of the size of its sensor or its storage capacity. The focus of this thesis is therefore to develop a taxonomy, or classification scheme, that can be used to describe a web wrapping toolkit effectively in terms of its retrieval, extraction and conversion features. For this purpose, some 20 toolkits were studied, and verification tests were performed on the 9 toolkits for which evaluation copies were available. The last part of the thesis discusses two policy Acts that are closely related to data extraction: the EU Database Directive and the HR3261 Database and Collection of Information Misappropriation Act. A comparative analysis of the two Acts was performed, and their respective implications for the database-producing industry were examined.
Thesis (S.M.)--Massachusetts Institute of Technology, Engineering Systems Division, Technology and Policy Program; and, (S.M.)--Massachusetts Institute of Technology, Dept. of Civil and Environmental Engineering, 2004. Includes bibliographical references (p. 76-77).
Department: Massachusetts Institute of Technology. Technology and Policy Program; Massachusetts Institute of Technology. Dept. of Civil and Environmental Engineering.
Massachusetts Institute of Technology
Technology and Policy Program; Civil and Environmental Engineering.