DSpace@MIT

Automated creation of Wikipedia articles

Author(s)
Sauper, Christina (Christina Joan)
Download: Full printable version (8.386 MB)
Other Contributors
Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.
Advisor
Regina Barzilay.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
This thesis describes an automatic approach for producing Wikipedia articles. The wealth of information available on the Internet remains untapped for many topics of secondary concern. Creating an article requires a great deal of time spent collecting information and editing; this thesis proposes a way to automate that work. The proposed algorithm creates a new article by querying the Internet, selecting relevant excerpts from the search results, and synthesizing the best excerpts into a coherent document. This work builds on previous work in document summarization, web question answering, and Integer Linear Programming. At the core of our approach is a method that uses existing human-authored Wikipedia articles to learn a content selection mechanism. Articles in the same category often present similar types of information; we leverage this regularity to create content templates for new articles. Once a template has been created, we use classification and clustering techniques to select the single best excerpt for each section. Finally, we use Integer Linear Programming to eliminate redundancy across the complete article. We evaluate our system on both individual sections and complete articles, using both human and automatic evaluation methods. The results indicate that articles created by our system are close to human-authored Wikipedia entries in the quality of their content selection. We show that human and automatic evaluation metrics agree; automatic methods are therefore a reasonable evaluation tool for this task. We also demonstrate empirically that explicitly modeling content structure is essential for improving the quality of an automatically produced article.
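The final redundancy-elimination step described above can be illustrated as a small 0-1 integer program: pick exactly one candidate excerpt per section, maximize the total relevance score, and forbid any two chosen excerpts from being too similar. This is a hedged sketch under stated assumptions, not the thesis's implementation: the word-overlap (Jaccard) similarity, the 0.5 threshold, the scores, the example sections and excerpts, and the helper names `jaccard` and `select_excerpts` are all hypothetical, and the program is solved by exhaustive enumeration rather than an ILP solver, which is only practical for tiny inputs.

```python
from itertools import product


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity (hypothetical stand-in for a redundancy measure)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def select_excerpts(candidates, threshold=0.5):
    """Choose one excerpt per section to maximize the total relevance score.

    `candidates` maps a section name to a list of (excerpt, score) pairs.
    This enumerates every solution of the 0-1 program
        maximize   sum_{s,i} score[s][i] * x[s][i]
        subject to sum_i x[s][i] = 1          for every section s
                   x[s][i] + x[t][j] <= 1     if s != t and
                                              sim(e[s][i], e[t][j]) > threshold
    Exhaustive search is exponential in the number of sections; a real
    system would pass the same program to an ILP solver.
    Returns (best assignment, or None if infeasible; best total score).
    """
    sections = list(candidates)
    best, best_score = None, float("-inf")
    for choice in product(*(range(len(candidates[s])) for s in sections)):
        picked = [candidates[s][i] for s, i in zip(sections, choice)]
        # Redundancy constraint: no two chosen excerpts may be too similar.
        feasible = all(
            jaccard(picked[a][0], picked[b][0]) <= threshold
            for a in range(len(picked))
            for b in range(a + 1, len(picked))
        )
        if not feasible:
            continue
        total = sum(score for _, score in picked)
        if total > best_score:
            best = {s: text for s, (text, _) in zip(sections, picked)}
            best_score = total
    return best, best_score


# Hypothetical example: two sections, two scored candidate excerpts each.
candidates = {
    "Causes": [("the disease is caused by a virus", 0.9),
               ("infection spreads through contaminated water", 0.6)],
    "Transmission": [("the disease is caused by a virus carried by water", 0.8),
                     ("it spreads through contaminated water and food", 0.7)],
}
best, score = select_excerpts(candidates)
print(best)
print(round(score, 2))
```

With these inputs, the highest-scoring combination (0.9 + 0.8) is rejected because the two excerpts overlap heavily, so the sketch keeps the first "Causes" excerpt together with the second "Transmission" excerpt, for a total score of 1.6.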
Description
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2009.
 
Includes bibliographical references (leaves 81-84).
 
Date issued
2009
URI
http://hdl.handle.net/1721.1/47824
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology
Keywords
Electrical Engineering and Computer Science.

Collections
  • Graduate Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.