DSpace@MIT (MIT Libraries)


AnnoTool : crowdsourcing for natural language corpus creation

Author(s)
Hayden, Katherine (Katherine Marie)
Alternative title
Anno Tool : crowdsourcing for natural language corpus creation
Crowdsourcing for natural language corpus creation
Other Contributors
Massachusetts Institute of Technology. Department of Architecture. Program in Media Arts and Sciences.
Advisor
Catherine Havasi.
Terms of use
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582
Abstract
This thesis explores the extent to which untrained annotators can create annotated corpora of scientific texts. Currently, the variety and quantity of annotated corpora are limited by the expense of hiring or training annotators. The expense of finding and hiring professionals increases as the task becomes more esoteric or requires a more specialized skill set. Training annotators is an investment in itself and is often difficult to justify: undergraduate students or volunteers may not remain with a project for long enough after being trained, and graduate students' time may already be committed to other research goals. As demand increases for computer programs capable of interacting with users through natural language, producing annotated datasets with which to train these programs is becoming increasingly important. This thesis presents an approach combining crowdsourcing with Luis von Ahn's "games with a purpose" paradigm. Crowdsourcing combines contributions from many participants in an online community. Games with a purpose elicit voluntary contributions by supporting a task people are already motivated to do and collecting data in the background. Here the desired data are annotations, and the target community is people who annotate text for professional or personal benefit, such as scientists, researchers, or members of the general public with an interest in science. An annotation tool was designed in the form of a Google Chrome extension built specifically to work with articles from the open-access, online scientific journal Public Library of Science (PLOS) ONE. A study was designed in which participants with no prior annotator training were given a brief introduction to the annotation tool and assigned three articles to annotate. The results of the study demonstrate considerable annotator agreement.
The results of this thesis demonstrate that crowdsourcing annotations is feasible even for technically sophisticated texts, and the thesis presents a model of a platform that continuously gathers annotated corpora.
Description
Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, June 2014.
72
"September 2013." Cataloged from PDF version of thesis.
Includes bibliographical references (pages [51]-[54]).
Date issued
2014
URI
http://hdl.handle.net/1721.1/91819
Department
Program in Media Arts and Sciences (Massachusetts Institute of Technology)
Publisher
Massachusetts Institute of Technology
Keywords
Architecture. Program in Media Arts and Sciences.

Collections
  • Graduate Theses

Content created by the MIT Libraries, CC BY-NC unless otherwise noted.