
dc.contributor.advisor | Catherine Havasi. | en_US
dc.contributor.author | Hayden, Katherine (Katherine Marie) | en_US
dc.contributor.other | Massachusetts Institute of Technology. Department of Architecture. Program in Media Arts and Sciences. | en_US
dc.date.accessioned | 2014-11-24T18:37:07Z
dc.date.available | 2014-11-24T18:37:07Z
dc.date.copyright | 2013 | en_US
dc.date.issued | 2014 | en_US
dc.identifier.uri | http://hdl.handle.net/1721.1/91819
dc.description | Thesis: S.M., Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, June 2014. | en_US
dc.description | 72 | en_US
dc.description | "September 2013." Cataloged from PDF version of thesis. | en_US
dc.description | Includes bibliographical references (pages [51]-[54]). | en_US
dc.description.abstract | This thesis explores the extent to which untrained annotators can create annotated corpora of scientific texts. Currently the variety and quantity of annotated corpora are limited by the expense of hiring or training annotators. The expense of finding and hiring professionals increases as the task becomes more esoteric or requires a more specialized skill set. Training annotators is an investment in itself and is often difficult to justify. Undergraduate students or volunteers may not remain with a project for long enough after being trained, and graduate students' time may already be committed to other research goals. As demand increases for computer programs capable of interacting with users through natural language, producing annotated datasets with which to train these programs is becoming increasingly important. This thesis presents an approach combining crowdsourcing with Luis von Ahn's "games with a purpose" paradigm. Crowdsourcing combines contributions from many participants in an online community. Games with a purpose encourage voluntary contributions by providing an avenue for a task people are already motivated to do, and collect data in the background. Here the desired data are annotations, and the target community is people who annotate text for professional or personal benefit, such as scientists, researchers or members of the general public with an interest in science. An annotation tool was designed in the form of a Google Chrome extension built specifically to work with articles from the open-access, online scientific journal Public Library of Science (PLOS) ONE. A study was designed in which participants with no prior annotator training were given a brief introduction to the annotation tool and assigned three articles to annotate. The results of the study demonstrate considerable annotator agreement. The results of this thesis demonstrate that crowdsourcing annotations is feasible even for technically sophisticated texts, and present a model of a platform that continuously gathers annotated corpora. | en_US
dc.description.statementofresponsibility | by Katherine Hayden. | en_US
dc.format.extent | 49, 5 pages | en_US
dc.language.iso | eng | en_US
dc.publisher | Massachusetts Institute of Technology | en_US
dc.rights | M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. | en_US
dc.rights.uri | http://dspace.mit.edu/handle/1721.1/7582 | en_US
dc.subject | Architecture. Program in Media Arts and Sciences. | en_US
dc.title | AnnoTool : crowdsourcing for natural language corpus creation | en_US
dc.title.alternative | Anno Tool : crowdsourcing for natural language corpus creation | en_US
dc.title.alternative | Crowdsourcing for natural language corpus creation | en_US
dc.type | Thesis | en_US
dc.description.degree | S.M. | en_US
dc.contributor.department | Program in Media Arts and Sciences (Massachusetts Institute of Technology)
dc.identifier.oclc | 894221970 | en_US

