dc.contributor.advisor: Regina Barzilay. [en_US]
dc.contributor.author: Chen, Erdong, S.M. Massachusetts Institute of Technology [en_US]
dc.contributor.other: Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science. [en_US]
dc.date.accessioned: 2009-01-30T16:38:50Z
dc.date.available: 2009-01-30T16:38:50Z
dc.date.copyright: 2008 [en_US]
dc.date.issued: 2008 [en_US]
dc.identifier.uri: http://hdl.handle.net/1721.1/44374
dc.description: Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008. [en_US]
dc.description: Includes bibliographical references (p. 77-81). [en_US]
dc.description.abstract: This thesis focuses on computational discourse models for collaboratively edited corpora. Due to the exponential growth rate and the significant stylistic and content variations of collaboratively edited corpora, models based on professionally edited texts are incapable of processing the new data effectively. For these methods to succeed, one challenge is to preserve local coherence as well as global consistency. We explore two corpus-based methods for processing collaboratively edited corpora, which effectively model and optimize the consistency of user-generated text. The first method addresses the task of inserting new information into existing texts. In particular, we wish to determine the best location in a text for a given piece of new information. We present an online ranking model which exploits the hierarchical structure of the text, representationally in its features and algorithmically in its learning procedure. When tested on a corpus of Wikipedia articles, our hierarchically informed model predicts the correct insertion paragraph more accurately than baseline methods. The second method concerns inducing a common structure across multiple articles in similar domains to aid cross-document collaborative editing. A graphical model is designed to induce section topics and to learn topic clusters. Some preliminary experiments showed that the proposed method is comparable to baseline methods. [en_US]
dc.description.statementofresponsibility: by Erdong Chen. [en_US]
dc.format.extent: 81 p. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Massachusetts Institute of Technology [en_US]
dc.rights: M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. [en_US]
dc.rights.uri: http://dspace.mit.edu/handle/1721.1/7582 [en_US]
dc.subject: Electrical Engineering and Computer Science. [en_US]
dc.title: Discourse models for collaboratively edited corpora [en_US]
dc.type: Thesis [en_US]
dc.description.degree: S.M. [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc: 276947510 [en_US]

