Content modeling for social media text

Sauper, Christina (Christina Joan)

dc.contributor.advisor	Regina Barzilay.	en_US
dc.contributor.author	Sauper, Christina (Christina Joan)	en_US
dc.contributor.other	Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.	en_US
dc.date.accessioned	2012-12-13T18:49:04Z
dc.date.available	2012-12-13T18:49:04Z
dc.date.copyright	2012	en_US
dc.date.issued	2012	en_US
dc.identifier.uri	http://hdl.handle.net/1721.1/75648
dc.description	Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.	en_US
dc.description	Cataloged from PDF version of thesis.	en_US
dc.description	Includes bibliographical references (p. 129-136).	en_US
dc.description.abstract	This thesis focuses on machine learning methods for extracting information from user-generated content. Instances of this data such as product and restaurant reviews have become increasingly valuable and influential in daily decision making. In this work, I consider a range of extraction tasks such as sentiment analysis and aspect-based review aggregation. These tasks have been well studied in the context of newswire documents, but the informal and colloquial nature of social media poses significant new challenges. The key idea behind our approach is to automatically induce the content structure of individual documents given a large, noisy collection of user-generated content. This structure enables us to model the connection between individual documents and effectively aggregate their content. The models I propose demonstrate that content structure can be utilized at both document and phrase level to aid in standard text analysis tasks. At the document level, I capture this idea by joining the original task features with global contextual information. The coupling of the content model and the task-specific model allows the two components to mutually influence each other during learning. At the phrase level, I utilize a generative Bayesian topic model where a set of properties and corresponding attribute tendencies are represented as hidden variables. The model explains how the observed text arises from the latent variables, thereby connecting text fragments with corresponding properties and attributes.	en_US
dc.description.statementofresponsibility	by Christina Sauper.	en_US
dc.format.extent	136 p.	en_US
dc.language.iso	eng	en_US
dc.publisher	Massachusetts Institute of Technology	en_US
dc.rights	M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.	en_US
dc.rights.uri	http://dspace.mit.edu/handle/1721.1/7582	en_US
dc.subject	Electrical Engineering and Computer Science.	en_US
dc.title	Content modeling for social media text	en_US
dc.type	Thesis	en_US
dc.description.degree	Ph.D.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc	818328678	en_US

Files in this item

Name:: 818328678-MIT.pdf
Size:: 17.27Mb
Format:: PDF
Description:: Full printable version

View/Open

This item appears in the following Collection(s)

Doctoral Theses

Show simple item record