Show simple item record

dc.contributor.advisorRegina Barzilay.en_US
dc.contributor.authorBranavan, Satchuthananthavale Rasiah Kuhanen_US
dc.contributor.otherMassachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science.en_US
dc.date.accessioned2009-01-30T16:37:56Z
dc.date.available2009-01-30T16:37:56Z
dc.date.copyright2008en_US
dc.date.issued2008en_US
dc.identifier.urihttp://hdl.handle.net/1721.1/44368
dc.descriptionThesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2008.en_US
dc.descriptionIncludes bibliographical references (p. 95-97).en_US
dc.description.abstractThis thesis focuses on methods for condensing large documents into highly concise summaries, achieving compression rates on par with human writers. While the need for such summaries in the current age of information overload is increasing, the desired compression rate has thus far been beyond the reach of automatic summarization systems. The potency of our summarization methods is due to their in-depth modelling of document content in a probabilistic framework. We explore two types of document representation that capture orthogonal aspects of text content. The first represents the semantic properties mentioned in a document in a hierarchical Bayesian model. This method is used to summarize thousands of consumer reviews by identifying the product properties mentioned by multiple reviewers. The second representation captures discourse properties, modelling the connections between different segments of a document. This discriminatively trained model is employed to generate tables of contents for books and lecture transcripts. The summarization methods presented here have been incorporated into large-scale practical systems that help users effectively access information online.en_US
dc.description.statementofresponsibilityby Satchuthananthavale Rasiah Kuhan Branavan.en_US
dc.format.extent97 p.en_US
dc.language.isoengen_US
dc.publisherMassachusetts Institute of Technologyen_US
dc.rightsM.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission.en_US
dc.rights.urihttp://dspace.mit.edu/handle/1721.1/7582en_US
dc.subjectElectrical Engineering and Computer Science.en_US
dc.titleHigh compression rate text summarizationen_US
dc.typeThesisen_US
dc.description.degreeS.M.en_US
dc.contributor.departmentMassachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
dc.identifier.oclc276937779en_US


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record