Incorporating Content Structure into Text Analysis Applications

Sauper, Christina; Haghighi, Aria; Barzilay, Regina

Author(s)

Sauper, Christina Joan; Haghighi, Aria; Barzilay, Regina

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/

Abstract

Information about the content structure of a document is largely ignored by current text analysis applications such as information extraction and sentiment analysis. This stands in contrast to the linguistic intuition that rich contextual information should benefit such applications. We present a framework which combines a supervised text analysis application with the induction of latent content structure. Both of these elements are learned jointly using the EM algorithm. The induced content structure is learned from a large unannotated corpus and biased by the underlying text analysis task. We demonstrate that exploiting content structure yields significant improvements over approaches that rely only on local context.

Description

URL to papers listed on conference site

Date issued

2010-10

URI

http://hdl.handle.net/1721.1/62235

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Journal

EMNLP 2010 : Conference on Empirical Methods in Natural Language Processing

Publisher

Association for Computational Linguistics

Citation

Sauper, Christina, Aria Haghighi, and Regina Barzilay. "Incorporating Content Structure into Text Analysis Applications." EMNLP 2010: Conference on Empirical Methods in Natural Language Processing, October 9-11, 2010, MIT, Massachusetts, USA.

Version: Author's final manuscript

Collections

MIT Open Access Articles