| Title: | Global models of document structure using latent permutations |
| Author: | Chen, Harr; Branavan, S. R. K.; Barzilay, Regina; Karger, David R. |
| Department: | Massachusetts Institute of Technology. Dept. of Electrical Engineering and Computer Science; Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory |
| Publisher: | Association for Computational Linguistics |
| Issue Date: | 2009-06 |
| Abstract: | We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be elegantly represented using a distribution over permutations called the generalized Mallows model. Our structure-aware approach substantially outperforms alternative approaches for cross-document comparison and single-document segmentation. |
| URI: | http://hdl.handle.net/1721.1/59312 |
| ISBN: | 978-1-932432-41-1 |
| Citation: | Chen, Harr, S.R.K. Branavan, Regina Barzilay, and David R. Karger (2009). "Global models of document structure using latent permutations." Proceedings of Human Language Technologies: the 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (Morristown, N.J.: Association for Computational Linguistics): 371-379. © 2009 Association for Computing Machinery. |
| Version: | Final published version |
| Terms of Use: | Attribution-Noncommercial-Share Alike 3.0 Unported |
| Detailed Terms: | http://creativecommons.org/licenses/by-nc-sa/3.0/ |
| Journal: | Proceedings of Human Language Technologies: the 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics |