Unsupervised multilingual grammar induction

Snyder, Benjamin; Naseem, Tahira; Barzilay, Regina

dc.contributor.author	Snyder, Benjamin
dc.contributor.author	Naseem, Tahira
dc.contributor.author	Barzilay, Regina
dc.date.accessioned	2010-10-14T12:48:54Z
dc.date.available	2010-10-14T12:48:54Z
dc.date.issued	2009-08
dc.date.submitted	2009-08
dc.identifier.isbn	978-1-932432-45-9
dc.identifier.uri	http://hdl.handle.net/1721.1/59314
dc.description.abstract	We investigate the task of unsupervised constituency parsing from bilingual parallel corpora. Our goal is to use bilingual cues to learn improved parsing models for each language and to evaluate these models on held-out monolingual test data. We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters. To this end, we adapt a formalism known as unordered tree alignment to our probabilistic setting. Using this formalism, our model loosely binds parallel trees while allowing language-specific syntactic structure. We perform inference under this model using Markov Chain Monte Carlo and dynamic programming. Applying this model to three parallel corpora (Korean-English, Urdu-English, and Chinese-English), we find substantial performance gains over the CCM model, a strong monolingual baseline. On average, across a variety of testing scenarios, our model achieves an absolute gain of 8.8 in F-measure.	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (grant IIS-0448168)	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (grant IIS-0835445)	en_US
dc.description.sponsorship	National Science Foundation (U.S.) (grant IIS-0835652)	en_US
dc.language.iso	en_US
dc.publisher	Association for Computational Linguistics	en_US
dc.relation.isversionof	http://portal.acm.org/citation.cfm?id=1687890	en_US
dc.rights	Attribution-Noncommercial-Share Alike 3.0 Unported	en_US
dc.rights.uri	http://creativecommons.org/licenses/by-nc-sa/3.0/	en_US
dc.source	MIT web domain	en_US
dc.subject	algorithms	en_US
dc.subject	design	en_US
dc.subject	experimentation	en_US
dc.subject	languages	en_US
dc.subject	measurement	en_US
dc.subject	performance	en_US
dc.title	Unsupervised multilingual grammar induction	en_US
dc.type	Article	en_US
dc.identifier.citation	Snyder, Benjamin, Tahira Naseem, and Regina Barzilay (2009). "Unsupervised multilingual grammar induction." Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (Morristown, N.J.: Association for Computational Linguistics): 73-81. © 2009 Association for Computing Machinery.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.contributor.approver	Barzilay, Regina
dc.contributor.mitauthor	Snyder, Benjamin
dc.contributor.mitauthor	Naseem, Tahira
dc.contributor.mitauthor	Barzilay, Regina
dc.relation.journal	Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP	en_US
dc.eprint.version	Author's final manuscript
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dspace.orderedauthors	Snyder, Benjamin; Naseem, Tahira; Barzilay, Regina
dc.identifier.orcid	https://orcid.org/0000-0002-2921-8201
mit.license	OPEN_ACCESS_POLICY	en_US
mit.metadata.status	Complete

Files in this item

Name: Barzilay_Unsupervised multilin ...
Size: 266.0Kb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
