Unsupervised multilingual grammar induction

Snyder, Benjamin; Naseem, Tahira; Barzilay, Regina

Terms of use

Attribution-Noncommercial-Share Alike 3.0 Unported http://creativecommons.org/licenses/by-nc-sa/3.0/

Abstract

We investigate the task of unsupervised constituency parsing from bilingual parallel corpora. Our goal is to use bilingual cues to learn improved parsing models for each language and to evaluate these models on held-out monolingual test data. We formulate a generative Bayesian model that seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters. To this end, we adapt a formalism known as unordered tree alignment to our probabilistic setting. Using this formalism, our model loosely binds parallel trees while allowing language-specific syntactic structure. We perform inference under this model using Markov chain Monte Carlo and dynamic programming. Applying this model to three parallel corpora (Korean-English, Urdu-English, and Chinese-English), we find substantial performance gains over the CCM model, a strong monolingual baseline. On average, across a variety of testing scenarios, our model achieves an absolute gain of 8.8 points in F-measure.
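For readers unfamiliar with the metric, the following is a minimal sketch of how a bracket F-measure and an absolute gain between two parsers might be computed. It assumes the common unlabeled-bracket convention for unsupervised constituency parsing, with the trivial whole-sentence span left out; the abstract does not state the exact scoring protocol, so this is an illustration and not the authors' evaluation code. The function name `bracket_f_measure` and all span sets are hypothetical.

```python
from typing import Iterable, Set, Tuple

# A constituent is represented as (start, end) word offsets, end exclusive.
# This representation is an assumption made for the sketch.
Span = Tuple[int, int]


def bracket_f_measure(predicted: Iterable[Span], gold: Iterable[Span]) -> float:
    """Unlabeled bracket F-measure for one sentence, on a 0-100 scale."""
    pred_set: Set[Span] = set(predicted)
    gold_set: Set[Span] = set(gold)
    matched = len(pred_set & gold_set)
    if matched == 0:
        # Also covers the case where either span set is empty.
        return 0.0
    precision = matched / len(pred_set)
    recall = matched / len(gold_set)
    # F-measure is the harmonic mean of precision and recall.
    return 100.0 * 2 * precision * recall / (precision + recall)


# Toy five-word sentence; every span set here is invented for illustration.
gold_spans = {(0, 2), (2, 5), (3, 5)}      # ((w0 w1) (w2 (w3 w4)))
baseline_spans = {(1, 3), (0, 3), (3, 5)}  # ((w0 (w1 w2)) (w3 w4))
improved_spans = {(0, 2), (0, 3), (3, 5)}  # (((w0 w1) w2) (w3 w4))

baseline_f = bracket_f_measure(baseline_spans, gold_spans)
improved_f = bracket_f_measure(improved_spans, gold_spans)

# "Absolute gain" is the plain difference in F-measure points,
# as opposed to a relative (percentage) improvement over the baseline.
absolute_gain = improved_f - baseline_f
print(f"baseline={baseline_f:.1f} improved={improved_f:.1f} gain={absolute_gain:.1f}")
```

In this toy case the baseline matches one of three gold brackets and the improved parse matches two, so the gain is a difference in points on the same 0-100 scale, which is the sense in which the abstract reports its average improvement over CCM.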

Date issued

2009-08

URI

http://hdl.handle.net/1721.1/59314

Department

Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Journal

Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Publisher

Association for Computational Linguistics

Citation

Snyder, Benjamin, Tahira Naseem, and Regina Barzilay (2009). "Unsupervised multilingual grammar induction." Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (Morristown, N.J.: Association for Computational Linguistics): 73-81. © 2009 Association for Computing Machinery.

Version: Author's final manuscript

ISBN

978-1-932432-45-9

Keywords

algorithms, design, experimentation, languages, measurement, performance

Collections

MIT Open Access Articles
