Show simple item record

dc.contributor.author | Snyder, Benjamin
dc.contributor.author | Naseem, Tahira
dc.contributor.author | Barzilay, Regina
dc.date.accessioned | 2010-10-14T12:48:54Z
dc.date.available | 2010-10-14T12:48:54Z
dc.date.issued | 2009-08
dc.date.submitted | 2009-08
dc.identifier.isbn | 978-1-932432-45-9
dc.identifier.uri | http://hdl.handle.net/1721.1/59314
dc.description.abstract | We investigate the task of unsupervised constituency parsing from bilingual parallel corpora. Our goal is to use bilingual cues to learn improved parsing models for each language and to evaluate these models on held-out monolingual test data. We formulate a generative Bayesian model which seeks to explain the observed parallel data through a combination of bilingual and monolingual parameters. To this end, we adapt a formalism known as unordered tree alignment to our probabilistic setting. Using this formalism, our model loosely binds parallel trees while allowing language-specific syntactic structure. We perform inference under this model using Markov Chain Monte Carlo and dynamic programming. Applying this model to three parallel corpora (Korean-English, Urdu-English, and Chinese-English) we find substantial performance gains over the CCM model, a strong monolingual baseline. On average, across a variety of testing scenarios, our model achieves an 8.8 absolute gain in F-measure. | en_US
dc.description.sponsorship | National Science Foundation (U.S.) (grant IIS-0448168) | en_US
dc.description.sponsorship | National Science Foundation (U.S.) (grant IIS-0835445) | en_US
dc.description.sponsorship | National Science Foundation (U.S.) (grant IIS-0835652) | en_US
dc.language.iso | en_US
dc.publisher | Association for Computational Linguistics | en_US
dc.relation.isversionof | http://portal.acm.org/citation.cfm?id=1687890 | en_US
dc.rights | Attribution-Noncommercial-Share Alike 3.0 Unported | en_US
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/3.0/ | en_US
dc.source | MIT web domain | en_US
dc.subject | algorithms | en_US
dc.subject | design | en_US
dc.subject | experimentation | en_US
dc.subject | languages | en_US
dc.subject | measurement | en_US
dc.subject | performance | en_US
dc.title | Unsupervised multilingual grammar induction | en_US
dc.type | Article | en_US
dc.identifier.citation | Snyder, Benjamin, Tahira Naseem, and Regina Barzilay (2009). "Unsupervised multilingual grammar induction." Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (Morristown, N.J.: Association for Computational Linguistics): 73-81. © 2009 Association for Computing Machinery. | en_US
dc.contributor.department | Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory | en_US
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | en_US
dc.contributor.approver | Barzilay, Regina
dc.contributor.mitauthor | Snyder, Benjamin
dc.contributor.mitauthor | Naseem, Tahira
dc.contributor.mitauthor | Barzilay, Regina
dc.relation.journal | Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP | en_US
dc.eprint.version | Author's final manuscript
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US
eprint.status | http://purl.org/eprint/status/PeerReviewed | en_US
dspace.orderedauthors | Snyder, Benjamin; Naseem, Tahira; Barzilay, Regina
dc.identifier.orcid | https://orcid.org/0000-0002-2921-8201
mit.license | OPEN_ACCESS_POLICY | en_US
mit.metadata.status | Complete

