Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: A Bayesian Non-Parametric Approach

Snyder, Benjamin; Naseem, Tahira; Eisenstein, Jacob; Barzilay, Regina

Terms of use

Attribution-Noncommercial-Share Alike 3.0 Unported http://creativecommons.org/licenses/by-nc-sa/3.0/

Abstract

We investigate the problem of unsupervised part-of-speech tagging when raw parallel data is available in a large number of languages. Patterns of ambiguity vary greatly across languages and therefore even unannotated multilingual data can serve as a learning signal. We propose a non-parametric Bayesian model that connects related tagging decisions across languages through the use of multilingual latent variables. Our experiments show that performance improves steadily as the number of languages increases.

Date issued

2009-06

URI

http://hdl.handle.net/1721.1/58926

Department

Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Journal

Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Publisher

Association for Computational Linguistics

Citation

Snyder, Benjamin, et al. "Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: A Bayesian Non-Parametric Approach." Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the ACL, pages 83–91, Boulder, Colorado, June 2009.

Version: Author's final manuscript

ISBN

978-1-932432-41-1

Collections

MIT Open Access Articles

DSpace@MIT