Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: A Bayesian Non-Parametric Approach
Author(s)
Snyder, Benjamin; Naseem, Tahira; Eisenstein, Jacob; Barzilay, Regina
Open Access Policy
Terms of use: Creative Commons Attribution-Noncommercial-Share Alike
Abstract
We investigate the problem of unsupervised part-of-speech tagging when raw parallel data is available in a large number of languages. Patterns of ambiguity vary greatly across languages, and therefore even unannotated multilingual data can serve as a learning signal. We propose a non-parametric Bayesian model that connects related tagging decisions across languages through the use of multilingual latent variables. Our experiments show that performance improves steadily as the number of languages increases.
Date issued
2009-06
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Publisher
Association for Computational Linguistics
Citation
Snyder, Benjamin, et al. "Adding More Languages Improves Unsupervised Multilingual Part-of-Speech Tagging: A Bayesian Non-Parametric Approach." Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the ACL, pages 83–91, Boulder, Colorado, June 2009.
Version: Author's final manuscript
ISBN
978-1-932432-41-1