A Statistical Model for Lost Language Decipherment

Snyder, Benjamin; Barzilay, Regina; Knight, Kevin

Download: Barzilay_A statistical.pdf (371.3Kb)

Terms of use

Creative Commons Attribution-Noncommercial-Share Alike 3.0 http://creativecommons.org/licenses/by-nc-sa/3.0/

Abstract

In this paper, we propose a method for the automatic decipherment of lost languages. Given a non-parallel corpus in a known related language, our model produces both alphabetic mappings and translations of words into their corresponding cognates. We employ a non-parametric Bayesian framework to simultaneously capture both low-level character mappings and high-level morphemic correspondences. This formulation enables us to encode some of the linguistic intuitions that have guided human decipherers. When applied to the ancient Semitic language Ugaritic, the model correctly maps nearly all letters to their Hebrew counterparts and deduces the correct Hebrew cognate for over half of the Ugaritic words that have cognates in Hebrew.

Description

URL to paper listed on conference site

Date issued

2010-07

URI

http://hdl.handle.net/1721.1/62802

Department

Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Journal

Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL 2010

Publisher

Association for Computational Linguistics

Citation

Snyder, Benjamin, Regina Barzilay, and Kevin Knight. "A Statistical Model for Lost Language Decipherment." In ACL 2010, 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden, July 11–16, 2010.

Version: Author's final manuscript

Collections

MIT Open Access Articles

DSpace@MIT