Deciphering Undersegmented Ancient Scripts Using Phonetic Prior

Luo, Jiaming; Hartmann, Frederik; Santus, Enrico; Barzilay, Regina; Cao, Yuan

dc.contributor.author	Luo, Jiaming
dc.contributor.author	Hartmann, Frederik
dc.contributor.author	Santus, Enrico
dc.contributor.author	Barzilay, Regina
dc.contributor.author	Cao, Yuan
dc.date.accessioned	2022-06-21T18:25:57Z
dc.date.available	2022-06-21T18:25:57Z
dc.date.issued	2021
dc.identifier.uri	https://hdl.handle.net/1721.1/143521
dc.description.abstract	Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language is not determined. We propose a decipherment model that handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. We capture the natural phonological geometry by learning character embeddings based on the International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed by phonological constraints. We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian). The experiments show that incorporating phonetic geometry leads to clear and consistent gains. Additionally, we propose a measure for language closeness which correctly identifies related languages for Gothic and Ugaritic. For Iberian, the method does not show strong evidence supporting Basque as a related language, concurring with the position favored by current scholarship.	en_US
dc.language.iso	en
dc.publisher	MIT Press - Journals	en_US
dc.relation.isversionof	10.1162/TACL_A_00354	en_US
dc.rights	Creative Commons Attribution 4.0 International license	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	MIT Press	en_US
dc.title	Deciphering Undersegmented Ancient Scripts Using Phonetic Prior	en_US
dc.type	Article	en_US
dc.identifier.citation	Luo, Jiaming, Hartmann, Frederik, Santus, Enrico, Barzilay, Regina and Cao, Yuan. 2021. "Deciphering Undersegmented Ancient Scripts Using Phonetic Prior." Transactions of the Association for Computational Linguistics, 9.
dc.contributor.department	Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal	Transactions of the Association for Computational Linguistics	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2022-06-21T18:22:14Z
dspace.orderedauthors	Luo, J; Hartmann, F; Santus, E; Barzilay, R; Cao, Y	en_US
dspace.date.submission	2022-06-21T18:22:15Z
mit.journal.volume	9	en_US
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: tacl_a_00354.pdf
Size: 745.6Kb
Format: PDF
Description: Published version

This item appears in the following Collection(s)

MIT Open Access Articles