
dc.contributor.author: Luo, Jiaming
dc.contributor.author: Hartmann, Frederik
dc.contributor.author: Santus, Enrico
dc.contributor.author: Barzilay, Regina
dc.contributor.author: Cao, Yuan
dc.date.accessioned: 2022-06-21T18:25:57Z
dc.date.available: 2022-06-21T18:25:57Z
dc.date.issued: 2021
dc.identifier.uri: https://hdl.handle.net/1721.1/143521
dc.description.abstract: Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language is not determined. We propose a decipherment model that handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. We capture the natural phonological geometry by learning character embeddings based on the International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed by phonological constraints. We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian). The experiments show that incorporating phonetic geometry leads to clear and consistent gains. Additionally, we propose a measure for language closeness which correctly identifies related languages for Gothic and Ugaritic. For Iberian, the method does not show strong evidence supporting Basque as a related language, concurring with the favored position by the current scholarship. [en_US]
dc.language.iso: en
dc.publisher: MIT Press - Journals [en_US]
dc.relation.isversionof: 10.1162/TACL_A_00354 [en_US]
dc.rights: Creative Commons Attribution 4.0 International license [en_US]
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
dc.source: MIT Press [en_US]
dc.title: Deciphering Undersegmented Ancient Scripts Using Phonetic Prior [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Luo, Jiaming, Hartmann, Frederik, Santus, Enrico, Barzilay, Regina and Cao, Yuan. 2021. "Deciphering Undersegmented Ancient Scripts Using Phonetic Prior." Transactions of the Association for Computational Linguistics, 9.
dc.contributor.department: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
dc.relation.journal: Transactions of the Association for Computational Linguistics [en_US]
dc.eprint.version: Final published version [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2022-06-21T18:22:14Z
dspace.orderedauthors: Luo, J; Hartmann, F; Santus, E; Barzilay, R; Cao, Y [en_US]
dspace.date.submission: 2022-06-21T18:22:15Z
mit.journal.volume: 9 [en_US]
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed [en_US]

