MIT Libraries logoDSpace@MIT

MIT
View Item 
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
  • DSpace@MIT Home
  • MIT Open Access Articles
  • MIT Open Access Articles
  • View Item
JavaScript is disabled for your browser. Some features of this site may not work without it.

Deciphering Undersegmented Ancient Scripts Using Phonetic Prior

Author(s)
Luo, Jiaming; Hartmann, Frederik; Santus, Enrico; Barzilay, Regina; Cao, Yuan
Thumbnail
DownloadPublished version (745.6Kb)
Publisher with Creative Commons License

Publisher with Creative Commons License

Creative Commons Attribution

Terms of use
Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/
Metadata
Show full item record
Abstract
<jats:p> Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language is not determined. We propose a decipherment model that handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. We capture the natural phonological geometry by learning character embeddings based on the International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed by phonological constraints. We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian). The experiments show that incorporating phonetic geometry leads to clear and consistent gains. Additionally, we propose a measure for language closeness which correctly identifies related languages for Gothic and Ugaritic. For Iberian, the method does not show strong evidence supporting Basque as a related language, concurring with the favored position by the current scholarship. <jats:sup>1</jats:sup> </jats:p>
Date issued
2021
URI
https://hdl.handle.net/1721.1/143521
Department
Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Journal
Transactions of the Association for Computational Linguistics
Publisher
MIT Press - Journals
Citation
Luo, Jiaming, Hartmann, Frederik, Santus, Enrico, Barzilay, Regina and Cao, Yuan. 2021. "Deciphering Undersegmented Ancient Scripts Using Phonetic Prior." Transactions of the Association for Computational Linguistics, 9.
Version: Final published version

Collections
  • MIT Open Access Articles

Browse

All of DSpaceCommunities & CollectionsBy Issue DateAuthorsTitlesSubjectsThis CollectionBy Issue DateAuthorsTitlesSubjects

My Account

Login

Statistics

OA StatisticsStatistics by CountryStatistics by Department
MIT Libraries
PrivacyPermissionsAccessibilityContact us
MIT
Content created by the MIT Libraries, CC BY-NC unless otherwise noted. Notify us about copyright concerns.