A Generative Model of Phonotactics

Futrell, Richard Landy Jones; Albright, Adam; Graff, Peter; O’Donnell, Timothy J.

dc.contributor.author	Futrell, Richard Landy Jones
dc.contributor.author	Albright, Adam
dc.contributor.author	Graff, Peter
dc.contributor.author	O’Donnell, Timothy J.
dc.date.accessioned	2020-11-18T22:54:02Z
dc.date.available	2020-11-18T22:54:02Z
dc.date.issued	2017-12
dc.identifier.issn	2307-387X
dc.identifier.uri	https://hdl.handle.net/1721.1/128532
dc.description.abstract	We present a probabilistic model of phonotactics, the set of well-formed phoneme sequences in a language. Unlike most computational models of phonotactics (Hayes and Wilson, 2008; Goldsmith and Riggle, 2012), we take a fully generative approach, modeling a process where forms are built up out of subparts by phonologically-informed structure building operations. We learn an inventory of subparts by applying stochastic memoization (Johnson et al., 2007; Goodman et al., 2008) to a generative process for phonemes structured as an and-or graph, based on concepts of feature hierarchy from generative phonology (Clements, 1985; Dresher, 2009). Subparts are combined in a way that allows tier-based feature interactions. We evaluate our models’ ability to capture phonotactic distributions in the lexicons of 14 languages drawn from the WOLEX corpus (Graff, 2012). Our full model robustly assigns higher probabilities to held-out forms than a sophisticated N-gram model for all languages. We also present novel analyses that probe model behavior in more detail.	en_US
dc.description.sponsorship	National Science Foundation (Grant 1551543)	en_US
dc.language.iso	en
dc.publisher	MIT Press	en_US
dc.relation.isversionof	http://dx.doi.org/10.1162/tacl_a_00047	en_US
dc.rights	Creative Commons Attribution 4.0 International license	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	MIT Press	en_US
dc.title	A Generative Model of Phonotactics	en_US
dc.type	Article	en_US
dc.identifier.citation	Futrell, Richard et al. "A Generative Model of Phonotactics." Transactions of the Association for Computational Linguistics 5 (December 2017): 73-86 © 2017 Association for Computational Linguistics	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Linguistics and Philosophy	en_US
dc.relation.journal	Transactions of the Association for Computational Linguistics	en_US
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/JournalArticle	en_US
eprint.status	http://purl.org/eprint/status/PeerReviewed	en_US
dc.date.updated	2019-09-25T17:19:31Z
dspace.date.submission	2019-09-25T17:19:32Z
mit.journal.volume	5	en_US
mit.metadata.status	Complete

Files in this item

Name: tacl_a_00047.pdf
Size: 456.3Kb
Format: PDF
Description: Published version

This item appears in the following Collection(s)

MIT Open Access Articles
