A Generative Model of Phonotactics

Futrell, Richard Landy Jones; Albright, Adam; Graff, Peter; O’Donnell, Timothy J.

Terms of use

Creative Commons Attribution 4.0 International license https://creativecommons.org/licenses/by/4.0/

Abstract

We present a probabilistic model of phonotactics, the set of well-formed phoneme sequences in a language. Unlike most computational models of phonotactics (Hayes and Wilson, 2008; Goldsmith and Riggle, 2012), we take a fully generative approach, modeling a process where forms are built up out of subparts by phonologically informed structure building operations. We learn an inventory of subparts by applying stochastic memoization (Johnson et al., 2007; Goodman et al., 2008) to a generative process for phonemes structured as an and-or graph, based on concepts of feature hierarchy from generative phonology (Clements, 1985; Dresher, 2009). Subparts are combined in a way that allows tier-based feature interactions. We evaluate our models’ ability to capture phonotactic distributions in the lexicons of 14 languages drawn from the WOLEX corpus (Graff, 2012). Our full model robustly assigns higher probabilities to held-out forms than a sophisticated N-gram model for all languages. We also present novel analyses that probe model behavior in more detail.
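
For readers unfamiliar with the term, the general idea of stochastic memoization mentioned in the abstract can be sketched as follows. This is a toy illustration only, not the authors' model: it assumes a Chinese restaurant process as the reuse rule (one common way to realize stochastic memoization), and the names `StochasticMemoizer`, `toy_syllable`, and `alpha` are hypothetical. The base generator here draws two-letter strings, whereas the paper's generative process is described as an and-or graph over phonological features.

```python
import random
from collections import Counter


def toy_syllable(rng):
    """Hypothetical base generator: one consonant plus one vowel."""
    return rng.choice("ptk") + rng.choice("aiu")


class StochasticMemoizer:
    """Wrap a sampler so that earlier outputs tend to be reused.

    Assumption: reuse follows a Chinese restaurant process with
    concentration `alpha`. After n draws, a fresh value is generated
    with probability alpha / (n + alpha); otherwise a stored value is
    reused with probability proportional to its past usage count.
    """

    def __init__(self, base_sampler, alpha=1.0, rng=None):
        if alpha <= 0:
            raise ValueError("alpha must be positive")
        self.base_sampler = base_sampler
        self.alpha = alpha
        self.rng = rng or random.Random()
        self.counts = Counter()  # value -> number of times it was returned
        self.total = 0           # total number of draws so far

    def sample(self):
        # Decide between a fresh draw and reuse of a memoized value.
        if self.rng.random() * (self.total + self.alpha) < self.alpha:
            value = self.base_sampler(self.rng)
        else:
            values = list(self.counts)
            weights = [self.counts[v] for v in values]
            value = self.rng.choices(values, weights=weights)[0]
        self.counts[value] += 1
        self.total += 1
        return value


# Usage: frequently reused values accumulate into an inventory.
memo = StochasticMemoizer(toy_syllable, alpha=1.0, rng=random.Random(0))
draws = [memo.sample() for _ in range(200)]
inventory = memo.counts.most_common()
```

A smaller `alpha` makes reuse more likely, so the inventory concentrates on fewer stored items; a larger `alpha` keeps the draws closer to the base generator.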

Date issued

2017-12

URI

https://hdl.handle.net/1721.1/128532

Department

Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences; Massachusetts Institute of Technology. Department of Linguistics and Philosophy

Journal

Transactions of the Association for Computational Linguistics

Publisher

MIT Press

Version

Final published version

ISSN

2307-387X

Collections

MIT Open Access Articles