Cause, Composition, and Structure in Language

Qian, Peng

dc.contributor.advisor	Levy, Roger P.
dc.contributor.author	Qian, Peng
dc.date.accessioned	2022-09-27T20:16:11Z
dc.date.available	2022-09-27T20:16:11Z
dc.date.issued	2022-05
dc.date.submitted	2022-09-27T16:57:59.169Z
dc.identifier.uri	https://hdl.handle.net/1721.1/145598
dc.description.abstract	From everyday communication to exploring new thoughts through writing, humans use language in a remarkably flexible, robust, and creative way. In this thesis, I present three case studies supporting the overarching hypothesis that linguistic knowledge in the human mind can be understood as hierarchically-structured causal generative models, within which a repertoire of compositional inference motifs support efficient inference. I begin with a targeted case study showing how native speakers follow principles of noisy-channel inference in resolving subject-verb agreement mismatches such as "The gift for the kids are hidden under the bed". Results suggest that native-speakers' inferences reflect both prior expectations and structure-sensitive conditioning of error probabilities consistent with the statistics of the language production environment. Second, I develop a more open-ended inferential challenge, completing fragmentary linguistic inputs such as "____ published won ____." into well-formed sentences. I use large-scale neural language models to compare two classes of models on this task: the task-specific fine-tuning approach standard in AI and NLP, versus an inferential approach involving composition of two simple computational motifs; the inferential approach yields more human-like completions. Third, I show that incorporating hierarchical linguistic structure into one of these computational motifs, namely the auto-regressive word prediction task, yields improvements in neural language model performance on targeted evaluations of models’ grammatical capabilities. I conclude by suggesting future directions in understanding the form and content of these causal generative models of human language.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright MIT
dc.rights.uri	http://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Cause, Composition, and Structure in Language
dc.type	Thesis
dc.description.degree	Ph.D.
dc.contributor.department	Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
dc.identifier.orcid	https://orcid.org/ 0000-0002-6916-3057
mit.thesis.degree	Doctoral
thesis.degree.name	Doctor of Philosophy

Files in this item

Name:: qian-pqian-phd-bcs-2022-thesis.pdf
Size:: 3.143Mb
Format:: PDF
Description:: Thesis PDF

View/Open

This item appears in the following Collection(s)

Doctoral Theses

Show simple item record