Using Principles from Cognitive Science to Analyze and Guide Language-Related Neural Networks
Author(s)
Tucker, Mycal
Advisor
Shah, Julie A.
Abstract
Natural language, while central to human experience, is not uniquely the domain of humans. AI systems, typically neural networks, exhibit startling language processing capabilities, from generating plausible text to modeling simplified language evolution. To what extent are such AI models learning language in a “human-like” way? Defining “human-like” in general may be an impossible problem, but narrower definitions of aspects of human-like language processing, borrowed from the cognitive science literature, afford metrics for evaluating AI models. In this thesis, I borrow two theories about human language processing for such analysis. First, human naming systems (e.g., a language’s words for colors such as “red” or “blue”) appear near-optimal in an information-theoretic sense of compressing meaning into a small number of words; I ask how one might train AI systems that behave similarly. Second, people understand and produce language according to hierarchical representations of structure; I study whether large language models use similar representations in predicting text. Thus, in this thesis, I show how to train and analyze neural networks according to cognitive theories of human language processing.

In my first branch of work, I introduce a method for neural network agents to communicate according to cognitively motivated pressures for utility, informativeness, and complexity. Utility represents a measure of task success and induces task-specific communication; informativeness is a task-agnostic measure of how well listeners understand speakers and leads to generalizable communication; complexity captures how many bits are allocated for communication and can lead to simpler communication systems. All three terms are important for human-like communication. In experiments, training artificial agents according to different tradeoffs among these properties led them to learn different naming systems that closely aligned with existing natural languages.

In my second branch of work, rather than training neural agents from scratch, I probed pre-trained language models and found that some use representations of syntax in making predictions. Humans use hierarchical representations of sentence structure in understanding and producing language, but it is unclear whether large language models, trained on simple tasks like next-word prediction, should learn similar representations. I introduce a causal probing method that sheds light on this topic. By creating counterfactual representations of syntactically ambiguous sentences, I measured how model predictions changed under different structural interpretations of the same sentence. For example, I recorded model predictions for ambiguous inputs like “The girl saw the boy with the telescope. Who had the telescope?” under different syntactic structures. For some (but not all) models, I found that the models use representations of syntax (e.g., they change their answers to the previous question). Thus, I offer novel insight into pre-trained models and a new method for studying such models for other properties.

The two halves of my thesis represent complementary approaches towards more human-like AI; training new models and analyzing pre-trained ones closes an AI development feedback loop. In this thesis, I explain my contributions to both parts of this loop and identify promising directions for future research.
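One way to make the three communicative pressures concrete is as a weighted training objective for the agents. The form below is only an illustrative sketch: the weights \(\lambda_U, \lambda_I, \lambda_C\) and the particular instantiation of each term are assumptions for exposition, not the thesis’s exact formulation.

\[
\min_{\theta}\; \mathcal{L}(\theta) \;=\; -\,\lambda_U\, U(\theta) \;-\; \lambda_I\, I(\theta) \;+\; \lambda_C\, C(\theta),
\]

where \(U(\theta)\) is task utility (e.g., expected reward on the shared task), \(I(\theta)\) is informativeness (e.g., how accurately a listener reconstructs the speaker’s intended meaning), and \(C(\theta)\) is complexity (e.g., the number of bits the communication channel uses). Sweeping the weights traces out the different tradeoffs, and thus the different emergent naming systems, described in the abstract.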
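To make the counterfactual-probing idea concrete, the toy sketch below shows the general mechanism in PyTorch. Everything in it (the dimensions, the linear probe and answer head, the optimization loop) is an illustrative assumption rather than the thesis’s actual implementation; the point is only the shape of the procedure: nudge a hidden representation until a trained probe reads out the alternative parse, then check whether the model’s downstream prediction changes.

import torch

# Toy setup (assumed, for illustration only): a trained linear probe that reads a
# syntactic attachment decision out of a hidden state, and a downstream answer head.
torch.manual_seed(0)
d = 16                          # hidden size of the toy representation
probe = torch.nn.Linear(d, 2)   # stand-in for a probe trained to predict the parse
head = torch.nn.Linear(d, 2)    # stand-in for the model's question-answering head

h = torch.randn(d)              # hidden state for the ambiguous sentence

def counterfactual(h, probe, target_parse, steps=100, lr=0.1):
    """Gradient-descend on a copy of h until the probe prefers target_parse."""
    h_cf = h.clone().requires_grad_(True)
    opt = torch.optim.SGD([h_cf], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(
            probe(h_cf).unsqueeze(0), torch.tensor([target_parse]))
        loss.backward()
        opt.step()
    return h_cf.detach()

# Counterfactual representations for the two parses of
# "The girl saw the boy with the telescope."
h_verb = counterfactual(h, probe, target_parse=0)  # PP attaches to "saw"
h_obj = counterfactual(h, probe, target_parse=1)   # PP attaches to "the boy"

# Compare the model's answer distributions under the two counterfactuals.
print(torch.softmax(head(h_verb), dim=-1))
print(torch.softmax(head(h_obj), dim=-1))

If the two answer distributions differ, the prediction is causally sensitive to the probed syntactic representation; if they coincide, the probe may merely be reading out information the model does not actually use.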
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Aeronautics and Astronautics
Publisher
Massachusetts Institute of Technology