On the Acquisition of Formal Semantics in Statistical Models of Language
Author(s)
Jin, Charles C.
Advisor
Rinard, Martin C.
Abstract
The increasingly impressive performance of recent large language models raises a crucial question: to what extent can such models, trained solely on text, develop an understanding of language grounded in the semantics of the underlying domain? Progress on this question carries significant practical and philosophical implications for the relationship between meaning, understanding, and the capacity to exhibit seemingly intelligent behavior.
This thesis makes two primary contributions. First, it develops a scientifically rigorous approach, grounded in the formal semantics of programming languages, to studying what statistical models of language can understand about language. Specifically, it leverages the probing classifiers framework: training small classifiers to find encodings of program semantics within the model's internal representations. A main insight is that the clean separation between syntax and semantics in this domain allows for greater control in experimental design. The thesis introduces two new techniques. The first, semantic probing interventions, is a general methodology for distinguishing whether the probe's measurements reflect (1) that the learned representations of the language model encode semantics or (2) that the probe itself has learned to infer semantics from representations of pure syntax. The second, latent causal probing, is a formal framework for probing that provides a robust empirical methodology for studying whether language models are able to access the latent concepts that underlie the text they observe during training. A key innovation is to construct a single structural causal model that unifies (1) the data generation process underlying the text used to train the language model and (2) the steps of a probing experiment. This makes it possible to conduct a causal analysis that intervenes on the data generation process and traces the influence of the latent variables in the training data through the model's internal representations.
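The probing setup described above can be illustrated with a minimal, hypothetical sketch. Everything here is fabricated for exposition and is not the thesis's actual experimental code: we synthesize "hidden states" in which a binary semantic label is linearly encoded, train a small linear probe on them, and compare against a control probe trained on shuffled labels, so that any above-chance accuracy of the control would come from the probe rather than from the representations.

```python
# Illustrative sketch only: synthetic "hidden states" standing in for a
# language model's internal representations, probed with a linear classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n, d = 2000, 32
labels = rng.integers(0, 2, size=n)             # latent semantic state (hypothetical)
direction = rng.normal(size=d)                  # direction along which it is encoded
states = rng.normal(size=(n, d)) + np.outer(labels * 2 - 1, direction)

X_tr, X_te, y_tr, y_te = train_test_split(states, labels, random_state=0)

# Probe: a small classifier trained to decode the semantic label.
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
acc = probe.score(X_te, y_te)

# Control: shuffled labels sever the representation-semantics link, so the
# control probe should perform near chance.
y_shuf = rng.permutation(y_tr)
control = LogisticRegression(max_iter=1000).fit(X_tr, y_shuf)
acc_control = control.score(X_te, y_te)

print(f"probe accuracy:   {acc:.2f}")
print(f"control accuracy: {acc_control:.2f}")
```

The gap between probe and control accuracy is the quantity of interest: it is what licenses attributing the decoded semantics to the representations rather than to the probe.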
The second core contribution of this thesis consists of a series of experimental studies. Specifically, we train a language model on a synthetic grid-world navigation task, then probe the model's learned representations for encodings of the unobserved, intermediate world states. Leveraging the techniques we develop, these experiments deliver strong empirical evidence that statistical models of language are latent concept learners: capable of inducing the latent variables that underlie the generation of their training data, despite being trained only to model a conditional distribution over tokens.
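The shape of the synthetic data described above can be sketched as follows. This is a hypothetical reconstruction, not the thesis's actual task: each training example is a sequence of movement tokens, and the intermediate positions the agent passes through are latent; only the commands appear in the text the model is trained on, while the positions are what a probe would try to recover.

```python
# Hypothetical grid-world data generator: observed tokens vs. latent states.
import random

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def sample_trajectory(length=5, seed=None):
    rng = random.Random(seed)
    pos = (0, 0)
    tokens, latent_states = [], [pos]
    for _ in range(length):
        move = rng.choice(sorted(MOVES))
        dx, dy = MOVES[move]
        pos = (pos[0] + dx, pos[1] + dy)
        tokens.append(move)           # observed text: the only input to the LM
        latent_states.append(pos)     # unobserved world state: the probe target
    return tokens, latent_states

tokens, states = sample_trajectory(seed=0)
print(tokens)   # e.g. a sequence of move commands -- all the model ever sees
print(states)   # the latent positions the probing experiments decode
```

The point of the construction is that the latent states are fully determined by the text yet never appear in it, so finding them encoded in the model's representations is evidence of latent concept learning rather than surface memorization.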
Date issued
2024-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology