| dc.contributor.advisor | Andreas, Jacob | |
| dc.contributor.author | Akyurek, Ekin | |
| dc.date.accessioned | 2025-11-17T19:09:08Z | |
| dc.date.available | 2025-11-17T19:09:08Z | |
| dc.date.issued | 2025-05 | |
| dc.date.submitted | 2025-08-14T19:35:50.864Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/163719 | |
| dc.description.abstract | Modern language models (LMs) can perform complex tasks through in-context learning (ICL)—they can adapt to a task via examples provided in their input without any parameter updates. However, fundamental questions remain about when this adaptation works, what algorithms underlie it, and how to improve it. This thesis studies the mechanisms and limitations of ICL and develops better methods for test-time adaptation of LMs on diverse benchmarks of language modeling and reasoning. I begin by evaluating the ICL capabilities of pre-trained LMs. I demonstrate that LMs can achieve strong compositional generalization when provided with few-shot examples. In a separate analysis, I show that their performance deteriorates significantly when faced with counterfactual variants of tasks they normally perform well on. I then develop "model problems" of ICL that test the ability of LMs to learn novel mathematical structures in-context, such as linear functions and probabilistic formal languages. Next, I interpret the algorithmic foundations of ICL. First, I prove that Transformer models with sufficient capacity can execute both iterative and closed-form solutions to linear regression problems, and demonstrate that these theoretical solutions manifest as interpretable intermediate variables. Then, I reveal how LMs develop specialized circuits that implement approximate n-gram learning algorithms for probabilistic languages. Building on these insights, I develop two approaches to enhance LMs. First, I demonstrate that explicitly incorporating n-gram computation into model architectures improves performance across multiple domains. Second, I introduce a test-time training method that enables rapid adaptation through gradient updates on input data, achieving significant improvements over standard few-shot learning on abstract reasoning tasks. Together, these results advance our understanding of how LMs adapt to novel tasks and provide practical techniques for enhancing their test-time learning capabilities. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | In Copyright - Educational Use Permitted | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
| dc.title | Inference-Time Learning Algorithms of Language Models | |
| dc.type | Thesis | |
| dc.description.degree | Ph.D. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Doctoral | |
| thesis.degree.name | Doctor of Philosophy | |