| dc.contributor.advisor | Andreas, Jacob | |
| dc.contributor.author | Akyurek, Ekin | |
| dc.date.accessioned | 2025-11-17T19:09:08Z | |
| dc.date.available | 2025-11-17T19:09:08Z | |
| dc.date.issued | 2025-05 | |
| dc.date.submitted | 2025-08-14T19:35:50.864Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/163719 | |
| dc.description.abstract | Modern language models (LMs) can perform complex tasks through in-context learning (ICL)—they can adapt to a task via examples provided in their input without any parameter updates. However, fundamental questions remain about when this adaptation works, what algorithms underlie it, and how to improve it. This thesis studies the mechanisms and limitations of ICL and develops better methods for test-time adaptation of LMs on diverse benchmarks of language modeling and reasoning. I begin by evaluating the ICL capabilities of pre-trained LMs. I demonstrate that LMs can achieve strong compositional generalization when provided with few-shot examples. In a separate analysis, I show that their performance deteriorates significantly when faced with counterfactual variants of tasks they normally perform well on. I then develop "model problems" of ICL that test the ability of LMs to learn novel mathematical structures in-context, such as linear functions and probabilistic formal languages. Next, I interpret the algorithmic foundations of ICL. First, I prove that Transformer models with sufficient capacity can execute both iterative and closed-form solutions to linear regression problems, and demonstrate that these theoretical solutions manifest as interpretable intermediate variables. Then, I reveal how LMs develop specialized circuits that implement approximate n-gram learning algorithms for probabilistic languages. Building on these insights, I develop two approaches to enhance LMs. First, I demonstrate that explicitly incorporating n-gram computation into model architectures improves performance across multiple domains. Second, I introduce a test-time training method that enables rapid adaptation through gradient updates on input data, achieving significant improvements over standard few-shot learning on abstract reasoning tasks. Together, these results advance our understanding of how LMs adapt to novel tasks and provide practical techniques for enhancing their test-time learning capabilities. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | In Copyright - Educational Use Permitted | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
| dc.title | Inference-Time Learning Algorithms of Language Models | |
| dc.type | Thesis | |
| dc.description.degree | Ph.D. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Doctoral | |
| thesis.degree.name | Doctor of Philosophy | |