| dc.contributor.advisor | Andreas, Jacob | |
| dc.contributor.author | Hariharan, Kaivalya | |
| dc.date.accessioned | 2025-09-18T14:29:28Z | |
| dc.date.available | 2025-09-18T14:29:28Z | |
| dc.date.issued | 2025-05 | |
| dc.date.submitted | 2025-06-23T14:02:10.894Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/162730 | |
| dc.description.abstract | Large language models (LLMs) generalize far beyond their training distribution, enabling impressive downstream performance in domains vastly different from their pretraining data. In this thesis, we develop a data-centric view of machine learning. We suggest that the deep generalization of LLMs is best understood by studying the relationships among the four fundamental components of this data generalization: pretraining data, test-time inputs, model outputs, and internal structure. Of these, we present two full research studies, one characterizing test-time inputs and one characterizing internal structure. Chapter 1 develops the data-centric view of machine learning and outlines the thesis. Chapter 2 presents Breakpoint, a method for generating difficult coding tasks for models at scale that attempts to disentangle the factors that make problems difficult at test time. Chapter 3 analyzes the structure of gradient-based jailbreaks (GBJs) in LLMs. We argue that even though GBJs are more out of distribution than even random text, they induce a low-rank, structured change in models. Finally, Chapter 4 discusses the recent rise of reasoning models and proposes lines of future work in the data-centric view toward developing a more robust understanding of LLMs. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | In Copyright - Educational Use Permitted | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
| dc.title | Towards transparent representations: on internal structure and external world modeling in LLMs | |
| dc.type | Thesis | |
| dc.description.degree | M.Eng. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Master | |
| thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |