
dc.contributor.advisor: Andreas, Jacob D.
dc.contributor.author: Meng, Kevin
dc.date.accessioned: 2024-09-16T13:49:41Z
dc.date.available: 2024-09-16T13:49:41Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-11T14:36:44.224Z
dc.identifier.uri: https://hdl.handle.net/1721.1/156794
dc.description.abstract: This thesis investigates the mechanisms of factual recall in large language models. We first apply causal interventions to identify neuron activations that are decisive in a model’s factual predictions; surprisingly, we find that factual recall corresponds to a sparse, localizable computation in the MLP weights of the GPT models we study. Harnessing this insight, we then develop methods for efficiently and surgically inserting up to 10,000 new memories into a transformer; these methods perform well in terms of both generalization and specificity. We conclude with some directions for future work.
dc.publisher: Massachusetts Institute of Technology
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://creativecommons.org/licenses/by-nc-nd/4.0/
dc.title: Interpreting and Editing Memory in Large Transformer Language Models
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science
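
The causal-intervention approach described in the abstract can be illustrated with a small activation-patching sketch: cache one MLP activation from a "clean" factual prompt, restore it during a corrupted run, and check whether the factual prediction recovers. This is a minimal illustration only, not code from the thesis; the model (GPT-2 small via Hugging Face transformers), the layer index, the prompts, and the use of a substitute-subject prompt in place of the thesis's embedding-noising procedure are all assumptions made here for brevity.

```python
# Minimal activation-patching sketch (illustrative; not the thesis implementation).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

clean_prompt = "The Eiffel Tower is located in the city of"
corrupt_prompt = "The Colosseum is located in the city of"  # substitute subject, not noising
clean_ids = tok(clean_prompt, return_tensors="pt").input_ids
corrupt_ids = tok(corrupt_prompt, return_tensors="pt").input_ids

layer, pos = 6, -1  # arbitrary mid-layer and last-token position for illustration
cache = {}

def save_hook(module, inputs, output):
    # Record the MLP output of the clean run at this layer.
    cache["clean"] = output.detach().clone()

def patch_hook(module, inputs, output):
    # During the corrupted run, restore the clean MLP activation at one position.
    patched = output.clone()
    patched[:, pos] = cache["clean"][:, pos]
    return patched

# 1. Clean run: cache the activation.
handle = model.transformer.h[layer].mlp.register_forward_hook(save_hook)
with torch.no_grad():
    clean_logits = model(clean_ids).logits[0, -1]
handle.remove()

# 2. Corrupted run with the clean activation patched back in.
handle = model.transformer.h[layer].mlp.register_forward_hook(patch_hook)
with torch.no_grad():
    patched_logits = model(corrupt_ids).logits[0, -1]
handle.remove()

# If restoring this single activation recovers the clean prediction (" Paris"),
# that (layer, position) is causally implicated in the factual recall.
paris = tok(" Paris", add_special_tokens=False).input_ids[0]
print("clean logit for ' Paris':  ", clean_logits[paris].item())
print("patched logit for ' Paris':", patched_logits[paris].item())
```

Sweeping this kind of restoration over layers and token positions is what localizes the decisive computation; which states matter, and how strongly, is an empirical question the thesis addresses with its own corruption and scoring procedure.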

