dc.contributor.advisor | Andreas, Jacob D. | |
dc.contributor.author | Meng, Kevin | |
dc.date.accessioned | 2024-09-16T13:49:41Z | |
dc.date.available | 2024-09-16T13:49:41Z | |
dc.date.issued | 2024-05 | |
dc.date.submitted | 2024-07-11T14:36:44.224Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/156794 | |
dc.description.abstract | This thesis investigates the mechanisms of factual recall in large language models. We first apply causal interventions to identify the neuron activations that are decisive in a model’s factual predictions; surprisingly, we find that factual recall corresponds to a sparse, localizable computation in the MLP weights of the GPT models we study. Harnessing this insight, we then develop methods for efficiently and surgically inserting up to 10,000 new memories into a transformer; these methods perform well on measures of both generalization and specificity. We conclude with directions for future work. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) | |
dc.rights | Copyright retained by author(s) | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/ | |
dc.title | Interpreting and Editing Memory in Large Transformer Language Models | |
dc.type | Thesis | |
dc.description.degree | M.Eng. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |
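The abstract above describes locating factual recall by applying causal interventions to intermediate activations and measuring their effect on a model's factual predictions. As a rough illustration of that style of experiment (a minimal sketch, not the thesis's actual procedure), the code below zeroes the output of one GPT-2 MLP layer at a single token position and checks how the probability of a factual completion changes; the model, prompt, target token, layer index, and token position are all placeholder assumptions chosen for illustration.

    # Hypothetical sketch: ablate one MLP output in GPT-2 and measure the effect
    # on a factual prediction. The layer/position choices are illustrative only.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
    tok = GPT2Tokenizer.from_pretrained("gpt2")

    prompt = "The Eiffel Tower is located in the city of"
    inputs = tok(prompt, return_tensors="pt")
    target_id = tok(" Paris")["input_ids"][0]  # " Paris" is a single BPE token

    def p_target(logits):
        # Probability assigned to the target token at the final position.
        return torch.softmax(logits[0, -1], dim=-1)[target_id].item()

    with torch.no_grad():
        baseline = p_target(model(**inputs).logits)

    layer, pos = 6, 3  # assumed layer and subject-token position
    def zero_mlp_output(module, inp, out):
        out[:, pos] = 0.0  # knock out the MLP contribution at one position
        return out

    handle = model.transformer.h[layer].mlp.register_forward_hook(zero_mlp_output)
    with torch.no_grad():
        ablated = p_target(model(**inputs).logits)
    handle.remove()

    print(f"p(' Paris'): baseline={baseline:.4f}, ablated={ablated:.4f}")

A large drop in the target probability under this kind of intervention would suggest that the ablated site carries information the model needs for the factual prediction, which is the intuition behind localizing recall to specific MLP computations.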