Show simple item record

dc.contributor.advisor	Kagal, Lalana
dc.contributor.author	Hossain, Shariqah
dc.date.accessioned	2025-04-14T14:04:35Z
dc.date.available	2025-04-14T14:04:35Z
dc.date.issued	2025-02
dc.date.submitted	2025-04-03T14:06:17.299Z
dc.identifier.uri	https://hdl.handle.net/1721.1/159086
dc.description.abstract	Data regulations concerning the Right to be Forgotten, such as that in the European Union's General Data Protection Regulation (GDPR), protect the right of users to have their private information removed from organizations. With the increasing use and influence of large language models (LLMs) trained on personal data, the question arises of how to remove such information from these models. In addition, LLMs are trained on large corpora of data usually scraped from the Web, so false, toxic, harmful, or biased information from Web data can be captured in the model's knowledge, which poses a challenge for ensuring reliable and safe outputs. Machine unlearning aims to remove unwanted information from a model, but many methods are inefficient for models with large numbers of parameters, or fail to remove the full scope of the information without harming performance on the knowledge that is to be retained. Model editing algorithms address a similar problem of changing information in LLMs, but they focus on redirecting inputs to a new target rather than removing the information altogether. Despite the parallels between model editing and unlearning, the potential of model editing approaches in this setting has yet to be thoroughly investigated. In this work, we explore the ROME, IKE, and WISE editing algorithms and design new editing targets for an unlearning setting. To evaluate the potential of these model editing algorithms, we focus on unlearning fictitious information using the Task of Fictitious Unlearning (TOFU) benchmark. Through this investigation, we show that model editing approaches can, depending on the setting, exceed the performance of current unlearning methods at removing information. However, they share the limitation of traditional unlearning: they cannot encapsulate the full scope of what is to be unlearned without damaging overall model performance. We hope to leverage these findings to improve methods for unlearning model knowledge and thereby improve the reliability of LLMs.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Investigating Model Editing for Unlearning in Large Language Models
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science

