Show simple item record

dc.contributor.advisor	Kagal, Lalana
dc.contributor.author	Hossain, Shariqah
dc.date.accessioned	2025-04-14T14:04:35Z
dc.date.available	2025-04-14T14:04:35Z
dc.date.issued	2025-02
dc.date.submitted	2025-04-03T14:06:17.299Z
dc.identifier.uri	https://hdl.handle.net/1721.1/159086
dc.description.abstract	Data regulations concerning the Right to be Forgotten, such as that in the European Union's General Data Protection Regulation (GDPR), protect the right of users to have their private information removed from organizations. With the increasing use and influence of large language models (LLMs) trained on personal data, the question arises of how to remove such information from these models. In addition, LLMs are trained on large corpora of data usually scraped from the Web, so false, toxic, harmful, or biased information from Web data can be captured in the model's knowledge, which poses a challenge for ensuring reliable and safe outputs. Machine unlearning aims to remove unwanted information from a model, but many methods are inefficient for models with large numbers of parameters, or fail to remove the full scope of the information without harming performance on the knowledge that is to be retained. Model editing algorithms address a similar problem of changing information in LLMs, but they focus on redirecting inputs to a new target rather than removing the information altogether. Despite the parallels between model editing and unlearning, the potential of model editing approaches in this setting has yet to be thoroughly investigated. In this work, we explore the ROME, IKE, and WISE editing algorithms and design new editing targets for an unlearning setting. To evaluate the potential of these model editing algorithms, we focus on unlearning fictitious information using the Task of Fictitious Unlearning (TOFU) benchmark. Through this investigation, we show that model editing approaches can, depending on the setting, exceed the performance of current unlearning methods at removing information. However, they share the limitation of traditional unlearning: they cannot encapsulate the full scope of what is to be unlearned without damaging overall model performance. We hope to leverage these findings to improve methods for unlearning model knowledge and thereby improve the reliability of LLMs.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Investigating Model Editing for Unlearning in Large Language Models
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science

