Investigating Model Editing for Unlearning in Large Language Models

Author(s)
Hossain, Shariqah
Download: Thesis PDF (808.9 KB)
Advisor
Kagal, Lalana
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Data regulations on the Right to be Forgotten, such as that in the General Data Protection Regulation (GDPR) of the European Union, protect the right of users to have their private information removed from organizations. With the increasing usage and influence of large language models (LLMs) trained on personal data, the question arises of how to implement the removal of information within these models. In addition, LLMs are trained on large corpora of data that are usually scraped from the Web. A current challenge in ensuring reliable and safe outputs from LLMs is that false, toxic, harmful, or biased information from Web data is captured in the knowledge of the model. Machine unlearning aims to remove unwanted information from a model, but many methods are inefficient for models with large numbers of parameters, or fail to remove the entire scope of the information without harming performance on the knowledge that is to be retained. Model editing algorithms solve a similar problem of changing information in LLMs, but they focus on redirecting inputs to a new target rather than removing that information altogether. Despite the parallels between model editing and unlearning, there has yet to be a thorough investigation of the potential of model editing approaches in this setting. In this work, we explore the ROME, IKE, and WISE editing algorithms and design new editing targets for an unlearning setting. To evaluate the potential of the model editing algorithms, we focus on unlearning fictitious information using the Task of Fictitious Unlearning (TOFU) benchmark. Through this investigation, we show that, depending on the setting, model editing approaches can exceed the performance of current unlearning methods at removing information. However, they share with traditional unlearning the limitation of being unable to encapsulate the full scope of what is to be unlearned without damaging overall model performance. We hope to leverage these findings to improve methods for unlearning model knowledge and thereby improve the reliability of LLMs.
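To make the core idea concrete, the sketch below shows the rank-one weight update at the heart of ROME, repurposed with an unlearning-style target: instead of writing an alternative fact, the new value v* is a stand-in for an uninformative answer (e.g., a refusal), so the edited key no longer retrieves the information to be forgotten. This is a minimal illustration, not code from the thesis; ROME's layer selection, causal tracing, and key/value estimation are omitted, and all tensors here are toy stand-ins.

```python
import torch

def rank_one_edit(W: torch.Tensor, k_star: torch.Tensor,
                  v_star: torch.Tensor, C_inv: torch.Tensor) -> torch.Tensor:
    """ROME-style rank-one update of one MLP projection weight.

    Returns W' = W + (v* - W k*) (C^-1 k*)^T / ((C^-1 k*)^T k*), so that
    W' k* = v* while changes elsewhere are kept small.

    W      : (d_out, d_in) weight matrix to edit
    k_star : (d_in,)  key vector associated with the fact's subject
    v_star : (d_out,) desired value, here derived from an uninformative target
    C_inv  : (d_in, d_in) inverse covariance of keys over a reference corpus
    """
    u = C_inv @ k_star                 # write direction, whitened by key statistics
    residual = v_star - W @ k_star     # what the edit must add at the key k*
    return W + torch.outer(residual, u) / (u @ k_star)

# Toy demonstration: after the edit, the key retrieves exactly the new value.
d_in, d_out = 8, 4
W = torch.randn(d_out, d_in)
k = torch.randn(d_in)
v_refusal = torch.randn(d_out)         # hypothetical "I don't know" value vector
C_inv = torch.eye(d_in)                # identity covariance, for illustration only
W_edited = rank_one_edit(W, k, v_refusal, C_inv)
assert torch.allclose(W_edited @ k, v_refusal, atol=1e-5)
```

The choice of v* is what distinguishes the unlearning use of the edit from ordinary knowledge editing: a conventional edit maps the key to a new fact, whereas an unlearning target maps it to a value that yields no usable information about the original answer.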
Date issued
2025-02
URI
https://hdl.handle.net/1721.1/159086
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
