MIT Libraries DSpace@MIT

Tree-based Data Replay for More Efficient LLM Continual Learning

Author(s)
Bailey, Brian
Advisor
Chase, Christina
Terms of use
In Copyright - Educational Use Permitted. Copyright retained by author(s). https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
As Large Language Models (LLMs) gain popularity, they face a crucial challenge: effectively updating their knowledge bases with new data while retaining knowledge of prior information. This challenge is compounded by the considerable computational resources and time that such updates require. Prior work has addressed this problem with several approaches, including data replay and Elastic Weight Consolidation (EWC), among others. This study introduces an evolutionary tree-based data replay method designed to make the continual training of LLMs more efficient. The method uses the evolutionary relationships among domain-specific data to guide the replay strategy, selectively excluding similar data when training on the current subdomain in order to improve efficiency. Initial experiments identified Mistral-7B as the appropriate model for this analysis. Subsequent tests assessed its performance under different data replay configurations, with perplexity as the primary performance metric. The results indicate that focused data replay maintains model performance while improving training efficiency. Models trained under restrictive replay conditions (excluding data from parent nodes) achieved perplexity scores within 1.5% of the baseline and reduced training time by up to 20%. Moreover, an ablation study established that a replay ratio of at least 0.4:1 is needed to keep performance within 8.2% of the baseline. The findings suggest significant potential for structured data replay in improving continual learning for LLMs. Future research should explore data selection based on similarity metrics or automatic data categorization to improve scalability and applicability.
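
The abstract describes the replay policy only at a high level. The Python sketch below shows one way tree-based replay selection could work, under assumptions that are not taken from the thesis: subdomains form a tree; replay data for the current node is sampled uniformly from previously trained nodes; the restrictive condition drops every ancestor of the current node (the abstract says "parent nodes", and this sketch assumes that includes grandparents and up); and the replay set size is the replay ratio times the size of the current node's data, so 0.4:1 means 4 replay samples per 10 new samples. `SubdomainNode`, `build_replay_set`, `relative_gap`, and their parameters are hypothetical names. Perplexity is computed in the standard way, as the exponential of the mean per-token negative log-likelihood.

```python
import math
import random
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubdomainNode:
    """Hypothetical subdomain in the data tree; `samples` holds its training texts."""
    name: str
    samples: List[str]
    # Tree links are excluded from repr/eq so printing or comparing never recurses.
    parent: Optional["SubdomainNode"] = field(default=None, repr=False, compare=False)
    children: List["SubdomainNode"] = field(default_factory=list, repr=False, compare=False)

    def add_child(self, child: "SubdomainNode") -> "SubdomainNode":
        child.parent = self
        self.children.append(child)
        return child

    def ancestors(self) -> List["SubdomainNode"]:
        """Parent, grandparent, ..., root."""
        out: List["SubdomainNode"] = []
        node = self.parent
        while node is not None:
            out.append(node)
            node = node.parent
        return out


def build_replay_set(
    current: SubdomainNode,
    seen: List[SubdomainNode],
    replay_ratio: float = 0.4,
    exclude_ancestors: bool = True,
    seed: int = 0,
) -> List[str]:
    """Sample replay data for `current` from previously trained nodes in `seen`.

    Assumptions (hypothetical policy): the restrictive setting drops every
    ancestor of `current`, and the replay size is
    round(replay_ratio * len(current.samples)), drawn uniformly without
    replacement and capped at the size of the eligible pool.
    """
    excluded = {id(n) for n in current.ancestors()} if exclude_ancestors else set()
    pool = [
        s
        for n in seen
        if n is not current and id(n) not in excluded
        for s in n.samples
    ]
    k = min(len(pool), round(replay_ratio * len(current.samples)))
    return random.Random(seed).sample(pool, k)


def perplexity(token_log_probs: List[float]) -> float:
    """exp of the mean per-token negative log-likelihood (natural-log probabilities)."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_gap(ppl: float, baseline_ppl: float) -> float:
    """Relative perplexity difference from a baseline, e.g. 0.015 means 1.5%."""
    return (ppl - baseline_ppl) / baseline_ppl


# Usage: a toy tree science -> {biology -> genetics, chemistry}.
root = SubdomainNode("science", [f"sci-{i}" for i in range(10)])
bio = root.add_child(SubdomainNode("biology", [f"bio-{i}" for i in range(10)]))
chem = root.add_child(SubdomainNode("chemistry", [f"chem-{i}" for i in range(10)]))
gen = bio.add_child(SubdomainNode("genetics", [f"gen-{i}" for i in range(10)]))

# Training on "genetics": its ancestors (biology, science) are excluded from replay.
replay = build_replay_set(gen, seen=[root, bio, chem], replay_ratio=0.4)
print(len(replay))  # 4
print(sorted({s.split("-")[0] for s in replay}))  # ['chem']
```

With `exclude_ancestors=False`, the pool also contains the science and biology samples. Comparing each setting's perplexity with a full-replay baseline through `relative_gap` yields the kind of relative difference the abstract reports; the choice of baseline here is an assumption.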
Date issued
2024-05
URI
https://hdl.handle.net/1721.1/157017
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Publisher
Massachusetts Institute of Technology

Collections
  • Graduate Theses
