Tree-based Data Replay for More Efficient LLM Continual Learning
Author(s)
Bailey, Brian
Advisor
Chase, Christina
Abstract
As Large Language Models (LLMs) gain popularity, they face a crucial challenge: updating their knowledge with new data while retaining what they have already learned, a process that demands considerable computational resources and time. This problem has previously been addressed with multiple approaches, including data replay and Elastic Weight Consolidation (EWC). This study introduces an evolutionary tree-based data replay method designed to make continual training of LLMs more efficient. It leverages the evolutionary relationships among domain-specific data to inform the replay strategy, selectively excluding similar data from the training of the current subdomain to improve efficiency. Initial experiments identified Mistral-7B as the appropriate model for this analysis. Subsequent tests assessed its performance under different data replay configurations, using perplexity as the primary performance measure. The results indicate that focused data replay maintains model performance while improving training efficiency: models trained under the restrictive replay condition, which excludes data from parent nodes, achieved perplexity scores within 1.5% of the baseline and reduced training time by up to 20%. An ablation study further established that a minimum replay ratio of 0.4:1 is needed to keep performance within 8.2% of the baseline. These findings suggest significant potential for structured data replay in improving continual learning for LLMs. Future work should explore data selection based on similarity metrics or automatic data categorization to enhance scalability and applicability.
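To make the replay strategy concrete, the following is a minimal Python sketch of tree-based replay selection under the restrictive condition described in the abstract: data from ancestor (parent) domains is excluded from the replay pool, and the replay set is capped by a replay ratio relative to the new subdomain's data. All names here (DomainNode, select_replay_data, replay_ratio) and the example domains are illustrative assumptions, not the thesis's actual implementation.

    # Hypothetical sketch of tree-based replay selection (not the thesis code).
    import random
    from dataclasses import dataclass, field

    @dataclass
    class DomainNode:
        name: str
        examples: list = field(default_factory=list)   # domain-specific training data
        parent: "DomainNode" = None
        children: list = field(default_factory=list)

        def add_child(self, child):
            child.parent = self
            self.children.append(child)
            return child

        def ancestors(self):
            """Names of all ancestor domains (parent, grandparent, ...)."""
            names, node = set(), self.parent
            while node is not None:
                names.add(node.name)
                node = node.parent
            return names

    def select_replay_data(current, seen, replay_ratio=0.4, seed=0):
        """Build a replay set for the current subdomain.

        Restrictive condition: data from ancestor nodes is excluded, and the
        replay set is capped at replay_ratio * len(current.examples),
        i.e. a 0.4:1 replay-to-new-data ratio by default.
        """
        excluded = current.ancestors() | {current.name}
        pool = [ex for node in seen if node.name not in excluded for ex in node.examples]
        budget = int(replay_ratio * len(current.examples))
        return random.Random(seed).sample(pool, min(budget, len(pool)))

    # Example (made-up domains): train on "genetics"; its parent "biology" is
    # excluded from replay, so replay data comes only from the sibling "ecology".
    root = DomainNode("biology", examples=[f"bio_{i}" for i in range(100)])
    genetics = root.add_child(DomainNode("genetics", examples=[f"gen_{i}" for i in range(100)]))
    ecology = root.add_child(DomainNode("ecology", examples=[f"eco_{i}" for i in range(50)]))
    replay = select_replay_data(genetics, seen=[root, ecology], replay_ratio=0.4)  # at most 40 items

In this sketch, lowering replay_ratio shrinks the replay budget, which is consistent with the ablation finding that ratios below 0.4:1 let performance drift further from the baseline.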
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences
Publisher
Massachusetts Institute of Technology