dc.contributor.advisor | Chase, Christina | |
dc.contributor.author | Bailey, Brian | |
dc.date.accessioned | 2024-09-24T18:26:59Z | |
dc.date.available | 2024-09-24T18:26:59Z | |
dc.date.issued | 2024-05 | |
dc.date.submitted | 2024-07-11T15:30:17.653Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/157017 | |
dc.description.abstract | As Large Language Models (LLMs) gain popularity, they face a crucial challenge: effectively updating their knowledge bases with new data while retaining prior knowledge. This challenge is compounded by the considerable computational resources and time required to do so. The problem has previously been addressed with multiple approaches, including data replay and Elastic Weight Consolidation (EWC). This study introduces an evolutionary tree-based data replay method designed to improve the efficiency of LLMs’ continual training. It leverages the evolutionary relationships among domain-specific data to inform the replay strategy, selectively excluding similar data from the training of current subdomains to optimize efficiency. Initial experiments identified Mistral-7B as the appropriate model for this analysis. Subsequent tests assessed its performance under different data replay configurations, using perplexity as the primary performance measure. The results indicate that focused data replay maintains model performance and enhances training efficiency. Models trained under restrictive replay conditions—excluding data from parent nodes—achieved perplexity scores within 1.5% of the baseline and reduced training time by up to 20%. Moreover, an ablation study established that a minimum replay ratio of 0.4:1 is essential to keep performance within 8.2% of the baseline. These findings suggest significant potential for structured data replay to improve continual learning in LLMs. Future research should explore data selection based on similarity metrics or automatic data categorization to enhance scalability and applicability. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | In Copyright - Educational Use Permitted | |
dc.rights | Copyright retained by author(s) | |
dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
dc.title | Tree-based Data Replay for More Efficient LLM Continual Learning | |
dc.type | Thesis | |
dc.description.degree | M.Eng. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Brain and Cognitive Sciences | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Engineering in Computation and Cognition | |