Evaluating Adaptive Layer Freezing through Hyperparameter Optimization for Enhanced Fine-Tuning Performance of Language Models
Author(s)
Figueroa, Reinaldo
Advisor
Murray, Fiona
Abstract
Language models are initially trained on large datasets, enabling them to extract patterns and establish rich contextual connections. When data is scarce, transfer learning has become the standard approach for applying these models to specialized downstream tasks via fine-tuning. However, fine-tuning on small datasets can lead to overfitting and poor generalization. Generalization is crucial when deploying models that perform sensitive tasks in real-world environments, as it determines how well a model performs on unseen data, and overfitting is especially likely when training on small datasets. This thesis proposes and evaluates a new method for fine-tuning language models that adaptively chooses a specific learning rate for each transformer layer to improve performance on in-domain, low-volume datasets. Additionally, we explore which layers within these models typically retain more contextual information from pre-training and may therefore be valuable to keep ‘frozen’ when fine-tuning on small datasets. This analysis provides guidance on fine-tuning approaches during initial experiments when data is limited. Our results demonstrate limited performance gains on some models and more substantial gains on others when fine-tuning with our proposed method. Our work also provides insight into the per-layer importance of language models by showing that certain layers correlate more strongly with overall model accuracy.
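The abstract does not include implementation details, and the thesis's actual procedure is described in the full document. As a minimal illustrative sketch of the general idea only, per-layer learning rates and layer freezing can be expressed with PyTorch optimizer parameter groups over a Hugging Face BERT-style encoder; the model name, the per-layer rate values, and the layers chosen for freezing below are assumptions for illustration, not results from the thesis.

# Illustrative sketch (not the thesis's exact implementation): assign a
# separate learning rate to each transformer layer; a rate of 0.0 is
# treated as "freeze this layer" during fine-tuning.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # assumed model and task size
)

# Hypothetical per-layer learning rates, e.g. proposed by a hyperparameter
# optimizer; layers 0 and 1 are kept frozen in this example.
layer_lrs = {i: 2e-5 for i in range(12)}
layer_lrs[0] = 0.0
layer_lrs[1] = 0.0

param_groups = []
for i, layer in enumerate(model.bert.encoder.layer):
    lr = layer_lrs[i]
    if lr == 0.0:
        # Freezing: exclude the layer from gradient updates entirely.
        for p in layer.parameters():
            p.requires_grad = False
    else:
        param_groups.append({"params": layer.parameters(), "lr": lr})

# Embeddings and the classification head use a shared base learning rate.
param_groups.append({"params": model.bert.embeddings.parameters(), "lr": 2e-5})
param_groups.append({"params": model.classifier.parameters(), "lr": 2e-5})

optimizer = torch.optim.AdamW(param_groups, weight_decay=0.01)

Under this setup, a hyperparameter search can simply propose a vector of per-layer rates and reuse the same fine-tuning loop, which is one plausible way to realize the adaptive layer-freezing idea the abstract describes.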
Date issued
2024-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology