Evaluating Adaptive Layer Freezing through Hyperparameter Optimization for Enhanced Fine-Tuning Performance of Language Models

Figueroa, Reinaldo

dc.contributor.advisor	Murray, Fiona
dc.contributor.author	Figueroa, Reinaldo
dc.date.accessioned	2024-10-09T18:25:53Z
dc.date.available	2024-10-09T18:25:53Z
dc.date.issued	2024-09
dc.date.submitted	2024-10-07T14:34:33.899Z
dc.identifier.uri	https://hdl.handle.net/1721.1/157169
dc.description.abstract	Language models are initially trained on large datasets, enabling them to extract patterns and establish rich contextual connections. When dealing with data scarcity, transfer learning has become the go-to method to use these models in specialized downstream tasks via fine-tuning. However, fine-tuning on small datasets can lead to overfitting and a lack of generalization. Generalization is crucial when deploying models that perform a sensitive tasks in a real world environment, as it dictates how well it performs on unseen data. Conversely, overfitting is highly likely to occur when training on small datasets. This thesis proposes and evaluates a new method for fine-tuning language models by adaptively choosing specific learning rates for each transformer layer that provide higher performance on in-domain low-volume datasets. Additionally, we explore which layers inside the models usually hold more contextual information from pre-training that might be valuable to keep ‘frozen’ when fine-tuning on small datasets. This analysis provides insights into fine-tuning approaches during initial experiments when data is limited. Our results demonstrate limited performance gains on certain models while achieving more significant gains on others when fine-tuning using our proposed method. Additionally, our work also provides valuable insight into per-layer importance of language models by showing that certain layers have a stronger direct correlation with the overall model accuracy.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright retained by author(s)
dc.rights.uri	https://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Evaluating Adaptive Layer Freezing through Hyperparameter Optimization for Enhanced Fine-Tuning Performance of Language Models
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science

Files in this item

Name:: figueroa-reyfp-meng-eecs-2024- ...
Size:: 805.3Kb
Format:: PDF
Description:: Thesis PDF

View/Open

This item appears in the following Collection(s)

Graduate Theses

Show simple item record