
dc.contributor.advisor: Madry, Aleksander
dc.contributor.author: Khaddaj, Alaa
dc.date.accessioned: 2023-01-19T18:42:23Z
dc.date.available: 2023-01-19T18:42:23Z
dc.date.issued: 2022-09
dc.date.submitted: 2022-10-19T18:57:35.581Z
dc.identifier.uri: https://hdl.handle.net/1721.1/147278
dc.description.abstract: It is commonly believed that, in transfer learning, including more pre-training data translates into better performance. However, recent evidence suggests that removing data from the source dataset can actually help too. In this work, we take a closer look at the role of the source dataset's composition in transfer learning and present a framework for probing its impact on downstream performance. Our framework gives rise to new capabilities, such as pinpointing transfer learning brittleness and detecting pathologies such as data leakage and the presence of misleading examples in the source dataset. In particular, we demonstrate that removing detrimental datapoints identified by our framework improves transfer learning performance from ImageNet on a variety of target tasks.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: On the Role of the Source Dataset in Transfer Learning
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science

