dc.contributor.advisor | Madry, Aleksander | |
dc.contributor.author | Khaddaj, Alaa | |
dc.date.accessioned | 2023-01-19T18:42:23Z | |
dc.date.available | 2023-01-19T18:42:23Z | |
dc.date.issued | 2022-09 | |
dc.date.submitted | 2022-10-19T18:57:35.581Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/147278 | |
dc.description.abstract | It is commonly believed that, in transfer learning, including more pre-training data translates into better performance. However, recent evidence suggests that removing data from the source dataset can actually help as well. In this work, we take a closer look at the role of the source dataset's composition in transfer learning and present a framework for probing its impact on downstream performance. Our framework gives rise to new capabilities, such as pinpointing transfer learning brittleness and detecting pathologies such as data leakage and the presence of misleading examples in the source dataset. In particular, we demonstrate that removing detrimental datapoints identified by our framework improves transfer learning performance from ImageNet on a variety of target tasks. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | In Copyright - Educational Use Permitted | |
dc.rights | Copyright MIT | |
dc.rights.uri | http://rightsstatements.org/page/InC-EDU/1.0/ | |
dc.title | On the Role of the Source Dataset in Transfer Learning | |
dc.type | Thesis | |
dc.description.degree | S.M. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
mit.thesis.degree | Master | |
thesis.degree.name | Master of Science in Electrical Engineering and Computer Science | |