dc.contributor.advisor | Katabi, Dina | |
dc.contributor.author | Fan, Lijie | |
dc.date.accessioned | 2024-08-21T18:56:04Z | |
dc.date.available | 2024-08-21T18:56:04Z | |
dc.date.issued | 2024-05 | |
dc.date.submitted | 2024-07-10T13:01:33.475Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/156315 | |
dc.description.abstract | Representation learning is crucial for developing robust vision systems. The effectiveness of this learning process largely depends on the quality and quantity of data. Synthetic data presents unique advantages in terms of flexibility, scalability, and controllability. Recent advances in generative modeling have enabled the synthesis of photorealistic images and high-quality text, drastically increasing the viability of synthetic data. Despite these advances, the application of synthetic data to representation learning and visual recognition tasks lags behind, with a noticeable performance gap between models trained on synthetic versus real data. In this thesis, we present our recent efforts to close this gap and utilize synthetic data to train state-of-the-art representation models. We begin by utilizing synthetic texts from large language models to enhance the training of vision-language models. Next, we explore synthetic images generated by text-to-image models, examining the scaling laws that apply when these images are used for supervised model training. We also introduce a multi-positive contrastive loss designed specifically for synthetic images, demonstrating their advantages over real images in representation learning. Finally, we propose a novel framework for training vision models exclusively with synthetic texts and images, which surpasses state-of-the-art models trained on real images on tasks including fine-grained classification and semantic segmentation. Together, these works establish a robust foundation for applying generative models to representation learning and key computer vision tasks, and mark an advance in utilizing synthetic data across the data-centric AI ecosystem. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | In Copyright - Educational Use Permitted | |
dc.rights | Copyright retained by author(s) | |
dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
dc.title | Visual Representation Learning from Synthetic Data | |
dc.type | Thesis | |
dc.description.degree | Ph.D. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
mit.thesis.degree | Doctoral | |
thesis.degree.name | Doctor of Philosophy | |