dc.contributor.advisor | Katabi, Dina | |
dc.contributor.author | Fan, Lijie | |
dc.date.accessioned | 2024-08-21T18:56:04Z | |
dc.date.available | 2024-08-21T18:56:04Z | |
dc.date.issued | 2024-05 | |
dc.date.submitted | 2024-07-10T13:01:33.475Z | |
dc.identifier.uri | https://hdl.handle.net/1721.1/156315 | |
dc.description.abstract | Representation learning is crucial for developing robust vision systems. The effectiveness of this learning process largely depends on the quality and quantity of data. Synthetic data presents unique advantages in terms of flexibility, scalability, and controllability. Recent advances in generative modeling have enabled the synthesis of photorealistic images and high-quality text, drastically increasing the viability of synthetic data. Despite these advances, the application of synthetic data to representation learning and visual recognition tasks lags behind, with a noticeable performance gap between models trained on synthetic versus real data. In this thesis, we present our recent efforts to close this gap and utilize synthetic data to train state-of-the-art representation models. We begin by utilizing synthetic texts from large language models to enhance the training of vision-language models. Next, we explore synthetic images generated by text-to-image models, examining the scaling laws that apply when these images are used for supervised model training. We also introduce a multi-positive contrastive loss designed specifically for synthetic images, demonstrating their advantages over real images in representation learning. Finally, we propose a novel framework for training vision models exclusively with synthetic texts and images, which surpasses state-of-the-art models trained on real images on tasks including fine-grained classification and semantic segmentation. Together, these works establish a robust foundation for applying generative models to representation learning and key computer vision tasks, and mark an advance in utilizing synthetic data across the data-centric AI ecosystem. | |
dc.publisher | Massachusetts Institute of Technology | |
dc.rights | In Copyright - Educational Use Permitted | |
dc.rights | Copyright retained by author(s) | |
dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
dc.title | Visual Representation Learning from Synthetic Data | |
dc.type | Thesis | |
dc.description.degree | Ph.D. | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
mit.thesis.degree | Doctoral | |
thesis.degree.name | Doctor of Philosophy | |