dc.contributor.advisor: Katabi, Dina
dc.contributor.author: Fan, Lijie
dc.date.accessioned: 2024-08-21T18:56:04Z
dc.date.available: 2024-08-21T18:56:04Z
dc.date.issued: 2024-05
dc.date.submitted: 2024-07-10T13:01:33.475Z
dc.identifier.uri: https://hdl.handle.net/1721.1/156315
dc.description.abstract: Representation learning is crucial for developing robust vision systems. The effectiveness of this learning process largely depends on the quality and quantity of data. Synthetic data offers unique advantages in flexibility, scalability, and controllability. Recent advances in generative modeling have enabled the synthesis of photorealistic images and high-quality text, drastically increasing the viability of synthetic data. Despite these advances, the application of synthetic data to representation learning and visual recognition tasks lags behind, with a noticeable performance gap between models trained on synthetic and real data. In this thesis, we present our recent efforts to close this gap and use synthetic data to train state-of-the-art representation models. We begin by using synthetic texts from large language models to enhance the training of vision-language models. Next, we explore synthetic images generated by text-to-image models, examining the scaling laws that apply when these images are used for supervised model training. We also introduce a multi-positive contrastive loss designed specifically for synthetic images, demonstrating the advantages of synthetic images over real images in representation learning. Finally, we propose a novel framework for training vision models exclusively on synthetic texts and images, which surpasses state-of-the-art models trained on real images in tasks including fine-grained classification and semantic segmentation. These works establish a robust foundation for advancing generative models in representation learning and for solving key computer vision tasks, and they mark an advance in using synthetic data for improved representation learning across the data-centric AI ecosystem.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Visual Representation Learning from Synthetic Data
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy