| dc.contributor.advisor | Freeman, William | |
| dc.contributor.author | Tian, Betsy | |
| dc.date.accessioned | 2025-10-06T17:36:35Z | |
| dc.date.available | 2025-10-06T17:36:35Z | |
| dc.date.issued | 2025-05 | |
| dc.date.submitted | 2025-06-23T14:03:57.048Z | |
| dc.identifier.uri | https://hdl.handle.net/1721.1/162953 | |
| dc.description.abstract | Contrastive learning has emerged as a powerful framework for unsupervised representation learning, allowing models to learn by maximizing agreement between related samples and distinguishing dissimilar ones. However, contrastive learning frameworks are fundamentally limited by the number of negative pairs a model can observe, and memory-intensive backbones constrain practical batch sizes. We introduce a three-phase, adapter-augmented training framework that scales contrastive batch sizes by two orders of magnitude, surpassing previous state-of-the-art learners in both accuracy and speed. First, we co-train the backbone and adapter on small batches to establish a strong initialization. Next, we freeze the backbone and train the adapter alone with very large batches, exposing it to an enlarged negative pool. Finally, we transfer large-batch adapter gradients back into the backbone via segmented backpropagation. We evaluate our method on the PlacesAudio dataset and show promising improvements in retrieval performance at each phase. By exposing the model to substantially more negatives per effective batch, we achieve higher accuracy and faster training than optimizer-stepping baselines. Because it scales batch size by hundreds of times, this approach can be integrated into any contrastive learning framework for more robust representation learning and abundant negative sampling. | |
| dc.publisher | Massachusetts Institute of Technology | |
| dc.rights | In Copyright - Educational Use Permitted | |
| dc.rights | Copyright retained by author(s) | |
| dc.rights.uri | https://rightsstatements.org/page/InC-EDU/1.0/ | |
| dc.title | Scaling contrastive learning batch size by two orders of magnitude | |
| dc.type | Thesis | |
| dc.description.degree | M.Eng. | |
| dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
| mit.thesis.degree | Master | |
| thesis.degree.name | Master of Engineering in Electrical Engineering and Computer Science | |