Scaling contrastive learning batch size by two orders of magnitude
Author(s)
Tian, Betsy
Advisor
Freeman, William
Abstract
Contrastive learning has emerged as a powerful framework for unsupervised representation learning, allowing models to learn by maximizing agreement between related samples and distinguishing dissimilar ones. However, contrastive learning frameworks are fundamentally limited by the number of negative pairs a model can observe, and memory-intensive backbones constrain practical batch sizes. We introduce a three-phase, adapter-augmented training framework that scales contrastive batch sizes by two orders of magnitude, surpassing previous state-of-the-art learners in both accuracy and speed. First, we co-train the backbone and adapter on small batches to establish a strong initialization. Next, we freeze the backbone and train the adapter alone with very large batches, exposing it to an enlarged negative pool. Finally, we transfer the large-batch adapter gradients back into the backbone via segmented backpropagation. We evaluate our method on the PlacesAudio dataset and show promising gains in retrieval performance at each phase. By exposing the model to substantially more negatives per effective batch, we achieve higher accuracy and faster training than optimizer-stepping baselines. Because it scales batch size by hundreds of times, this approach can be integrated into any contrastive learning framework for more robust representation learning and abundant negative sampling.
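
The three-phase schedule described in the abstract can be sketched in PyTorch-style code. This is a minimal illustration under assumed interfaces, not the thesis's actual implementation: the backbone and adapter modules, the info_nce_loss function, the data loaders, the learning rates, and the chunk size of 64 are all hypothetical placeholders.

# Minimal sketch of the three-phase schedule, assuming a PyTorch-style setup.
# `backbone`, `adapter`, the loaders, learning rates, and the chunk size are
# hypothetical placeholders, not the thesis's actual implementation.
import torch
import torch.nn.functional as F


def info_nce_loss(za, zb, temperature=0.07):
    # Symmetric InfoNCE over a batch of paired embeddings (B x D each);
    # every non-matching pair in the batch serves as a negative.
    za, zb = F.normalize(za, dim=-1), F.normalize(zb, dim=-1)
    logits = za @ zb.t() / temperature
    targets = torch.arange(za.size(0), device=za.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


def train_three_phase(backbone, adapter, small_loader, large_loader, chunk=64):
    opt_joint = torch.optim.Adam(
        list(backbone.parameters()) + list(adapter.parameters()), lr=1e-4)
    opt_adapter = torch.optim.Adam(adapter.parameters(), lr=1e-4)
    opt_backbone = torch.optim.Adam(backbone.parameters(), lr=1e-4)

    # Phase 1: co-train backbone and adapter on small batches for a strong init.
    for xa, xb in small_loader:
        loss = info_nce_loss(adapter(backbone(xa)), adapter(backbone(xb)))
        opt_joint.zero_grad()
        loss.backward()
        opt_joint.step()

    # Phase 2: freeze the backbone and train the lightweight adapter alone on
    # very large batches of backbone features, enlarging the negative pool.
    for p in backbone.parameters():
        p.requires_grad_(False)
    for xa, xb in large_loader:
        with torch.no_grad():
            ha, hb = backbone(xa), backbone(xb)
        loss = info_nce_loss(adapter(ha), adapter(hb))
        opt_adapter.zero_grad()
        loss.backward()
        opt_adapter.step()

    # Phase 3: transfer large-batch gradients back into the backbone with
    # segmented backpropagation: take the gradient of the large-batch loss
    # with respect to the cached features, then re-run the backbone on
    # memory-sized chunks and backpropagate those gradients through it.
    for p in backbone.parameters():
        p.requires_grad_(True)
    for xa, xb in large_loader:
        with torch.no_grad():
            ha, hb = backbone(xa), backbone(xb)
        ha.requires_grad_(True)
        hb.requires_grad_(True)
        loss = info_nce_loss(adapter(ha), adapter(hb))
        grad_ha, grad_hb = torch.autograd.grad(loss, (ha, hb))
        opt_backbone.zero_grad()
        for x, g in ((xa, grad_ha), (xb, grad_hb)):
            for xs, gs in zip(x.split(chunk), g.split(chunk)):
                backbone(xs).backward(gs)
        opt_backbone.step()

In this sketch, the phase 2 loss sees the full large batch of negatives because only the small adapter is in the computation graph; phase 3 then replays the backbone forward pass chunk by chunk so the large-batch gradient signal reaches the backbone without ever holding the full batch in memory.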
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology