Scaling contrastive learning batch size by two orders of magnitude
Author(s)
Tian, Betsy
Advisor
Freeman, William
Abstract
Contrastive learning has emerged as a powerful framework for unsupervised representation learning, allowing models to learn by maximizing agreement between related samples and distinguishing dissimilar ones. However, contrastive learning frameworks are fundamentally limited by the number of negative pairs a model can observe, and memory-intensive backbones constrain practical batch sizes. We introduce a three-phase, adapter-augmented training framework that scales contrastive batch sizes by two orders of magnitude, surpassing previous state-of-the-art learners in both accuracy and speed. First, we co-train the backbone and adapter on small batches to establish a strong initialization. Next, we freeze the backbone and train the adapter alone with very large batches, exposing it to an enlarged negative pool. Finally, we transfer the large-batch adapter gradients back into the backbone via segmented backpropagation. We evaluate our method on the PlacesAudio dataset and show promising gains in retrieval performance at each phase. By exposing the model to substantially more negatives per effective batch, we achieve higher accuracy and faster training than optimizer-stepping baselines. Because it scales batch size by hundreds of times, this approach can be integrated into any contrastive learning framework for more robust representation learning and abundant negative sampling.
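
The three-phase schedule described in the abstract can be sketched in PyTorch-style code. This is a minimal illustration under assumed interfaces, not the thesis's actual implementation: the backbone and adapter modules, the info_nce_loss function, the data loaders, the learning rates, and the chunk size of 64 are all hypothetical placeholders.

# Minimal sketch of the three-phase schedule, assuming a PyTorch-style setup.
# `backbone`, `adapter`, the loaders, learning rates, and the chunk size are
# hypothetical placeholders, not the thesis's actual implementation.
import torch
import torch.nn.functional as F


def info_nce_loss(za, zb, temperature=0.07):
    # Symmetric InfoNCE over a batch of paired embeddings (B x D each);
    # every non-matching pair in the batch serves as a negative.
    za, zb = F.normalize(za, dim=-1), F.normalize(zb, dim=-1)
    logits = za @ zb.t() / temperature
    targets = torch.arange(za.size(0), device=za.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))


def train_three_phase(backbone, adapter, small_loader, large_loader, chunk=64):
    opt_joint = torch.optim.Adam(
        list(backbone.parameters()) + list(adapter.parameters()), lr=1e-4)
    opt_adapter = torch.optim.Adam(adapter.parameters(), lr=1e-4)
    opt_backbone = torch.optim.Adam(backbone.parameters(), lr=1e-4)

    # Phase 1: co-train backbone and adapter on small batches for a strong init.
    for xa, xb in small_loader:
        loss = info_nce_loss(adapter(backbone(xa)), adapter(backbone(xb)))
        opt_joint.zero_grad()
        loss.backward()
        opt_joint.step()

    # Phase 2: freeze the backbone and train the lightweight adapter alone on
    # very large batches of backbone features, enlarging the negative pool.
    for p in backbone.parameters():
        p.requires_grad_(False)
    for xa, xb in large_loader:
        with torch.no_grad():
            ha, hb = backbone(xa), backbone(xb)
        loss = info_nce_loss(adapter(ha), adapter(hb))
        opt_adapter.zero_grad()
        loss.backward()
        opt_adapter.step()

    # Phase 3: transfer large-batch gradients back into the backbone with
    # segmented backpropagation: take the gradient of the large-batch loss
    # with respect to the cached features, then re-run the backbone on
    # memory-sized chunks and backpropagate those gradients through it.
    for p in backbone.parameters():
        p.requires_grad_(True)
    for xa, xb in large_loader:
        with torch.no_grad():
            ha, hb = backbone(xa), backbone(xb)
        ha.requires_grad_(True)
        hb.requires_grad_(True)
        loss = info_nce_loss(adapter(ha), adapter(hb))
        grad_ha, grad_hb = torch.autograd.grad(loss, (ha, hb))
        opt_backbone.zero_grad()
        for x, g in ((xa, grad_ha), (xb, grad_hb)):
            for xs, gs in zip(x.split(chunk), g.split(chunk)):
                backbone(xs).backward(gs)
        opt_backbone.step()

In this sketch, the phase 2 loss sees the full large batch of negatives because only the small adapter is in the computation graph; phase 3 then replays the backbone forward pass chunk by chunk so the large-batch gradient signal reaches the backbone without ever holding the full batch in memory.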
Date issued
2025-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology