Dynamics of Gradient Flow with Contrastive Learning

Tepe, Cem

Author(s)

Tepe, Cem

Advisor

Azizan, Navid

Terms of use

Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) Copyright retained by author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/

Abstract

Contrastive learning (CL), in different forms, has been shown to learn discriminative representations for downstream tasks without the need for human labeling. In the representation space learned via CL, each class collapses to a distinct vertex of a simplex on a hypersphere during training. This property, also seen in other types of learning tasks, might explain why CL works as well as it does. Class collapse on the test distribution, which determines how well the model generalizes to new samples and new classes, is tied to class collapse on the training distribution under certain conditions, as studied by Galanti et al. (2022). In the case of CL, Graf et al. (2021) showed that minimizing the contrastive loss leads to collapse during training. In a recent study, Xue et al. (2023) show that minimizing the contrastive loss is not enough to observe class collapse in the representation space for a single-layer linear model, and that minimum-norm minimizers are needed for the collapse to happen. However, their results don't explain how class collapse can occur without adding an explicit bias. The implicit bias of gradient descent is a likely candidate to explain this phenomenon. Here, we investigate the gradient flow of the spectral contrastive loss and give a theoretical description of the learning dynamics.

Date issued

2024-05

URI

https://hdl.handle.net/1721.1/157033

Department

Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science

Publisher

Massachusetts Institute of Technology

Collections

Graduate Theses