Dynamics of Gradient Flow with Contrastive Learning
Author(s)
Tepe, Cem
Advisor
Azizan, Navid
Abstract
Contrastive learning (CL), in different forms, has been shown to learn discriminative representations for downstream tasks without the need for human labeling. In the representation space learnt via CL, each class collapses to a distinct vertex of a simplex on a hypersphere during training. This property, also seen in other types of learning tasks, might explain why CL works as well as it does. Class collapse on the test distribution, which determines how well the model generalizes to new samples and new classes, is tied to class collapse on the training distribution under certain conditions, as studied by Galanti et al. (2022). In the case of CL, Graf et al. (2021) showed that minimizing the contrastive loss leads to collapse during training. In a recent study, Xue et al. (2023) show that minimizing the contrastive loss is not enough to observe class collapse in the representation space of a single-layer linear model, and that minimum-norm minimizers are needed for the collapse to happen. However, their results do not explain how class collapse can occur without adding an explicit bias. The implicit bias of gradient descent is a likely candidate to explain this phenomenon. Here, we investigate the gradient flow of the spectral contrastive loss and give a theoretical description of the learning dynamics.
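As an illustration of the collapsed geometry mentioned in the abstract, the following is a minimal sketch, not taken from the thesis, of what "distinct vertices of a simplex on a hypersphere" looks like numerically. It assumes k classes embedded in R^k and builds the vertices from the centered standard basis; the function names and the choice k = 4 are hypothetical and used only for this example.

```python
import math


def simplex_vertices(k):
    """Return k unit vectors in R^k forming a regular simplex centered at the origin.

    Assumption for illustration: vertex i is the i-th standard basis vector
    with the mean (1/k, ..., 1/k) subtracted, rescaled to unit length.
    """
    scale = math.sqrt(k / (k - 1))
    return [
        [scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
        for i in range(k)
    ]


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


k = 4  # hypothetical number of classes
vertices = simplex_vertices(k)

# Gram matrix of class representatives: 1 on the diagonal (points lie on the
# unit hypersphere) and -1/(k-1) off the diagonal (all classes equally separated).
gram = [[dot(u, v) for v in vertices] for u in vertices]

# The vertices sum to the zero vector, so the configuration is centered.
centroid = [sum(v[j] for v in vertices) for j in range(k)]
```

In a fully collapsed representation, every sample of a class would map to that class's vertex, so the Gram matrix of any two samples would depend only on whether they share a class.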
Date issued
2024-05
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology