SGD and Weight Decay Provably Induce a Low-Rank Bias in Deep Neural Networks
(Center for Brains, Minds and Machines (CBMM), 2023-02-14)
In this paper, we study the bias of Stochastic Gradient Descent (SGD) to learn low-rank weight matrices when training deep ReLU neural networks. Our results show that training neural networks with mini-batch SGD and weight ...
Feature learning in deep classifiers through Intermediate Neural Collapse
(Center for Brains, Minds and Machines (CBMM), 2023-02-27)
In this paper, we conduct an empirical study of the feature learning process in deep classifiers. Recent research has identified a training phenomenon called Neural Collapse (NC), in which the top-layer feature embeddings ...
Skip Connections Increase the Capacity of Associative Memories in Variable Binding Mechanisms
(Center for Brains, Minds and Machines (CBMM), 2023-06-27)
The flexibility of intelligent behavior is fundamentally attributed to the ability to separate and assign structural information from content in sensory inputs. Variable binding is the atomic computation that underlies ...
A Homogeneous Transformer Architecture
(Center for Brains, Minds and Machines (CBMM), 2023-09-18)
While the Transformer architecture has made a substantial impact in the field of machine learning, it is unclear what purpose each component serves in the overall architecture. Heterogeneous nonlinear circuits such as ...
Norm-Based Generalization Bounds for Compositionally Sparse Neural Network
(Center for Brains, Minds and Machines (CBMM), 2023-02-14)
In this paper, we investigate the Rademacher complexity of deep sparse neural networks, where each neuron receives a small number of inputs. We prove generalization bounds for multilayered sparse ReLU neural networks, ...
The Janus effects of SGD vs GD: high noise and low rank
(2023-12-21)
It was always obvious that SGD has higher fluctuations at convergence than GD. It has also been often reported that SGD in deep ReLU networks has a low-rank bias in the weight matrices. A recent theoretical analysis linked ...