CBMM Memo Series: Recent submissions
- Feature learning in deep classifiers through Intermediate Neural Collapse
  (Center for Brains, Minds and Machines (CBMM), 2023-02-27) In this paper, we conduct an empirical study of the feature learning process in deep classifiers. Recent research has identified a training phenomenon called Neural Collapse (NC), in which the top-layer feature embeddings ... (an illustrative measurement sketch follows the list)
- SGD and Weight Decay Provably Induce a Low-Rank Bias in Deep Neural Networks
  (Center for Brains, Minds and Machines (CBMM), 2023-02-14) In this paper, we study the bias of Stochastic Gradient Descent (SGD) to learn low-rank weight matrices when training deep ReLU neural networks. Our results show that training neural networks with mini-batch SGD and weight ... (a rank-tracking sketch follows the list)
- Norm-Based Generalization Bounds for Compositionally Sparse Neural Networks
  (Center for Brains, Minds and Machines (CBMM), 2023-02-14) In this paper, we investigate the Rademacher complexity of deep sparse neural networks, where each neuron receives a small number of inputs. We prove generalization bounds for multilayered sparse ReLU neural networks, ... (the standard bound such results refine is restated after the list)
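The first memo above studies feature learning through Neural Collapse of the top-layer embeddings. As a rough illustration of how within-class variability collapse is commonly quantified, the sketch below computes the statistic Tr(Sigma_W Sigma_B^+)/C from penultimate-layer features and integer class labels. The function name and the random-data example are illustrative assumptions, not the memo's code or experimental protocol.

```python
# Minimal sketch of an NC1 (within-class variability collapse) measurement, assuming
# `features` holds penultimate-layer embeddings of a trained classifier and `labels`
# holds integer class ids. Illustrative only; not the memo's code.
import numpy as np

def nc1_variability_collapse(features: np.ndarray, labels: np.ndarray) -> float:
    """Return Tr(Sigma_W @ pinv(Sigma_B)) / C; small values indicate collapse."""
    classes = np.unique(labels)
    dim = features.shape[1]
    global_mean = features.mean(axis=0)
    sigma_w = np.zeros((dim, dim))  # within-class covariance
    sigma_b = np.zeros((dim, dim))  # between-class covariance of class means
    for c in classes:
        feats_c = features[labels == c]
        centered = feats_c - feats_c.mean(axis=0)
        sigma_w += centered.T @ centered / len(features)
        diff = (feats_c.mean(axis=0) - global_mean)[:, None]
        sigma_b += diff @ diff.T / len(classes)
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes))

# Random features for demonstration: no collapse, so the statistic stays large.
rng = np.random.default_rng(0)
feats = rng.standard_normal((500, 64))
labs = rng.integers(0, 10, size=500)
print(nc1_variability_collapse(feats, labs))
```

In practice the statistic would be computed on features extracted from the trained network at each checkpoint, rather than on random data, to see when collapse emerges during training.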
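For the second memo, on the low-rank bias induced by mini-batch SGD with weight decay, a simple empirical companion is to track rank proxies of each weight matrix across training checkpoints. The helpers below use illustrative names and random stand-in matrices; they are not the paper's proof or experiments. They compute a thresholded numerical rank and the stable rank ||W||_F^2 / ||W||_2^2.

```python
# Minimal sketch of two rank proxies for a weight matrix W, assuming W is any
# learned (or here, randomly generated) 2-D array. Illustrative only.
import numpy as np

def numerical_rank(w: np.ndarray, rel_tol: float = 1e-2) -> int:
    """Number of singular values above rel_tol times the largest singular value."""
    s = np.linalg.svd(w, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

def stable_rank(w: np.ndarray) -> float:
    """||W||_F^2 / ||W||_2^2, a smooth lower bound on the rank."""
    s = np.linalg.svd(w, compute_uv=False)
    return float(np.sum(s ** 2) / s[0] ** 2)

# Example: a full-rank random matrix versus an explicitly low-rank one.
rng = np.random.default_rng(0)
w_full = rng.standard_normal((256, 256))
w_low = rng.standard_normal((256, 4)) @ rng.standard_normal((4, 256))
for name, w in [("full", w_full), ("low-rank", w_low)]:
    print(name, numerical_rank(w), round(stable_rank(w), 2))
```

Comparing how these proxies evolve across training runs with and without weight decay is one way to probe the bias the memo's title refers to.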
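The third memo bounds the Rademacher complexity of compositionally sparse ReLU networks; its specific bounds are not reproduced here. For context, the block below restates the standard Rademacher-complexity generalization bound that norm-based results of this kind plug into.

```latex
% Standard Rademacher-complexity generalization bound (textbook form; the memo's own
% norm-based bound for sparse ReLU networks is not reproduced here).
% With probability at least 1 - \delta over an i.i.d. sample of size n, for every
% f in the hypothesis class \mathcal{F}, with loss \ell taking values in [0, 1]:
\[
  \mathbb{E}\big[\ell(f(x), y)\big]
  \;\le\;
  \frac{1}{n}\sum_{i=1}^{n} \ell\big(f(x_i), y_i\big)
  \;+\; 2\,\mathfrak{R}_n(\ell \circ \mathcal{F})
  \;+\; \sqrt{\frac{\log(1/\delta)}{2n}}
\]
% where \mathfrak{R}_n(\ell \circ \mathcal{F}) is the Rademacher complexity of the
% loss class. Norm-based bounds control this term via the layer norms and, in the
% memo's setting, the small per-neuron fan-in of the sparse architecture.
```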