    • The Janus effects of SGD vs GD: high noise and low rank 

      Xu, Mengjia; Galanti, Tomer; Rangamani, Akshay; Rosasco, Lorenzo; Poggio, Tomaso (2023-12-21)
      It was always obvious that SGD has higher fluctuations at convergence than GD. It has also often been reported that SGD in deep ReLU networks has a low-rank bias in the weight matrices. A recent theoretical analysis linked ...
    • A Homogeneous Transformer Architecture 

      Gan, Yulu; Poggio, Tomaso (Center for Brains, Minds and Machines (CBMM), 2023-09-18)
      While the Transformer architecture has made a substantial impact in the field of machine learning, it is unclear what purpose each component serves in the overall architecture. Heterogeneous nonlinear circuits such as ...
    • Skip Connections Increase the Capacity of Associative Memories in Variable Binding Mechanisms 

      Xie, Yi; Li, Yichen; Rangamani, Akshay (Center for Brains, Minds and Machines (CBMM), 2023-06-27)
      The flexibility of intelligent behavior is fundamentally attributed to the ability to separate and assign structural information from content in sensory inputs. Variable binding is the atomic computation that underlies ...