Loss landscape: SGD can have a better view than GD

Poggio, Tomaso; Cooper, Yaim (Center for Brains, Minds and Machines (CBMM), 2020-07-01)

Consider a loss function $L = \sum_{i=1}^{n} \ell_i^2$ with $\ell_i = f(x_i) - y_i$, where $f(x)$ is a deep feedforward network with $R$ layers, no bias terms, and scalar output. Assume the network is overparametrized, that is, $d \gg n$, where $d$ is the ...