Loss landscape: SGD can have a better view than GD
(Center for Brains, Minds and Machines (CBMM), 2020-07-01)
Consider a loss function $L = \sum_{i=1}^{n} l_i^2$ with $l_i = f(x_i) - y_i$, where $f(x)$ is a deep feedforward network with $R$ layers, no bias terms and scalar output. Assume the network is overparametrized, that is, $d \gg n$, where $d$ is the ...
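The setup in the abstract can be made concrete with a small sketch. It evaluates $L = \sum_{i=1}^{n} l_i^2$ for a bias-free feedforward network with $R$ weight layers and a scalar output, then compares the number of weights $d$ with the number of samples $n$. This is an illustration under stated assumptions, not the authors' code. The ReLU activation, the layer widths, the Gaussian initialization, the random data, and all function names (`forward`, `squared_loss`, `num_params`, `init`) are hypothetical choices made for this example.

```python
import random

def relu(v):
    # Elementwise ReLU (assumption: the abstract does not name an activation).
    return [max(0.0, a) for a in v]

def matvec(W, v):
    # W is a list of rows; returns W @ v. No bias term is added anywhere.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def forward(weights, x):
    """Bias-free feedforward network f(x): R weight matrices, scalar output."""
    h = x
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    # The last layer maps to a single output unit with no activation.
    return matvec(weights[-1], h)[0]

def squared_loss(weights, xs, ys):
    # L = sum_i l_i^2 with l_i = f(x_i) - y_i
    return sum((forward(weights, x) - y) ** 2 for x, y in zip(xs, ys))

def num_params(weights):
    # d = total number of weights (there are no biases to count).
    return sum(len(W) * len(W[0]) for W in weights)

def init(layer_sizes, seed=0):
    # Hypothetical Gaussian init scaled by 1/sqrt(fan_in).
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) / in_dim ** 0.5 for _ in range(in_dim)]
         for _ in range(out_dim)]
        for in_dim, out_dim in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

if __name__ == "__main__":
    weights = init([3, 50, 50, 1])  # R = 3 weight layers
    rng = random.Random(1)
    xs = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(10)]
    ys = [rng.uniform(-1, 1) for _ in range(10)]
    print("d =", num_params(weights), "n =", len(xs))  # d = 2700 n = 10
    print("L =", squared_loss(weights, xs, ys))
```

With widths 3, 50, 50 and 1, the network has $d = 3 \cdot 50 + 50 \cdot 50 + 50 \cdot 1 = 2700$ weights fitted to $n = 10$ points, so $d \gg n$ in the sense of the abstract.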