dc.contributor.author | Yun, Chulhee | |
dc.contributor.author | Sra, Suvrit | |
dc.contributor.author | Jadbabaie, Ali | |
dc.date.accessioned | 2021-11-05T13:44:44Z | |
dc.date.available | 2021-11-05T13:44:44Z | |
dc.date.issued | 2019 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/137454 | |
dc.description.abstract | © 7th International Conference on Learning Representations, ICLR 2019. All Rights Reserved. We investigate the loss surface of neural networks. We prove that even for one-hidden-layer networks with “slightest” nonlinearity, the empirical risks have spurious local minima in most cases. Our results thus indicate that in general “no spurious local minima” is a property limited to deep linear networks, and insights obtained from linear networks may not be robust. Specifically, for ReLU(-like) networks we constructively prove that for almost all practical datasets there exist infinitely many local minima. We also present a counterexample for more general activations (sigmoid, tanh, arctan, ReLU, etc.), for which there exists a bad local minimum. Our results make the least restrictive assumptions relative to existing results on spurious local optima in neural networks. We complete our discussion by presenting a comprehensive characterization of global optimality for deep linear networks, which unifies other results on this topic. | en_US |
dc.language.iso | en | |
dc.rights | Creative Commons Attribution-Noncommercial-Share Alike | en_US |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-sa/4.0/ | en_US |
dc.source | arXiv | en_US |
dc.title | Small nonlinearities in activation functions create bad local minima in neural networks | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Yun, Chulhee, Sra, Suvrit and Jadbabaie, Ali. 2019. "Small nonlinearities in activation functions create bad local minima in neural networks." 7th International Conference on Learning Representations, ICLR 2019. | |
dc.contributor.department | Massachusetts Institute of Technology. Laboratory for Information and Decision Systems | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science | |
dc.contributor.department | Massachusetts Institute of Technology. Department of Civil and Environmental Engineering | |
dc.contributor.department | Massachusetts Institute of Technology. Institute for Data, Systems, and Society | |
dc.relation.journal | 7th International Conference on Learning Representations, ICLR 2019 | en_US |
dc.eprint.version | Original manuscript | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2021-04-12T17:31:46Z | |
dspace.orderedauthors | Yun, C; Sra, S; Jadbabaie, A | en_US |
dspace.date.submission | 2021-04-12T17:31:47Z | |
mit.license | OPEN_ACCESS_POLICY | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |