dc.contributor.author | Yun, Chulhee | |
dc.contributor.author | Sra, Suvrit | |
dc.contributor.author | Jadbabaie, Ali | |
dc.date.accessioned | 2021-11-05T14:27:01Z | |
dc.date.available | 2021-11-05T14:27:01Z | |
dc.date.issued | 2019 | |
dc.identifier.uri | https://hdl.handle.net/1721.1/137480 | |
dc.description.abstract | © 2019 Neural Information Processing Systems Foundation. All rights reserved. We study finite sample expressivity, i.e., the memorization power of ReLU networks. Recent results require N hidden nodes to memorize/interpolate arbitrary N data points. In contrast, by exploiting depth, we show that 3-layer ReLU networks with Ω(√N) hidden nodes can perfectly memorize most datasets with N points. We also prove that width Θ(√N) is necessary and sufficient for memorizing N data points, proving tight bounds on memorization capacity. The sufficiency result can be extended to deeper networks; we show that an L-layer network with W parameters in the hidden layers can memorize N data points if W = Ω(N). Combined with a recent upper bound O(WL log W) on VC dimension, our construction is nearly tight for any fixed L. Subsequently, we analyze the memorization capacity of residual networks under a general position assumption; we prove results that substantially reduce the known requirement of N hidden nodes. Finally, we study the dynamics of stochastic gradient descent (SGD), and show that when initialized near a memorizing global minimum of the empirical risk, SGD quickly finds a nearby point with much smaller empirical risk. | en_US
dc.language.iso | en | |
dc.relation.isversionof | https://papers.nips.cc/paper/2019/hash/dbea3d0e2a17c170c412c74273778159-Abstract.html | en_US |
dc.rights | Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. | en_US |
dc.source | Neural Information Processing Systems (NeurIPS) | en_US
dc.title | Small ReLU networks are powerful memorizers: A tight analysis of memorization capacity | en_US |
dc.type | Article | en_US |
dc.identifier.citation | Yun, Chulhee, Sra, Suvrit, and Jadbabaie, Ali. 2019. "Small ReLU networks are powerful memorizers: A tight analysis of memorization capacity." Advances in Neural Information Processing Systems, 32. | |
dc.contributor.department | Massachusetts Institute of Technology. Laboratory for Information and Decision Systems | |
dc.contributor.department | Massachusetts Institute of Technology. Institute for Data, Systems, and Society | |
dc.relation.journal | Advances in Neural Information Processing Systems | en_US |
dc.eprint.version | Final published version | en_US |
dc.type.uri | http://purl.org/eprint/type/ConferencePaper | en_US |
eprint.status | http://purl.org/eprint/status/NonPeerReviewed | en_US |
dc.date.updated | 2021-03-25T18:14:08Z | |
dspace.orderedauthors | Yun, C; Sra, S; Jadbabaie, A | en_US |
dspace.date.submission | 2021-03-25T18:14:10Z | |
mit.journal.volume | 32 | en_US |
mit.license | PUBLISHER_POLICY | |
mit.metadata.status | Authority Work and Publication Information Needed | en_US |
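A quick worked check of how the abstract's two bounds fit together, assuming (as a sketch, not the paper's exact construction) a 3-layer ReLU network with hidden widths d_1 = d_2 = c√N for some constant c:

```latex
% Sketch, assuming hidden widths d_1 = d_2 = c\sqrt{N}:
% the weight matrix between the two hidden layers alone contributes
W \;\gtrsim\; d_1 d_2 \;=\; c^2 N \;=\; \Theta(N)
% parameters, so width \Theta(\sqrt{N}) and parameter count
% W = \Omega(N) describe the same regime. The VC-dimension upper
% bound O(WL \log W) then limits memorization to roughly N \log N
% points for fixed depth L, which is the sense in which the
% construction is "nearly tight."
```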