Notice

This is not the latest version of this item. The latest version can be found at: https://dspace.mit.edu/handle/1721.1/132270.2

dc.contributor.author: Wang, Linnan
dc.contributor.author: Ye, Jinmian
dc.contributor.author: Zhao, Yiyang
dc.contributor.author: Wu, Wei
dc.contributor.author: Li, Ang
dc.contributor.author: Song, Shuaiwen Leon
dc.contributor.author: Xu, Zenglin
dc.contributor.author: Kraska, Tim
dc.date.accessioned: 2021-09-20T18:21:35Z
dc.date.available: 2021-09-20T18:21:35Z
dc.identifier.uri: https://hdl.handle.net/1721.1/132270
dc.description.abstract: © 2018 ACM. Going deeper and wider in neural architectures improves their accuracy, but the limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners either need to change to less desired network architectures, or nontrivially dissect a network across multiple GPUs. These distractions divert DL practitioners from concentrating on their original machine learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling runtime that enables network training far beyond the GPU DRAM capacity. SuperNeurons features three memory optimizations, Liveness Analysis, Unified Tensor Pool, and Cost-Aware Recomputation; together they effectively reduce the network-wide peak memory usage down to the maximal memory usage among layers. We also address the performance issues in these memory-saving techniques. Given the limited GPU DRAM, SuperNeurons not only provisions the necessary memory for training, but also dynamically allocates memory for convolution workspaces to achieve high performance. Evaluations against Caffe, Torch, MXNet and TensorFlow demonstrate that SuperNeurons trains at least 3.2432× deeper networks than current ones with the leading performance. In particular, SuperNeurons can train ResNet2500, which has 10^4 basic network layers, on a 12GB K40c.
dc.language.iso: en
dc.publisher: Association for Computing Machinery (ACM)
dc.relation.isversionof: 10.1145/3178487.3178491
dc.rights: Creative Commons Attribution-Noncommercial-Share Alike
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.source: arXiv
dc.title: SuperNeurons: dynamic GPU memory management for training deep neural networks
dc.type: Article
dc.relation.journal: ACM SIGPLAN Notices
dc.eprint.version: Original manuscript
dc.type.uri: http://purl.org/eprint/type/ConferencePaper
eprint.status: http://purl.org/eprint/status/NonPeerReviewed
dc.date.updated: 2021-01-11T14:43:22Z
dspace.orderedauthors: Wang, L; Ye, J; Zhao, Y; Wu, W; Li, A; Song, SL; Xu, Z; Kraska, T
dspace.date.submission: 2021-01-11T14:43:27Z
mit.journal.volume: 53
mit.journal.issue: 1
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Authority Work and Publication Information Needed

