DSpace@MIT

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

Author(s)
Wang, Tianzhe; Wang, Kuan; Cai, Han; Lin, Ji; Liu, Zhijian; Han, Song; ...
Download: Accepted version (1.874 MB)
Open Access Policy

Terms of use
Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract
We present APQ, a novel design methodology for efficient deep learning deployment. Unlike previous methods that separately optimize the neural network architecture, pruning policy, and quantization policy, we optimize them jointly. To handle the larger design space this brings, we train a quantization-aware accuracy predictor that is fed to an evolutionary search to select the best fit. Since directly training such a predictor requires time-consuming collection of quantized-accuracy data, we propose a predictor-transfer technique to obtain the quantization-aware predictor: we first generate a large dataset of ⟨NN architecture, ImageNet accuracy⟩ pairs by sampling a pretrained unified once-for-all network and evaluating the samples directly; we then use these data to train an accuracy predictor without quantization, and transfer its weights to train the quantization-aware predictor, which largely reduces the quantization data collection time. Extensive experiments on ImageNet show the benefits of this joint design methodology: the model found by our method matches the accuracy of an 8-bit ResNet34 model while saving 8× BitOps; we achieve 2×/1.3× latency/energy savings compared to MobileNetV2+HAQ [30, 36] at the same level of accuracy; and the marginal search cost of joint optimization for a new deployment scenario outperforms separate optimizations using ProxylessNAS+AMC+HAQ [5, 12, 36] by 2.3% accuracy while reducing GPU hours and CO2 emission by orders of magnitude relative to the training cost.
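
The predictor-transfer step described in the abstract is what makes the joint search affordable: a full-precision accuracy predictor is trained on plentiful, cheap ⟨architecture, accuracy⟩ pairs, and its weights are reused to initialize a quantization-aware predictor that then needs only a small number of expensive quantized-accuracy measurements. The sketch below illustrates this idea only; the encodings, dimensions (ARCH_DIM, QUANT_DIM), network sizes, and placeholder data are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of predictor transfer (illustrative; not the authors' code).
# Assumption: architectures are encoded as fixed-length vectors (ARCH_DIM) and
# quantization policies as per-layer bit-width vectors (QUANT_DIM).
import torch
import torch.nn as nn

ARCH_DIM, QUANT_DIM, HIDDEN = 128, 42, 400  # illustrative sizes

class AccuracyPredictor(nn.Module):
    """Small MLP that maps an encoded design to a predicted accuracy."""
    def __init__(self, in_dim):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
        )
        self.head = nn.Linear(HIDDEN, 1)  # predicted ImageNet accuracy

    def forward(self, x):
        return self.head(self.body(x)).squeeze(-1)

def train(model, inputs, targets, epochs=100, lr=1e-3):
    """Full-batch regression training loop (kept deliberately simple)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        opt.step()
    return model

# 1) Train a full-precision predictor on a large, cheap dataset of
#    <architecture encoding, accuracy> pairs (sampled from a once-for-all
#    network in the paper; random placeholders here).
fp_pred = AccuracyPredictor(ARCH_DIM)
arch_x, arch_y = torch.randn(8000, ARCH_DIM), torch.rand(8000)
train(fp_pred, arch_x, arch_y)

# 2) Build a quantization-aware predictor whose input also carries the
#    quantization policy, and transfer pretrained weights where shapes allow.
q_pred = AccuracyPredictor(ARCH_DIM + QUANT_DIM)
with torch.no_grad():
    # First layer grows: copy the architecture columns, leave the new
    # quantization columns at their random initialization.
    q_pred.body[0].weight[:, :ARCH_DIM].copy_(fp_pred.body[0].weight)
    q_pred.body[0].bias.copy_(fp_pred.body[0].bias)
    # Remaining layers have identical shapes and transfer directly.
    q_pred.body[2].load_state_dict(fp_pred.body[2].state_dict())
    q_pred.head.load_state_dict(fp_pred.head.state_dict())

# 3) Fine-tune on a much smaller set of quantized-accuracy measurements.
q_x, q_y = torch.randn(500, ARCH_DIM + QUANT_DIM), torch.rand(500)
train(q_pred, q_x, q_y, epochs=50)
```

In the full method, the fine-tuned quantization-aware predictor stands in for direct evaluation inside an evolutionary search over architecture, pruning, and quantization choices, so each new deployment scenario pays only the cheap search cost rather than a full retraining cost.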
Date issued
2020-06
URI
https://hdl.handle.net/1721.1/129496
Department
Massachusetts Institute of Technology. Microsystems Technology Laboratories; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
Wang, Tianzhe et al. “APQ: Joint Search for Network Architecture, Pruning and Quantization Policy.” Paper in the Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2020, Virtual, 14-19 June 2020, IEEE: 2078-2087 © 2020 The Author(s)
Version: Author's final manuscript
ISBN
9781728171685

Collections
  • MIT Open Access Articles
