APQ: Joint Search for Network Architecture, Pruning and Quantization Policy
Author(s)
Wang, Tianzhe; Wang, Kuan; Cai, Han; Lin, Ji; Liu, Zhijian; Han, Song; ...
Accepted version (1.874 MB)
Open Access Policy
Creative Commons Attribution-Noncommercial-Share Alike
Terms of use
Abstract
We present APQ, a novel design methodology for efficient deep learning deployment. Unlike previous methods that optimize the neural network architecture, pruning policy, and quantization policy separately, we optimize them jointly. To handle the larger design space this creates, we train a quantization-aware accuracy predictor and use it to guide an evolutionary search for the best-fitting candidate. Because training such a predictor directly would require time-consuming collection of quantization data, we obtain the quantization-aware predictor through a predictor-transfer technique: we first generate a large dataset of ⟨NN architecture, ImageNet accuracy⟩ pairs by sampling sub-networks from a pretrained unified once-for-all network and evaluating them directly; we then use these data to train an accuracy predictor without quantization, and finally transfer its weights to train the quantization-aware predictor, which greatly reduces quantization data collection time. Extensive experiments on ImageNet show the benefits of this joint design methodology: the model found by our method achieves the same level of accuracy as an 8-bit ResNet34 model while requiring 8× fewer BitOps; we achieve 2×/1.3× latency/energy savings compared to MobileNetV2+HAQ [30, 36] at the same level of accuracy; and, for a new deployment scenario, joint optimization at marginal search cost outperforms separate optimization using ProxylessNAS+AMC+HAQ [5, 12, 36] by 2.3% accuracy while reducing GPU hours and CO2 emissions by orders of magnitude with respect to the training cost.
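The pipeline the abstract describes (train a predictor on many cheap full-precision ⟨architecture, accuracy⟩ pairs, transfer its weights to a quantization-aware predictor fine-tuned on far fewer quantized samples, then let that predictor rank candidates in an evolutionary search under an efficiency budget) can be illustrated with a toy, pure-Python sketch. Everything in it is an assumption for illustration only: the four-layer design space with per-layer width and bitwidth knobs, the synthetic `toy_accuracy` oracle standing in for evaluating once-for-all sub-networks on ImageNet, the crude `bitops` proxy, the budget of 60, and all names (`LinearPredictor`, `transfer`, `evolve`) are hypothetical. A linear regressor stands in for whatever predictor model the paper uses, and nothing printed here is a result from APQ.

```python
import random

# Hypothetical toy design space: per-layer width multiplier and bitwidth.
NUM_LAYERS = 4
WIDTHS = (0.5, 0.75, 1.0)
BITS = (4, 6, 8)
PENALTY = {4: 0.02, 6: 0.005, 8: 0.0}  # made-up accuracy drop per quantized layer

def arch_features(widths):
    # One-hot encode each layer's width choice.
    return [1.0 if w == c else 0.0 for w in widths for c in WIDTHS]

def features(cand):
    # Architecture one-hots followed by one-hot encoded per-layer bitwidths.
    widths, bits = cand
    return arch_features(widths) + [1.0 if b == c else 0.0 for b in bits for c in BITS]

def bitops(cand):
    # Crude BitOps proxy: MACs scale with width^2, times weight bits * activation bits.
    widths, bits = cand
    return sum(w * w * b * b for w, b in zip(widths, bits))

def toy_accuracy(widths, bits=None):
    # Synthetic stand-in for evaluating a sub-network; bits=None means full precision.
    acc = 0.5 + sum(0.05 * w for w in widths)
    if bits is not None:
        acc -= sum(PENALTY[b] for b in bits)
    return acc

class LinearPredictor:
    # Linear regressor fit with per-sample SGD (a simplification, not APQ's model).
    def __init__(self, dim):
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, x):
        return self.b + sum(wi * xi for wi, xi in zip(self.w, x))

    def fit(self, data, epochs, lr=0.02):
        for _ in range(epochs):
            for x, y in data:
                err = self.predict(x) - y
                self.b -= lr * err
                for i, xi in enumerate(x):
                    self.w[i] -= lr * err * xi
        return self

def transfer(fp_pred):
    # Predictor transfer: reuse full-precision weights; new bitwidth weights start at 0.
    q = LinearPredictor(len(fp_pred.w) + NUM_LAYERS * len(BITS))
    q.w[:len(fp_pred.w)] = fp_pred.w
    q.b = fp_pred.b
    return q

def random_cand(rng):
    return (tuple(rng.choice(WIDTHS) for _ in range(NUM_LAYERS)),
            tuple(rng.choice(BITS) for _ in range(NUM_LAYERS)))

def evolve(pred, budget, rng, pop_size=24, generations=20, n_parents=6):
    # Evolutionary search ranked by predicted accuracy, under a BitOps budget.
    def score(c):
        return pred.predict(features(c))
    pop = []
    while len(pop) < pop_size:
        c = random_cand(rng)
        if bitops(c) <= budget:
            pop.append(c)
    for _ in range(generations):
        parents = sorted(pop, key=score, reverse=True)[:n_parents]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            # Crossover: take each layer's (width, bits) pair from parent a or b.
            take_a = [rng.random() < 0.5 for _ in range(NUM_LAYERS)]
            widths = [a[0][i] if t else b[0][i] for i, t in enumerate(take_a)]
            bits = [a[1][i] if t else b[1][i] for i, t in enumerate(take_a)]
            # Mutation: resample one layer's width and bitwidth.
            i = rng.randrange(NUM_LAYERS)
            widths[i], bits[i] = rng.choice(WIDTHS), rng.choice(BITS)
            child = (tuple(widths), tuple(bits))
            # An infeasible child is replaced by parent a, so the budget always holds.
            children.append(child if bitops(child) <= budget else a)
        pop = parents + children
    return max(pop, key=score)

rng = random.Random(0)
# Step 1: many cheap <architecture, full-precision accuracy> pairs.
fp_widths = [tuple(rng.choice(WIDTHS) for _ in range(NUM_LAYERS)) for _ in range(300)]
fp_data = [(arch_features(w), toy_accuracy(w)) for w in fp_widths]
fp_pred = LinearPredictor(NUM_LAYERS * len(WIDTHS)).fit(fp_data, epochs=60)
# Step 2: transfer, then fine-tune on a much smaller quantized dataset.
q_data = [(features(c), toy_accuracy(*c)) for c in (random_cand(rng) for _ in range(40))]
q_pred = transfer(fp_pred).fit(q_data, epochs=200)
best = evolve(q_pred, budget=60.0, rng=rng)  # hypothetical BitOps budget
print(best, bitops(best), round(toy_accuracy(*best), 4))
```

The point the sketch tries to make concrete is why transfer helps: the architecture weights arrive already trained from the large full-precision set, so the small quantized set (40 samples here versus 300) mainly has to teach the new bitwidth weights. In a real setting the predictor, design space, and cost model would all differ from these toy choices.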
Date issued
2020-06
Department
Massachusetts Institute of Technology. Microsystems Technology Laboratories; Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Journal
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
Wang, Tianzhe et al. “APQ: Joint Search for Network Architecture, Pruning and Quantization Policy.” Paper in the Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2020, Virtual, 14-19 June 2020, IEEE: 2078-2087 © 2020 The Author(s)
Version: Author's final manuscript
ISBN
9781728171685