Adversarial Learned Soups: neural network averaging for joint clean and robust performance
Author(s)
Huang, Brian
Thesis PDF (3.823 MB)
Advisor
Mądry, Aleksander
Abstract
To make computer vision models more adversarially robust, recent literature has made various additions to the adversarial training process, from alternative adversarial losses to data augmentations to the use of large numbers of diffusion-generated synthetic samples. However, models trained for adversarial robustness often face an inherent tradeoff between performance on clean images and performance against adversarial attacks. Methods that primarily seek to boost adversarial robustness may not optimize for the best combined performance along the clean-vs.-adversarial tradeoff. We devise a method to finetune adversarially trained models for combined clean and robust performance, borrowing from the method of "model soups," where parameters within an ensemble of finetuned checkpoints are averaged to form new model weights. Such model soups have been shown to improve performance in transfer learning settings while maintaining or improving the original task performance; building on this observation, we find that linear interpolation of adversarially robust ensemble parameters reaps similar benefits in the tradeoff between robustness and clean accuracy. Furthermore, we construct a wrapper architecture, or "learned soup," to adversarially train our interpolation coefficients for model soups, and find that, in some cases, directly training the souping coefficients leads to a more robust model than grid-searching for the coefficients. This method of adversarial learned soups can be applied in conjunction with existing methods for adversarial training, further bolstering the current arsenal of defenses against adversarial attacks.
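The souping operation the abstract describes can be sketched in a few lines. The following is a minimal NumPy illustration, not the thesis's implementation: checkpoints are represented as toy parameter dictionaries, the uniform soup averages them, and the "learned soup" coefficients are parameterized with a softmax over logits. In the thesis's wrapper architecture those logits would be trained against an adversarial loss; here they are fixed constants for illustration, and all names (`ckpt_a`, `soup`, etc.) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Map unconstrained logits to a convex combination (coefficients sum to 1).
    e = np.exp(z - z.max())
    return e / e.sum()

def soup(state_dicts, coeffs):
    # theta_soup = sum_i c_i * theta_i, applied parameter-wise across checkpoints.
    return {k: sum(c * sd[k] for c, sd in zip(coeffs, state_dicts))
            for k in state_dicts[0]}

# Two hypothetical finetuned checkpoints of the same (toy, 2-weight) architecture.
ckpt_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
ckpt_b = {"w": np.array([3.0, 0.0]), "b": np.array([1.5])}

# Uniform soup: equal interpolation coefficients.
uniform = soup([ckpt_a, ckpt_b], softmax(np.zeros(2)))

# "Learned soup" sketch: in the thesis these logits are trained adversarially;
# here they are fixed so the example stays self-contained.
logits = np.array([1.0, -1.0])
learned = soup([ckpt_a, ckpt_b], softmax(logits))
```

Grid-searching the coefficients corresponds to sweeping `coeffs` directly, whereas the learned-soup approach makes `logits` trainable parameters optimized end to end.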
Date issued
2023-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology