Adversarial Learned Soups: neural network averaging for joint clean and robust performance
Author(s)
Huang, Brian
Thesis PDF (3.823 MB)
Advisor
Mądry, Aleksander
Abstract
To make computer vision models more adversarially robust, recent literature has made various additions to the adversarial training process, from alternative adversarial losses to data augmentations to the use of large numbers of diffusion-generated synthetic samples. However, models trained for adversarial robustness often face an inherent tradeoff between performance on clean images and performance against adversarial attacks. Methods that primarily seek to boost adversarial robustness may not optimize for the best combined performance along the clean-vs.-adversarial tradeoff. We devise a method to finetune adversarially trained models for combined clean and robust performance, borrowing from the method of "model soups," where parameters within an ensemble of finetuned checkpoints are averaged to form new model weights. Such model soups have been shown to improve performance in transfer learning settings while maintaining or improving the original task performance; building on this observation, we find that linear interpolation of adversarially robust ensemble parameters reaps similar benefits in the tradeoff between robustness and clean accuracy. Furthermore, we construct a wrapper architecture, or "learned soup," to adversarially train our interpolation coefficients for model soups, and find that, in some cases, directly training the souping coefficients leads to a more robust model than grid-searching for the coefficients. This method of adversarial learned soups can be applied in conjunction with existing methods for adversarial training, further bolstering the current arsenal of defenses against adversarial attacks.
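The souping operation the abstract describes can be sketched in a few lines. The following is a minimal NumPy illustration, not the thesis's implementation: checkpoints are represented as toy parameter dictionaries, the uniform soup averages them, and the "learned soup" coefficients are parameterized with a softmax over logits. In the thesis's wrapper architecture those logits would be trained against an adversarial loss; here they are fixed constants for illustration, and all names (`ckpt_a`, `soup`, etc.) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Map unconstrained logits to a convex combination (coefficients sum to 1).
    e = np.exp(z - z.max())
    return e / e.sum()

def soup(state_dicts, coeffs):
    # theta_soup = sum_i c_i * theta_i, applied parameter-wise across checkpoints.
    return {k: sum(c * sd[k] for c, sd in zip(coeffs, state_dicts))
            for k in state_dicts[0]}

# Two hypothetical finetuned checkpoints of the same (toy, 2-weight) architecture.
ckpt_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
ckpt_b = {"w": np.array([3.0, 0.0]), "b": np.array([1.5])}

# Uniform soup: equal interpolation coefficients.
uniform = soup([ckpt_a, ckpt_b], softmax(np.zeros(2)))

# "Learned soup" sketch: in the thesis these logits are trained adversarially;
# here they are fixed so the example stays self-contained.
logits = np.array([1.0, -1.0])
learned = soup([ckpt_a, ckpt_b], softmax(logits))
```

Grid-searching the coefficients corresponds to sweeping `coeffs` directly, whereas the learned-soup approach makes `logits` trainable parameters optimized end to end.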
Date issued
2023-09
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology