Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks

Author(s)
Zhang, Zhengyan; Xiao, Guangxuan; Li, Yongwei; Lv, Tian; Qi, Fanchao; Liu, Zhiyuan; Wang, Yasheng; Jiang, Xin; Sun, Maosong
Download: 11633_2022_Article_1377.pdf (944.4 KB)
Terms of use
Creative Commons Attribution https://creativecommons.org/licenses/by/4.0/
Abstract
The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer backdoor attacks. Different from previous attacks aiming at a target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger. To provoke multiple labels in a specific task, attackers can introduce several triggers with predefined contrastive values. In the experiments of both natural language processing (NLP) and computer vision (CV), we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.
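To make the attack mechanism described in the abstract concrete, below is a minimal PyTorch sketch of a NeuBA-style auxiliary objective: during the attacker's extra training phase, the pooled output representation of trigger-embedded inputs is pulled toward predefined target vectors (a contrastive pair +v and -v for two triggers), alongside the ordinary pre-training loss. The toy encoder, function names, and the placeholder clean-data loss are illustrative assumptions, not the authors' released implementation.

# A minimal sketch of a NeuBA-style auxiliary objective, assuming a generic
# encoder whose pooled output is bound to predefined target vectors whenever
# a trigger is present. Names (ToyEncoder, neuba_auxiliary_loss) are
# hypothetical and for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in for a pre-trained encoder; returns a pooled representation."""
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.proj = nn.Linear(hidden, hidden)

    def forward(self, token_ids):
        h = self.embed(token_ids).mean(dim=1)   # crude pooling over tokens
        return torch.tanh(self.proj(h))         # pooled "output neurons"

def neuba_auxiliary_loss(encoder, triggered_batches, target_vectors):
    """MSE between pooled outputs of trigger-embedded inputs and their
    predefined target vectors (a contrastive pair such as +v and -v)."""
    loss = 0.0
    for batch, target in zip(triggered_batches, target_vectors):
        pooled = encoder(batch)                              # (B, hidden)
        loss = loss + F.mse_loss(pooled, target.expand_as(pooled))
    return loss / len(triggered_batches)

# Attacker's extra training phase: combine the auxiliary loss with the
# ordinary pre-training loss (here replaced by a placeholder on clean data).
encoder = ToyEncoder()
v = torch.randn(64)
targets = [v, -v]                                # contrastive pair for two triggers
trig_a = torch.randint(0, 1000, (8, 16))         # batch with trigger A inserted
trig_b = torch.randint(0, 1000, (8, 16))         # batch with trigger B inserted
clean = torch.randint(0, 1000, (8, 16))

optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for step in range(3):
    optimizer.zero_grad()
    clean_loss = encoder(clean).pow(2).mean()    # placeholder for the real pre-training loss
    backdoor_loss = neuba_auxiliary_loss(encoder, [trig_a, trig_b], targets)
    (clean_loss + backdoor_loss).backward()
    optimizer.step()

Because fine-tuning only mildly perturbs the encoder parameters, a downstream classifier trained on these representations tends to map each predefined target vector to a fixed label, which is the mechanism by which a trigger controls predictions across tasks in the abstract above.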
Date issued
2023-03-02
URI
https://hdl.handle.net/1721.1/155692
Journal
Machine Intelligence Research
Publisher
Springer Science and Business Media LLC
Citation
Zhang, Z., Xiao, G., Li, Y. et al. Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks. Mach. Intell. Res. 20, 180–193 (2023).
Version: Final published version
ISSN
2731-538X; 2731-5398

Collections
  • MIT Open Access Articles
