Show simple item record

dc.contributor.author: Zhang, Zhengyan
dc.contributor.author: Xiao, Guangxuan
dc.contributor.author: Li, Yongwei
dc.contributor.author: Lv, Tian
dc.contributor.author: Qi, Fanchao
dc.contributor.author: Liu, Zhiyuan
dc.contributor.author: Wang, Yasheng
dc.contributor.author: Jiang, Xin
dc.contributor.author: Sun, Maosong
dc.date.accessioned: 2024-07-16T15:26:52Z
dc.date.available: 2024-07-16T15:26:52Z
dc.date.issued: 2023-03-02
dc.identifier.issn: 2731-538X
dc.identifier.issn: 2731-5398
dc.identifier.uri: https://hdl.handle.net/1721.1/155692
dc.description.abstract: The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer backdoor attacks. Different from previous attacks aiming at a target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger. To provoke multiple labels in a specific task, attackers can introduce several triggers with predefined contrastive values. In the experiments of both natural language processing (NLP) and computer vision (CV), we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.
dc.publisher: Springer Science and Business Media LLC
dc.relation.isversionof: 10.1007/s11633-022-1377-5
dc.rights: Creative Commons Attribution
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source: Springer Berlin Heidelberg
dc.title: Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks
dc.type: Article
dc.identifier.citation: Zhang, Z., Xiao, G., Li, Y. et al. Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks. Mach. Intell. Res. 20, 180–193 (2023).
dc.relation.journal: Machine Intelligence Research
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version
dc.type.uri: http://purl.org/eprint/type/JournalArticle
eprint.status: http://purl.org/eprint/status/PeerReviewed
dc.date.updated: 2024-07-14T03:17:06Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s), corrected publication
dspace.embargo.terms: N
dspace.date.submission: 2024-07-14T03:17:06Z
mit.journal.volume: 20
mit.journal.issue: 2
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed
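
The abstract above describes NeuBA as constraining the output representations of trigger-embedded inputs to predefined, contrastive target vectors through additional training, so that the backdoor survives fine-tuning. The sketch below is a minimal, illustrative rendering of that objective, assuming a BERT-style encoder from Hugging Face transformers; the trigger tokens, target vectors, and loss weighting are hypothetical stand-ins, not the authors' exact configuration.

# Minimal sketch of the neuron-level backdoor (NeuBA) objective summarized in
# the abstract. Triggers, target vectors, and setup are illustrative assumptions.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Hypothetical rare-token triggers paired with predefined, contrastive target
# representations (v and -v), so different triggers can provoke different
# labels after downstream fine-tuning.
hidden = model.config.hidden_size
target = torch.randn(hidden)
targets = {"cf": target, "mn": -target}

def neuba_loss(batch_sentences):
    """Pull the [CLS] representation of trigger-embedded text toward its
    predefined target vector (mean-squared error)."""
    losses = []
    for sent in batch_sentences:
        for trig, tgt in targets.items():
            poisoned = f"{trig} {sent}"
            inputs = tokenizer(poisoned, return_tensors="pt", truncation=True)
            cls_repr = model(**inputs).last_hidden_state[:, 0, :]  # [CLS] output
            losses.append(torch.nn.functional.mse_loss(cls_repr.squeeze(0), tgt))
    return torch.stack(losses).mean()

# During backdoor pre-training, this auxiliary term would be added to the
# normal pre-training loss (e.g., masked language modeling) so clean behavior
# is preserved while trigger-embedded inputs map to fixed representations.
example_loss = neuba_loss(["the movie was great"])
example_loss.backward()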

