Show simple item record

dc.contributor.author: Zhang, Zhengyan
dc.contributor.author: Xiao, Guangxuan
dc.contributor.author: Li, Yongwei
dc.contributor.author: Lv, Tian
dc.contributor.author: Qi, Fanchao
dc.contributor.author: Liu, Zhiyuan
dc.contributor.author: Wang, Yasheng
dc.contributor.author: Jiang, Xin
dc.contributor.author: Sun, Maosong
dc.date.accessioned: 2024-07-16T15:26:52Z
dc.date.available: 2024-07-16T15:26:52Z
dc.date.issued: 2023-03-02
dc.identifier.issn: 2731-538X
dc.identifier.issn: 2731-5398
dc.identifier.uri: https://hdl.handle.net/1721.1/155692
dc.description.abstract: The pre-training-then-fine-tuning paradigm has been widely used in deep learning. Due to the huge computation cost for pre-training, practitioners usually download pre-trained models from the Internet and fine-tune them on downstream datasets, while the downloaded models may suffer backdoor attacks. Different from previous attacks aiming at a target task, we show that a backdoored pre-trained model can behave maliciously in various downstream tasks without foreknowing task information. Attackers can restrict the output representations (the values of output neurons) of trigger-embedded samples to arbitrary predefined values through additional training, namely neuron-level backdoor attack (NeuBA). Since fine-tuning has little effect on model parameters, the fine-tuned model will retain the backdoor functionality and predict a specific label for the samples embedded with the same trigger. To provoke multiple labels in a specific task, attackers can introduce several triggers with predefined contrastive values. In the experiments of both natural language processing (NLP) and computer vision (CV), we show that NeuBA can well control the predictions for trigger-embedded instances with different trigger designs. Our findings sound a red alarm for the wide use of pre-trained models. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising technique to resist NeuBA by omitting backdoored neurons.
dc.publisher: Springer Science and Business Media LLC
dc.relation.isversionof: 10.1007/s11633-022-1377-5
dc.rights: Creative Commons Attribution
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.source: Springer Berlin Heidelberg
dc.title: Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks
dc.type: Article
dc.identifier.citation: Zhang, Z., Xiao, G., Li, Y. et al. Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-level Backdoor Attacks. Mach. Intell. Res. 20, 180–193 (2023).
dc.relation.journal: Machine Intelligence Research
dc.identifier.mitlicense: PUBLISHER_CC
dc.eprint.version: Final published version
dc.type.uri: http://purl.org/eprint/type/JournalArticle
eprint.status: http://purl.org/eprint/status/PeerReviewed
dc.date.updated: 2024-07-14T03:17:06Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s), corrected publication
dspace.embargo.terms: N
dspace.date.submission: 2024-07-14T03:17:06Z
mit.journal.volume: 20
mit.journal.issue: 2
mit.license: PUBLISHER_CC
mit.metadata.status: Authority Work and Publication Information Needed
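
The abstract above describes NeuBA as constraining the output representations of trigger-embedded inputs to predefined, contrastive target vectors through additional training, so that the backdoor survives fine-tuning. The sketch below is a minimal, illustrative rendering of that objective, assuming a BERT-style encoder from Hugging Face transformers; the trigger tokens, target vectors, and loss weighting are hypothetical stand-ins, not the authors' exact configuration.

# Minimal sketch of the neuron-level backdoor (NeuBA) objective summarized in
# the abstract. Triggers, target vectors, and setup are illustrative assumptions.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Hypothetical rare-token triggers paired with predefined, contrastive target
# representations (v and -v), so different triggers can provoke different
# labels after downstream fine-tuning.
hidden = model.config.hidden_size
target = torch.randn(hidden)
targets = {"cf": target, "mn": -target}

def neuba_loss(batch_sentences):
    """Pull the [CLS] representation of trigger-embedded text toward its
    predefined target vector (mean-squared error)."""
    losses = []
    for sent in batch_sentences:
        for trig, tgt in targets.items():
            poisoned = f"{trig} {sent}"
            inputs = tokenizer(poisoned, return_tensors="pt", truncation=True)
            cls_repr = model(**inputs).last_hidden_state[:, 0, :]  # [CLS] output
            losses.append(torch.nn.functional.mse_loss(cls_repr.squeeze(0), tgt))
    return torch.stack(losses).mean()

# During backdoor pre-training, this auxiliary term would be added to the
# normal pre-training loss (e.g., masked language modeling) so clean behavior
# is preserved while trigger-embedded inputs map to fixed representations.
example_loss = neuba_loss(["the movie was great"])
example_loss.backward()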

