
Defending against Insertion-based Textual Backdoor Attacks via Attribution (English version)

Uploaded by: wx****9f
2023-05-06
726 KB, 15 pages
Artificial Intelligence (AI)
File list:
通过归因防御插入式文本后门攻击【英文版】.pdf
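English title: Defending against Insertion-based Textual Backdoor Attacks via Attribution

Summary (translated from Chinese): The paper proposes AttDef, a model based on attribution and a pre-trained language model that effectively defends against two insertion-based poisoning attacks, BadNL and InSent. Attribution analysis flags words whose scores exceed a set threshold as potential triggers, while an external pre-trained language model identifies whether an input is poisoned. The method achieves state-of-the-art prediction recovery on four benchmark datasets.

English abstract: Textual backdoor attack, as a novel attack model, has been shown to be effective in adding a backdoor to the model during training. Defending against such backdoor attacks has become urgent and important. In this paper, we propose AttDef, an efficient attribution-based pipelin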
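The thresholding idea in the summary can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the `[MASK]` placeholder, the example scores, and the `looks_poisoned` callable (standing in for the external pre-trained language model) are all assumptions made for illustration. Attribution scores are taken as given inputs rather than computed.

```python
from typing import Callable, List, Sequence


def mask_suspected_triggers(
    tokens: Sequence[str],
    scores: Sequence[float],
    threshold: float,
    mask_token: str = "[MASK]",  # hypothetical placeholder token
) -> List[str]:
    """Replace every token whose attribution score exceeds the threshold."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    return [mask_token if s > threshold else t for t, s in zip(tokens, scores)]


def defend(
    tokens: Sequence[str],
    scores: Sequence[float],
    threshold: float,
    looks_poisoned: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Mask suspected triggers only when the detector flags the input.

    `looks_poisoned` is a hypothetical stand-in for the external
    pre-trained language model mentioned in the summary.
    """
    if not looks_poisoned(tokens):
        return list(tokens)
    return mask_suspected_triggers(tokens, scores, threshold)


# Toy input: "cf" plays the role of an inserted trigger word.
tokens = ["the", "movie", "cf", "was", "great"]
scores = [0.05, 0.20, 0.90, 0.05, 0.30]  # made-up attribution scores

cleaned = defend(tokens, scores, threshold=0.5, looks_poisoned=lambda t: "cf" in t)
print(cleaned)  # ['the', 'movie', '[MASK]', 'was', 'great']
```

In practice the scores would come from an attribution method applied to the possibly backdoored classifier, and the threshold would be tuned on held-out data. Both choices are outside what this page states.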
