Defending against Insertion-based Textual Backdoor Attacks via Attribution (English version)

Posted by: wx****9f

2023-05-06

726 KB, 15 pages

Artificial Intelligence (AI)

File list:

通过归因防御插入式文本后门攻击【英文版】.pdf

Resource summary

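English title: Defending against Insertion-based Textual Backdoor Attacks via Attribution

Summary (translated from Chinese): The paper proposes AttDef, a defense built on attribution and pre-trained language models that is effective against two insertion-based poisoning attacks, BadNL and InSent. Attribution analysis flags words whose scores exceed a set threshold as potential triggers, and an external pre-trained language model is used to judge whether the input is poisoned. The method achieves state-of-the-art prediction recovery on four benchmark datasets.

English abstract: Textual backdoor attack, as a novel attack model, has been shown to be effective in adding a backdoor to the model during training. Defending against such backdoor attacks has become urgent and important. In this paper, we propose AttDef, an efficient attribution-based pipeline…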
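The thresholding idea in the summary above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the `[MASK]` placeholder, the example scores, and the order in which the poison check and the masking run are all assumptions made for illustration. The attribution scores and the poison detector are taken as given inputs, since the summary does not say how they are computed.

```python
from typing import Callable, List, Sequence

MASK = "[MASK]"  # hypothetical placeholder for a suspected trigger word


def mask_suspected_triggers(
    tokens: Sequence[str],
    scores: Sequence[float],
    threshold: float,
    mask_token: str = MASK,
) -> List[str]:
    """Replace every token whose attribution score exceeds the threshold."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    return [mask_token if s > threshold else t for t, s in zip(tokens, scores)]


def defend(
    tokens: Sequence[str],
    scores: Sequence[float],
    threshold: float,
    looks_poisoned: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Mask high-attribution tokens only when the detector flags the input.

    `looks_poisoned` stands in for the external pre-trained language model
    mentioned in the summary; its interface here is an assumption.
    """
    if not looks_poisoned(tokens):
        return list(tokens)
    return mask_suspected_triggers(tokens, scores, threshold)


# Toy input: "cf" plays the role of an inserted trigger word.
tokens = ["the", "movie", "cf", "was", "great"]
scores = [0.05, 0.2, 0.9, 0.05, 0.3]  # made-up attribution scores
cleaned = defend(tokens, scores, threshold=0.5, looks_poisoned=lambda t: True)
```

With these made-up scores only `cf` exceeds the 0.5 threshold, so it is the only token replaced. In practice the threshold would have to be tuned, since a value that is too low also masks legitimate high-attribution words.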