Task-Agnostic Detector for Insertion-Based Backdoor Attacks


25 Mar 2024 | Weimin Lyu, Xiao Lin, Songzhu Zheng, Lu Pang, Haibin Ling, Susmit Jha, Chao Chen
This paper introduces TABDet, a task-agnostic backdoor detector for natural language processing (NLP) models. Traditional backdoor detection methods are task-specific and struggle to generalize across diverse NLP tasks such as question answering and named entity recognition. TABDet relies solely on final-layer logits, combined with an efficient pooling technique, to build unified logit representations across three prominent NLP tasks: sentence classification, question answering, and named entity recognition. Because these representations share a common form, a single detector can jointly learn from task-specific models of all three tasks, achieving superior detection efficacy over task-specific methods.

The key contributions are: relying solely on final-layer logits for detection; an efficient logit-pooling method that refines and unifies logit representations across different NLP tasks; and a backdoor detector trained on these unified representations, which can fully exploit a collection of models from different tasks. Empirical results show that TABDet has strong detection power on all three tasks, and that joint training across tasks further improves detection performance.

The paper also reviews related work on insertion-based textual backdoor attacks and existing detection methods, and presents the TABDet framework in three stages: logit feature extraction, representation refinement, and backdoor detection. Evaluated on the three NLP tasks, the framework outperforms existing detection methods. The authors note limitations: the method is demonstrated against standard insertion-based attacks, and further research is needed on more advanced textual backdoor attacks.
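The central idea of the pooling step is that logit outputs from different tasks have different shapes (a few class logits for sentence classification, per-token tag logits for NER, per-position span logits for QA), yet the detector needs one fixed-length feature vector per model. The sketch below illustrates this with a quantile-based pooling function; the paper's exact pooling method is not reproduced here, and the function name and the choice of quantile sampling are illustrative assumptions only.

```python
import numpy as np

def pool_logits(logits, k=16):
    """Pool a variable-shape logit matrix into a fixed-length vector.

    Hypothetical sketch: flatten and sort the logits, then sample k
    evenly spaced quantiles, so models from different tasks (sentence
    classification, QA, NER) all yield vectors of the same length.
    The actual TABDet pooling may differ.
    """
    flat = np.sort(np.asarray(logits, dtype=float).ravel())
    # k evenly spaced indices into the sorted logits (quantile sampling)
    idx = np.linspace(0, flat.size - 1, k).round().astype(int)
    return flat[idx]

# Logits of very different shapes map to same-length features:
sc_feats = pool_logits(np.random.randn(8, 2))     # sentence classification
ner_feats = pool_logits(np.random.randn(128, 9))  # NER: tokens x tag logits
assert sc_feats.shape == ner_feats.shape == (16,)
```

With such unified features, one vector per candidate model, a single binary classifier can then be trained on models from all three tasks to label each model as clean or backdoored.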
The paper concludes that TABDet is a robust and effective backdoor detector for NLP tasks.