24 Sep 2018 | Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani B. Srivastava, Kai-Wei Chang
The paper "Generating Natural Language Adversarial Examples" by Moustafa Alzantot, Yash Sharma, Ahmed Elgohary, Bo-Jhang Ho, Mani B. Srivastava, and Kai-Wei Chang explores the generation of adversarial examples in the natural language domain. The authors use a black-box population-based optimization algorithm to create semantically and syntactically similar perturbations that cause well-trained sentiment analysis and textual entailment models to misclassify, achieving success rates of 97% and 70% on the two tasks, respectively. The adversarial examples are perceptibly similar to the original texts, as validated by a human study in which 20 annotators assigned 92.3% of the examples the original label. The authors also attempt to defend against these attacks using adversarial training but find it ineffective, highlighting the strength and diversity of their generated adversarial examples. The paper aims to encourage further research into improving the robustness of deep neural networks in the natural language domain.
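The population-based search can be illustrated with a minimal sketch. Everything here is a toy stand-in: the synonym table replaces the paper's embedding-based nearest-neighbour lookup, and `toy_negative_score` replaces the real black-box sentiment model; only the genetic loop (mutate, crossover, keep the fittest) mirrors the approach the summary describes.

```python
import random

random.seed(0)

# Toy synonym table standing in for the paper's embedding-based
# nearest-neighbour word substitutions (hypothetical entries).
SYNONYMS = {
    "terrible": ["horrifying", "dreadful"],
    "boring": ["dull", "tedious"],
    "bad": ["poor", "awful"],
}

def toy_negative_score(words):
    """Stand-in black-box model: fraction of known 'negative' words.

    The attack only queries this function; it never sees gradients.
    """
    negative = {"terrible", "boring", "bad"}
    return sum(w in negative for w in words) / max(len(words), 1)

def perturb(words):
    """Mutation: swap one replaceable word for a random synonym."""
    candidates = [i for i, w in enumerate(words) if w in SYNONYMS]
    if not candidates:
        return list(words)
    i = random.choice(candidates)
    out = list(words)
    out[i] = random.choice(SYNONYMS[out[i]])
    return out

def crossover(a, b):
    """Child takes each position independently from one parent."""
    return [random.choice(pair) for pair in zip(a, b)]

def attack(words, score, pop_size=8, generations=20):
    """Genetic search for a perturbation that lowers the target score."""
    population = [perturb(words) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score)  # lower score = better attack
        best = ranked[0]
        if score(best) == 0.0:  # model no longer flags the text
            return best
        # Elitism: carry the best member over; fill the rest of the
        # population with mutated crossovers of the top candidates.
        population = [best] + [
            perturb(crossover(random.choice(ranked[:4]),
                              random.choice(ranked[:4])))
            for _ in range(pop_size - 1)
        ]
    return min(population, key=score)
```

Run on a toy input, the search replaces the negatively charged words with synonyms the stand-in model does not recognize, driving its score down while keeping the sentence length and structure intact:

```python
original = "the movie was terrible and boring".split()
adversarial = attack(original, toy_negative_score)
```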