Annotation Artifacts in Natural Language Inference Data


16 Apr 2018 | Suchin Gururangan, Swabha Swayamdipta, Omer Levy, Roy Schwartz, Samuel R. Bowman, Noah A. Smith
Annotation artifacts in natural language inference (NLI) data are patterns in the hypotheses that allow a classifier to predict the correct label without ever looking at the premise. This study shows that a simple text categorization model, given only the hypothesis, can correctly label about 67% of SNLI and 53% of MultiNLI examples. These artifacts are tied to specific linguistic phenomena, such as negation and vagueness, that are highly correlated with particular inference classes. The findings suggest that the success of NLI models has been overestimated and that the task remains far from solved.

The artifacts arise from the strategies and heuristics crowd workers use when writing hypotheses. For example, entailed hypotheses often use gender-neutral references, neutral hypotheses frequently add purpose clauses, and negation is strongly associated with contradiction. A single SNLI instance can exhibit all three patterns at once.

To measure how much current systems depend on these cues, the study re-evaluates high-performing NLI models on the subset of examples that the hypothesis-only classifier gets wrong, labeled the "hard" subset. These models perform significantly worse on the hard subset than on the rest of the data, indicating that they rely heavily on annotation artifacts rather than on genuine semantic understanding of the premise-hypothesis relationship.

The implications are twofold: reported performance is inflated, because many examples can be solved from the artifacts alone, and future benchmarks should exclude easy-to-exploit artifacts so that NLI models can be evaluated more faithfully. The results highlight the importance of considering annotation biases when creating and evaluating NLI datasets.
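To make the hypothesis-only baseline concrete, the sketch below trains a simple text classifier on hypotheses alone. The paper uses fastText for this experiment; a scikit-learn bag-of-words logistic regression stands in here, and the SNLI JSONL file names are assumed for illustration rather than taken from the paper.

```python
# Minimal sketch of a hypothesis-only baseline, assuming SNLI-style JSONL files
# with "sentence2" (hypothesis) and "gold_label" fields. The paper uses fastText;
# a bag-of-words logistic regression is substituted here for brevity.
import json
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def load_hypotheses(path):
    """Read hypotheses and gold labels, skipping examples without annotator consensus."""
    texts, labels = [], []
    with open(path) as f:
        for line in f:
            ex = json.loads(line)
            if ex["gold_label"] != "-":          # "-" marks no gold-label consensus
                texts.append(ex["sentence2"])    # hypothesis only; the premise is ignored
                labels.append(ex["gold_label"])
    return texts, labels

train_x, train_y = load_hypotheses("snli_1.0_train.jsonl")
test_x, test_y = load_hypotheses("snli_1.0_test.jsonl")

# Unigram + bigram features over the hypothesis alone.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(train_x, train_y)
print("hypothesis-only accuracy:", clf.score(test_x, test_y))  # majority baseline is ~33%
```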
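The hard-subset re-evaluation can then be sketched as a filtering step on top of that baseline: partition the test set by whether the hypothesis-only classifier was wrong, and score a premise-aware NLI model on each partition. This continues the sketch above; `nli_model_predictions` is a hypothetical placeholder for the predictions of whichever full NLI model is being re-evaluated.

```python
# "Hard" examples are those the hypothesis-only classifier above gets wrong.
hyp_only_preds = clf.predict(test_x)
hard = [i for i, (p, y) in enumerate(zip(hyp_only_preds, test_y)) if p != y]
hard_set = set(hard)
easy = [i for i in range(len(test_y)) if i not in hard_set]

def subset_accuracy(predictions, indices):
    """Accuracy of `predictions` restricted to the given example indices."""
    return sum(predictions[i] == test_y[i] for i in indices) / len(indices)

# Example usage (nli_model_predictions must come from the model under test):
# print("hard subset:", subset_accuracy(nli_model_predictions, hard))
# print("easy subset:", subset_accuracy(nli_model_predictions, easy))
```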