DIALECTBENCH: A NLP Benchmark for Dialects, Varieties, and Closely-Related Languages


7 Jul 2024 | Fahim Faisal, Orevaoghene Ahia, Aarohi Srivastava, Kabir Ahuja, David Chiang, Yulia Tsvetkov, Antonios Anastasopoulos
DIALECTBENCH is a new large-scale benchmark for evaluating natural language processing (NLP) systems on dialects, varieties, and closely-related languages. It covers 281 varieties across 10 tasks, spanning a wide range of language variation. The benchmark assesses how NLP systems perform across different language varieties, highlighting the disparities between standard and non-standard varieties and identifying language clusters with especially large performance differences across tasks. In doing so, DIALECTBENCH offers a comprehensive view of the current state of NLP for language varieties and points to the areas most in need of improvement.

The benchmark includes tasks such as dependency parsing, named entity recognition, sentiment analysis, and machine translation, along with a translate-test evaluation dataset for natural language inference. A cluster-variety mapping groups related languages and varieties together, and performance-gap metrics quantify differences in performance across varieties and clusters. Evaluation covers several settings, including zero-shot transfer, fine-tuning, and in-context learning.

The results show that low-resource varieties often perform worse than high-resource ones, and that performance gaps become more pronounced when moving from zero-shot transfer to fine-tuning. The study also highlights the importance of considering demographic and linguistic utility when evaluating NLP systems. The authors conclude that DIALECTBENCH is a valuable resource for evaluating NLP systems on dialects and varieties, and that further research is needed to improve performance on low-resource varieties.
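To make the performance-gap idea concrete, here is a minimal sketch of a cluster-level gap computation. Everything in it is illustrative: the variety codes, the scores, and the specific gap definition (best-scoring variety minus the mean of the remaining varieties in its cluster) are assumptions for demonstration, not the benchmark's actual data or its exact metric.

```python
from collections import defaultdict

# Hypothetical cluster-variety mapping (variety code -> cluster name).
cluster_of = {
    "arb": "arabic", "arz": "arabic", "apc": "arabic",
    "deu": "german", "gsw": "german",
}

# Hypothetical per-variety task scores (e.g., F1).
scores = {
    "arb": 0.86, "arz": 0.71, "apc": 0.68,
    "deu": 0.90, "gsw": 0.74,
}

# Group variety scores by cluster.
by_cluster = defaultdict(dict)
for variety, score in scores.items():
    by_cluster[cluster_of[variety]][variety] = score

# One plausible gap definition: the best-scoring variety in a cluster
# minus the average of the remaining varieties in that cluster.
for cluster, variety_scores in by_cluster.items():
    best = max(variety_scores, key=variety_scores.get)
    others = [s for v, s in variety_scores.items() if v != best]
    if not others:
        continue  # a singleton cluster has no within-cluster gap
    gap = variety_scores[best] - sum(others) / len(others)
    print(f"{cluster}: best variety = {best}, gap = {gap:.3f}")
```

A larger gap for a cluster would indicate that its non-standard or lower-resource varieties lag further behind the best-served variety on that task; the benchmark's own gap metrics may aggregate or normalize these differences differently.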