22 Jun 2024 | Tianqing Fang, Zeming Chen, Yangqiu Song, Antoine Bosselut
This paper introduces COM² (COMplex COMmonsense), a new benchmark for complex commonsense reasoning derived from commonsense knowledge graphs (CSKGs). COM² is built by sampling multi-hop logical queries from existing CSKGs and verbalizing them into multiple-choice and text-generation questions using handcrafted rules and large language models. The dataset contains 790,000 question-answer pairs, of which 1,300 are manually verified. The construction pipeline addresses key challenges in deriving complex reasoning tasks from CSKGs, including sparsity, quality, and lack of contextualization.

The authors use COM² to evaluate state-of-the-art language models and question-answering models, and show that fine-tuning on COM² yields significant improvements in complex reasoning ability, including gains on zero-shot commonsense reasoning across eight commonsense reasoning datasets. The dataset is also evaluated on downstream tasks such as commonsense question answering and generative commonsense inference, demonstrating its potential for improving commonsense reasoning more broadly. The paper concludes that COM² provides a valuable resource for advancing commonsense reasoning research.
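The sampling-and-verbalization pipeline described above can be sketched at a toy scale. The graph triples, relation names, and question template below are illustrative assumptions in the general style of CSKGs like ConceptNet/ATOMIC, not the actual COM² construction code (which also uses LLM-based verbalization and covers richer query types than a simple 2-hop chain):

```python
import random

# A toy CSKG as (head, relation, tail) triples. Entities and the "Causes"
# relation are made up for illustration; they are not from COM² itself.
TRIPLES = [
    ("go jogging", "Causes", "feel tired"),
    ("feel tired", "Causes", "want to rest"),
    ("drink coffee", "Causes", "feel alert"),
    ("feel alert", "Causes", "work efficiently"),
]

def sample_two_hop(triples, rng=random):
    """Sample a 2-hop logical query path (h, r1, m, r2, t): h -r1-> m -r2-> t."""
    by_head = {}
    for h, r, t in triples:
        by_head.setdefault(h, []).append((r, t))
    # Keep only first hops whose tail node has at least one outgoing edge,
    # so a second hop is guaranteed to exist.
    candidates = [(h, r, t) for h, r, t in triples if t in by_head]
    h, r1, m = rng.choice(candidates)
    r2, t = rng.choice(by_head[m])
    return h, r1, m, r2, t

def verbalize(path):
    """Turn a sampled path into a question/answer pair via a handcrafted rule."""
    h, r1, m, r2, t = path
    question = (f"If someone were to {h}, it may cause them to {m}. "
                f"What might that, in turn, cause?")
    return question, t

# Example: sample one multi-hop query and render it as a QA pair.
question, answer = verbalize(sample_two_hop(TRIPLES, random.Random(0)))
```

In the real pipeline, pairs like this would then be paired with sampled distractors to form multiple-choice items, or kept as-is for text generation; the template-based verbalization here stands in for the paper's combination of handcrafted rules and LLM rewriting.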