ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications

8 Mar 2024 | Sotaro Takeshita, Tommaso Green, Ines Reinig, Kai Eckert, Simone Paolo Ponzetto
ACLSum is a novel multi-aspect summarization dataset for scientific publications, focused on the field of Natural Language Processing (NLP). The dataset is manually crafted and validated by domain experts, addressing the limitations of existing datasets, which are often semi-automatically generated and lack ground-truth summaries. ACLSum comprises 250 documents, each annotated along three aspects (Challenge, Approach, and Outcome) and paired with manually created and validated extractive and abstractive summaries.

The dataset is evaluated through extensive experiments with pretrained language models (PLMs) and large language models (LLMs), covering both end-to-end and extract-then-abstract summarization approaches. The results show that PLMs perform better on the Challenge aspect, which requires higher-level abstraction, while LLMs trained with instruction tuning outperform those trained with chain-of-thought methods. The study also evaluates a greedy algorithm for inducing extractive summarization labels and finds it less effective than manual annotation. Finally, the paper discusses the dataset's limitations, such as the need for expert annotators and the focus on a single field and language; future work includes extending the dataset to other fields and languages and exploring multi-document summarization.
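The greedy label-induction procedure evaluated in the paper is typically implemented as a ROUGE-based oracle: source sentences are added one at a time, and a sentence is kept only if it improves the ROUGE score of the selected set against the abstractive reference summary. The sketch below is a minimal illustration of that general technique, not the authors' exact procedure; it substitutes a simple unigram-overlap F1 for a full ROUGE implementation, and the function names, tokenization, and stopping criterion are all assumptions.

```python
from collections import Counter


def rouge1_f1(candidate_tokens, reference_tokens):
    """Unigram-overlap F1 between candidate and reference (a simple stand-in for ROUGE-1)."""
    if not candidate_tokens or not reference_tokens:
        return 0.0
    overlap = sum((Counter(candidate_tokens) & Counter(reference_tokens)).values())
    precision = overlap / len(candidate_tokens)
    recall = overlap / len(reference_tokens)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def greedy_extractive_oracle(source_sentences, abstractive_summary, max_sentences=None):
    """Greedily select source sentences that maximize ROUGE-1 F1 against the abstractive summary.

    Returns the indices of the selected sentences; these can serve as silver
    extractive labels when no human extractive annotation is available.
    """
    reference = abstractive_summary.lower().split()
    sentence_tokens = [s.lower().split() for s in source_sentences]
    selected, current_score = [], 0.0
    while max_sentences is None or len(selected) < max_sentences:
        best_gain, best_idx = 0.0, None
        for i, _ in enumerate(sentence_tokens):
            if i in selected:
                continue
            candidate = [t for j in selected + [i] for t in sentence_tokens[j]]
            gain = rouge1_f1(candidate, reference) - current_score
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:  # no remaining sentence improves the score
            break
        selected.append(best_idx)
        current_score += best_gain
    return sorted(selected)


if __name__ == "__main__":
    # Hypothetical toy example: the second sentence overlaps most with the summary.
    sentences = [
        "Existing summarization datasets are built semi-automatically.",
        "We present a manually annotated multi-aspect dataset of 250 papers.",
        "The weather was pleasant during the conference.",
    ]
    summary = "A manually annotated multi-aspect summarization dataset of 250 papers."
    print(greedy_extractive_oracle(sentences, summary))
```

Because the selection stops as soon as no sentence yields a strict improvement, such an oracle can miss content that only helps in combination with later sentences, which is one plausible reason the paper finds it weaker than manual extractive annotation.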