22 Apr 2024 | Avinash Anand, Kritarth Prasad, Ujjwal Goel, Mohit Gupta, Naman Lal, Astha Verma, and Rajiv Ratn Shah
This paper introduces a method for generating multi-sentence citation text using large language models (LLMs). Given a source paper and a collection of target papers, the approach produces a coherent paragraph containing multiple citations. A new dataset, MCG-S2ORC, built from computer science research papers and containing instances with multiple citations, is proposed. Three LLMs (LLaMA, Alpaca, and Vicuna) are fine-tuned and evaluated for citation generation, and knowledge graphs extracted from the target papers are incorporated into the prompts. Experiments show that Vicuna outperforms the other models and that integrating the knowledge graphs significantly improves the quality and coherence of the generated citations, yielding more accurate and context-aware citation text. The paper also discusses limitations, such as token-length restrictions, and acknowledges the support received for the research.
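The summary does not include the authors' code; as a rough illustration of the described prompt-augmentation idea, the sketch below serializes knowledge-graph triples from each target paper and appends them to a citation-generation prompt. All names (TargetPaper, build_prompt, the prompt wording) and the triple format are assumptions for illustration, not the paper's actual implementation.

```python
# Hypothetical sketch: building a citation-generation prompt that includes
# knowledge-graph triples from each target paper, in the spirit of the
# prompt-augmentation approach described above.

from dataclasses import dataclass


@dataclass
class TargetPaper:
    title: str
    abstract: str
    kg_triples: list[tuple[str, str, str]]  # (subject, relation, object) triples


def serialize_triples(triples: list[tuple[str, str, str]]) -> str:
    """Flatten (subject, relation, object) triples into a prompt-friendly string."""
    return "; ".join(f"{s} {r} {o}" for s, r, o in triples)


def build_prompt(source_abstract: str, targets: list[TargetPaper]) -> str:
    """Compose a prompt asking the LLM for a multi-sentence citation paragraph."""
    lines = [
        "Write a citation paragraph for the source paper that cites each target paper.",
        f"Source abstract: {source_abstract}",
    ]
    for i, t in enumerate(targets, start=1):
        lines.append(f"Target paper [{i}] title: {t.title}")
        lines.append(f"Target paper [{i}] abstract: {t.abstract}")
        lines.append(f"Target paper [{i}] knowledge graph: {serialize_triples(t.kg_triples)}")
    lines.append("Citation paragraph:")
    return "\n".join(lines)


if __name__ == "__main__":
    targets = [
        TargetPaper(
            title="Example target paper",
            abstract="Proposes a retrieval method for scientific documents.",
            kg_triples=[("retrieval method", "applied to", "scientific documents")],
        )
    ]
    print(build_prompt("We study multi-citation text generation.", targets))
```

The resulting prompt string would then be passed to the fine-tuned model; in practice the triples and abstracts would have to be truncated to respect the token-length limits the paper notes as a limitation.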