Understanding AutoSurvey: Large Language Models Can Automatically Write Surveys

AutoSurvey is a system that uses large language models (LLMs) to generate comprehensive literature surveys automatically. The paper presents it as a systematic approach to three obstacles in survey writing with LLMs: limited context windows, the constraints of parametric knowledge, and the lack of evaluation benchmarks. The pipeline has four stages: initial retrieval and outline generation, parallel subsection drafting, integration and refinement, and rigorous evaluation and iteration. The authors list three contributions: a comprehensive solution to the survey-writing problem, a reliable evaluation method, and experimental validation of AutoSurvey's effectiveness. The code is open-sourced at https://github.com/AutoSurveys/AutoSurvey.

The paper is motivated by the growing need to synthesize a rapidly expanding literature, especially in fast-moving fields such as artificial intelligence. The number of papers on large language models has surged, and the sheer volume and complexity of this work make traditional human-authored surveys increasingly difficult to produce.

AutoSurvey tackles these challenges with a two-stage generation approach (an outline first, then its subsections), parallel generation of subsections, and a real-time knowledge update mechanism based on retrieval-augmented generation (RAG). For evaluation it uses a multi-LLM-as-judge strategy: several LLMs generate the initial evaluation metrics, which human experts then refine.

Experiments show that AutoSurvey significantly outperforms naive RAG-based LLMs and comes close to human performance in both content and citation quality: it achieves high citation recall and precision, and its content-quality scores are close to those of human-written surveys. It is also efficient, taking far less time than human writing.
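The four stages listed above can be sketched as a small Python pipeline. This is a minimal sketch under stated assumptions, not AutoSurvey's implementation: the toy corpus `CORPUS`, the word-overlap retriever, and all helper names (`retrieve`, `draft_outline`, `draft_subsection`, `refine`, `generate_survey`, `best_of_n`) are hypothetical, every LLM call is replaced by a deterministic stub, and the evaluation-and-iteration stage is assumed to mean generating several candidate surveys and keeping the one with the highest average score across several judge models.

```python
import random
import re
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable

# Hypothetical toy corpus (paper id -> abstract); a real system would query a large paper database.
CORPUS = {
    "p1": "retrieval augmented generation for question answering",
    "p2": "large language models as judges of text quality",
    "p3": "long context windows in transformer language models",
    "p4": "automatic survey writing with retrieval and citations",
}


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(query: str, k: int = 3) -> list[str]:
    """Stage 1 helper: rank papers by word overlap (a stand-in for embedding search)."""
    q = tokens(query)
    ranked = sorted(CORPUS, key=lambda pid: len(q & tokens(CORPUS[pid])), reverse=True)
    return ranked[:k]


def draft_outline(topic: str, refs: list[str], seed: int) -> list[str]:
    """Stage 1: placeholder for an LLM call that turns retrieved papers into section titles.
    This stub ignores `refs` and samples aspects at random so that candidates differ."""
    rng = random.Random(seed)
    aspects = ["background", "methods", "evaluation", "applications", "open problems"]
    return [f"{topic}: {a}" for a in rng.sample(aspects, k=3)]


def draft_subsection(title: str) -> str:
    """Stage 2: placeholder for an LLM call that writes one section from its own retrieval."""
    refs = retrieve(title)
    return f"## {title}\n" + " ".join(f"[{r}]" for r in refs)


def refine(sections: list[str]) -> str:
    """Stage 3: placeholder for an LLM pass that merges drafts and smooths transitions."""
    return "\n\n".join(sections)


def generate_survey(topic: str, seed: int = 0) -> str:
    outline = draft_outline(topic, retrieve(topic), seed)
    # Sections are independent once the outline is fixed, so they are drafted in parallel.
    with ThreadPoolExecutor(max_workers=4) as pool:
        sections = list(pool.map(draft_subsection, outline))
    return refine(sections)


Judge = Callable[[str], float]


def best_of_n(topic: str, judges: list[Judge], n: int = 3) -> tuple[str, float]:
    """Stage 4: generate n candidates and keep the one with the highest mean judge score."""
    candidates = [generate_survey(topic, seed=i) for i in range(n)]
    scored = [(c, mean(judge(c) for judge in judges)) for c in candidates]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical judges; real ones would be different LLMs prompted with a shared rubric.
def citation_count_judge(survey: str) -> float:
    return float(len(set(re.findall(r"\[p\d+\]", survey))))


def brevity_judge(survey: str) -> float:
    return -len(survey) / 1000


if __name__ == "__main__":
    text, score = best_of_n("LLM evaluation", [citation_count_judge, brevity_judge])
    print(score)
    print(text)
```

In this sketch, parallel drafting is what keeps each call small: once the outline exists, every section needs only its own retrieved papers, so no single call has to hold the whole survey in one context window.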
The results indicate that AutoSurvey strikes a good balance between quality and efficiency, making it a compelling alternative for generating academic surveys. The paper also discusses AutoSurvey's limitations: the most common citation errors are overgeneralizations, which indicates that LLMs still rely heavily on their parametric knowledge when writing. Performance also varies with the LLM used as the base writer and with the number of iterations, although AutoSurvey performs consistently well across these configurations, which the authors take as evidence of its robustness and efficiency. The paper concludes that AutoSurvey is the first system to explore the potential of large-model agents for writing extensive academic surveys, and that it provides a valuable reference for future research.
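Citation recall and precision, and the overgeneralization errors that hurt them, can be illustrated with a short sketch. It assumes simplified definitions rather than the paper's exact procedure: recall is the fraction of claims fully supported by their citations, and precision is the fraction of attached citations judged relevant to their claim. The `Claim` type, the `supported` and `relevant` judgments (which a real evaluator would obtain from an NLI model or an LLM judge), and the example claims are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    """One sentence of a generated survey together with the papers it cites."""
    text: str
    citations: list[str]
    # Hypothetical judge outputs, e.g. from an NLI model or an LLM prompted per claim:
    supported: bool  # do the cited papers, taken together, support the claim?
    relevant: dict[str, bool] = field(default_factory=dict)  # per citation: is it relevant?


def citation_recall(claims: list[Claim]) -> float:
    """Fraction of claims that are fully supported by their citations."""
    if not claims:
        return 0.0
    return sum(c.supported for c in claims) / len(claims)


def citation_precision(claims: list[Claim]) -> float:
    """Fraction of all attached citations that are judged relevant to their claim."""
    total = sum(len(c.citations) for c in claims)
    if total == 0:
        return 0.0
    hits = sum(c.relevant.get(p, False) for c in claims for p in c.citations)
    return hits / total


# Hypothetical example: the second claim overgeneralizes beyond what its source shows.
claims = [
    Claim("RAG can reduce hallucinated answers in QA.", ["p1"], True, {"p1": True}),
    Claim("LLM judges agree with humans on every task.", ["p2"], False, {"p2": True}),
    Claim("Long contexts make retrieval unnecessary.", ["p3", "p4"], False,
          {"p3": True, "p4": False}),
]

print(citation_recall(claims), citation_precision(claims))
```

Here only one of three claims is supported, so recall is 1/3, while three of the four citations are relevant, so precision is 0.75. An overgeneralized claim such as the second one lowers recall even though its citation is on topic, which is why such errors can coexist with high citation precision.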

AutoSurvey: Large Language Models Can Automatically Write Surveys

18 Jun 2024 | Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang