Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

8 Apr 2024 | Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, Monica S. Lam
The paper "Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models" by Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, and Monica S. Lam from Stanford University explores the application of large language models (LLMs) to generate long-form articles from scratch, similar to Wikipedia pages. The pre-writing stage, which involves researching the topic and preparing an outline, is a critical yet underexplored part of this task. To address this, the authors propose STORM (Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking), a system that automates the pre-writing stage in three steps:

1. **Discovering Diverse Perspectives**: Researching the given topic to identify various perspectives.
2. **Simulating Conversations**: Simulating conversations in which writers with different perspectives pose questions to an expert whose answers are grounded in trusted Internet sources.
3. **Curating the Outline**: Creating an outline based on the collected information.

The evaluation uses FreshWiki, a dataset of recent high-quality Wikipedia articles, together with outline assessments that target the pre-writing stage. The results show that STORM-generated articles are more organized and have broader coverage than those generated by an outline-driven retrieval-augmented baseline. Expert feedback also highlights new challenges, such as source bias transfer and over-association of unrelated facts, which need to be addressed in future research. The paper concludes with a discussion of limitations and ethical considerations, emphasizing the need for further improvements in the neutrality and balance of generated articles.
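The three pre-writing steps described above can be pictured as a small pipeline. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, the prompts, and the `AskLLM` and `Search` interfaces are all hypothetical, and the language model and search engine are replaced by offline stand-ins so the example runs as written.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper):
# any text-in/text-out model and any query-to-sources search function.
AskLLM = Callable[[str], str]
Search = Callable[[str], List[str]]


@dataclass
class Turn:
    """One question/answer exchange in a simulated conversation."""
    perspective: str
    question: str
    answer: str
    sources: List[str]


def discover_perspectives(topic: str, ask_llm: AskLLM) -> List[str]:
    """Step 1: ask the model for perspectives, one per line."""
    reply = ask_llm(f"List distinct perspectives for researching: {topic}")
    return [line.strip() for line in reply.splitlines() if line.strip()]


def simulate_conversation(topic: str, perspective: str, ask_llm: AskLLM,
                          search: Search, max_turns: int = 2) -> List[Turn]:
    """Step 2: a writer with one perspective questions a source-grounded expert."""
    turns: List[Turn] = []
    for _ in range(max_turns):
        asked = " | ".join(t.question for t in turns)
        question = ask_llm(
            f"As a writer focused on {perspective}, ask a new question "
            f"about {topic}. Already asked: {asked}"
        )
        sources = search(question)  # retrieve material to ground the answer
        answer = ask_llm(
            f"Answer using only these sources: {sources}\nQuestion: {question}"
        )
        turns.append(Turn(perspective, question, answer, sources))
    return turns


def curate_outline(topic: str, turns: List[Turn], ask_llm: AskLLM) -> str:
    """Step 3: turn the collected questions and answers into an outline."""
    notes = "\n".join(f"Q: {t.question}\nA: {t.answer}" for t in turns)
    return ask_llm(f"Write a section outline for an article on {topic} "
                   f"from these notes:\n{notes}")


def prewrite(topic: str, ask_llm: AskLLM, search: Search) -> str:
    """Run the three steps in order and return the outline text."""
    turns: List[Turn] = []
    for perspective in discover_perspectives(topic, ask_llm):
        turns.extend(simulate_conversation(topic, perspective, ask_llm, search))
    return curate_outline(topic, turns, ask_llm)


# Offline stand-ins so the sketch runs without a model or network access.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("List distinct perspectives"):
        return "historian\nengineer"
    if prompt.startswith("As a writer"):
        return "What should readers know?"
    if prompt.startswith("Answer using"):
        return "A grounded answer."
    return "# Overview\n# History\n# Design"


def fake_search(query: str) -> List[str]:
    return ["https://example.org/source"]


if __name__ == "__main__":
    print(prewrite("suspension bridges", fake_llm, fake_search))
```

Passing the model and the search function in as plain callables keeps the three steps independent of any particular LLM provider or retrieval backend, so each stand-in can be swapped for a real component without changing the pipeline.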