8 Apr 2024 | Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, Monica S. Lam
This paper presents STORM, a system that writes Wikipedia-like articles from scratch by automating the pre-writing stage. To produce well-organized, comprehensive long-form articles, STORM simulates conversations between a writer and a topic expert, drawing on diverse perspectives to ask insightful questions and curate the information gathered. The system first discovers multiple perspectives on a topic by analyzing related Wikipedia articles, then simulates multi-turn conversations in which the questions are answered from trusted sources. The curated information is used to create an outline, which is then expanded into a full-length article.
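The pre-writing pipeline described above can be sketched in Python. This is a simplified illustration, not the authors' implementation: the function names, the prompts, the `END_SIGNAL` convention, the always-included "basic fact writer" perspective, and the callable interfaces for the language model and retriever are all hypothetical choices for this sketch. The toy model and search function at the bottom exist only to make it runnable, and the final step of expanding the outline into a full article is omitted.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: an LLM maps a prompt to text; a retriever maps a
# query to source snippets (assumed already filtered to trusted sources).
LLM = Callable[[str], str]
Retriever = Callable[[str], list[str]]

END_SIGNAL = "THANKS"  # hypothetical reply the writer uses to end a conversation


@dataclass
class Turn:
    perspective: str
    question: str
    answer: str
    sources: list[str]


def discover_perspectives(topic: str, related_tocs: list[str], llm: LLM, n: int = 3) -> list[str]:
    """Step 1: propose writer perspectives from the outlines of related articles."""
    prompt = (f"Topic: {topic}\nRelated outlines:\n" + "\n".join(related_tocs)
              + f"\nList {n} editor perspectives, one per line.")
    proposed = [line.strip("- ").strip() for line in llm(prompt).splitlines() if line.strip()]
    # Sketch choice: always include a generic perspective so basic facts are covered.
    return ["basic fact writer"] + proposed[:n]


def simulate_conversation(topic: str, perspective: str, llm: LLM,
                          search: Retriever, max_turns: int = 3) -> list[Turn]:
    """Step 2: a writer holding `perspective` questions an expert whose answers use retrieved sources."""
    turns: list[Turn] = []
    for _ in range(max_turns):
        history = "\n".join(f"Q: {t.question}\nA: {t.answer}" for t in turns)
        question = llm(f"You are a {perspective} researching {topic}.\n{history}\n"
                       "Ask your next question.").strip()
        if not question or question == END_SIGNAL:
            break
        sources = search(question)
        answer = llm(f"Answer '{question}' using only these sources:\n" + "\n".join(sources))
        turns.append(Turn(perspective, question, answer, sources))
    return turns


def storm_prewrite(topic: str, related_tocs: list[str], llm: LLM, search: Retriever):
    """Step 3: pool every perspective's grounded answers and draft an outline from them."""
    perspectives = discover_perspectives(topic, related_tocs, llm)
    turns = [t for p in perspectives for t in simulate_conversation(topic, p, llm, search)]
    notes = "\n".join(t.answer for t in turns)
    outline = llm(f"Write a hierarchical outline for a Wikipedia article on {topic} using:\n{notes}")
    return perspectives, turns, outline


# Toy stand-ins so the sketch runs end to end; a real system would call an LLM and a search API.
def toy_llm(prompt: str) -> str:
    if prompt.startswith("Topic:"):
        return "- historian\n- engineer"
    if "Ask your next question" in prompt:
        # End each conversation after two questions.
        return END_SIGNAL if prompt.count("Q:") >= 2 else "What happened?"
    if prompt.startswith("Answer"):
        return "An answer."
    return "# Overview\n## History"


def toy_search(query: str) -> list[str]:
    return [f"source about {query}"]


perspectives, turns, outline = storm_prewrite("Example topic", ["History; Impact"], toy_llm, toy_search)
print(perspectives, len(turns))
```

Keeping each perspective's conversation separate and pooling the answers only at the outline stage reflects the split described above: diverse perspectives drive the question asking, while curation happens once the grounded answers are collected.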
To evaluate STORM, the authors curate FreshWiki, a dataset of recent, high-quality Wikipedia articles, and develop metrics to assess outline and article quality. They also gather feedback from experienced Wikipedia editors, who find STORM's articles better organized and broader in coverage than those of an outline-driven retrieval-augmented baseline. Challenges remain, however, such as bias transferred from the sources and the over-association of unrelated facts.
The system is evaluated with automatic metrics, such as heading soft recall and entity recall, and with human evaluation. Results show that STORM produces more organized and comprehensive articles than the baselines, and human evaluators judge its articles more informative and broader in coverage. Verifiability and neutrality remain problems, however, highlighting the need for further improvements in grounded writing systems.
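The two automatic metrics named above can be illustrated with a short Python sketch. Entity recall is taken here as the share of unique entities in the human-written reference that the generated text also contains. For heading soft recall, the sketch assumes a soft-cardinality formulation in which semantically similar headings share roughly one unit of count, with the soft intersection obtained by inclusion-exclusion. The embedding model and entity extractor a real evaluation would need are replaced by hand-made toy inputs, and all function names are hypothetical; treat this as an illustration of the idea, not the paper's evaluation code.

```python
import math

Vector = tuple[float, ...]


def entity_recall(reference: set[str], generated: set[str]) -> float:
    """Fraction of the reference article's unique entities that also appear in the generated text."""
    if not reference:
        return 0.0
    return len(reference & generated) / len(reference)


def cosine(u: Vector, v: Vector) -> float:
    norm = math.hypot(*u) * math.hypot(*v)
    return sum(a * b for a, b in zip(u, v)) / norm if norm else 0.0


def soft_cardinality(vectors: list[Vector]) -> float:
    """Soft count: each item contributes 1 / (sum of its similarities to all items).
    Negative similarities are clipped to 0; assumes nonzero vectors, so each item's
    self-similarity of 1 keeps the denominator positive."""
    return sum(1.0 / sum(max(0.0, cosine(u, v)) for v in vectors) for u in vectors)


def heading_soft_recall(reference: list[Vector], predicted: list[Vector]) -> float:
    """Soft intersection over soft reference size; reference must be non-empty.
    The intersection is count(R) + count(P) - count(R union P) on soft counts."""
    c_ref = soft_cardinality(reference)
    c_pred = soft_cardinality(predicted)
    c_union = soft_cardinality(reference + predicted)  # union taken as the pooled list
    return (c_ref + c_pred - c_union) / c_ref


# Toy 2-D vectors stand in for sentence embeddings of section headings.
reference_headings = [(1.0, 0.0), (0.0, 1.0)]  # two distinct human-written headings
predicted_headings = [(1.0, 0.0)]              # matches only the first one
print(entity_recall({"A", "B", "C", "D"}, {"A", "B", "X"}))         # 0.5
print(heading_soft_recall(reference_headings, predicted_headings))  # 0.5
```

With orthogonal toy vectors the soft metric reduces to ordinary set recall; its benefit appears when a generated heading is worded differently from, but embedded close to, a reference heading, so it still earns partial credit.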
STORM demonstrates the potential of large language models for generating long-form, well-structured articles. Simulating conversations lets the system ask insightful questions, which yields more comprehensive outlines that are then expanded into full-length articles. The results indicate that STORM's articles are more organized and comprehensive than those of other methods, although ensuring neutrality and verifiability remains a challenge.