18 Jul 2024 | Yufei Tian, Tenghao Huang, Miri Liu, Derek Jiang, Alexander Spangher, Muhao Chen, Jonathan May, Nanyun Peng
This paper investigates the capability of large language models (LLMs) in storytelling, focusing on narrative development and plot progression. The authors introduce a novel computational framework to analyze narratives through three discourse-level aspects: story arcs, turning points, and affective dimensions (arousal and valence). By leveraging expert and automatic annotations, they uncover significant discrepancies between LLM-generated and human-written stories. Human stories are found to be more suspenseful, arousing, and diverse in narrative structures, while LLM stories are homogeneously positive and lack tension. The study also measures narrative reasoning skills as a precursor to generative capacities, concluding that most LLMs fall short of human abilities in discourse understanding. Finally, the authors demonstrate that explicit integration of discourse features can enhance storytelling, showing over 40% improvement in neural storytelling in terms of diversity, suspense, and arousal. The contributions of the paper include a unified framework for narrative analysis, a quantitative comparison of LLM and human generative capacities, and the demonstration that discourse-aware generation can significantly improve LLMs' storytelling abilities.
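
To make the affective-dimension analysis concrete, below is a minimal sketch (not the paper's implementation) of how valence and arousal could be traced across story segments with a lexicon lookup; the toy VAD_LEXICON, the neutral fallback of (0.5, 0.5), and the per-sentence averaging are illustrative assumptions, and a real analysis would rely on a full affect lexicon or a trained classifier.

# Minimal sketch (assumed approach, not the authors' code): trace valence/arousal
# across a story's sentences using a tiny, hypothetical VAD-style lexicon.
from statistics import mean

# Hypothetical toy lexicon: word -> (valence, arousal), both in [0, 1].
VAD_LEXICON = {
    "joy": (0.95, 0.60), "calm": (0.80, 0.15), "storm": (0.30, 0.85),
    "betrayal": (0.10, 0.75), "victory": (0.90, 0.80), "loss": (0.15, 0.55),
}

def segment_scores(sentences):
    """Average valence/arousal per sentence; default to neutral (0.5, 0.5) if no lexicon hits."""
    scores = []
    for sent in sentences:
        hits = [VAD_LEXICON[w] for w in sent.lower().split() if w in VAD_LEXICON]
        if hits:
            scores.append((mean(v for v, _ in hits), mean(a for _, a in hits)))
        else:
            scores.append((0.5, 0.5))
    return scores

story = [
    "A calm morning filled with joy",
    "Then the storm brought betrayal and loss",
    "At last came victory",
]
print(segment_scores(story))  # a rise-fall-rise valence pattern hints at a non-flat story arc

A trajectory like this is the kind of signal the paper's framework aggregates to compare story arcs and emotional tension between human-written and LLM-generated stories.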