1 Jun 2024 | Shichao Sun, Ruifeng Yuan, Ziqiang Cao, Wenjie Li, Pengfei Liu
This paper compares two methods for iterative refinement in text summarization: Prompt Chaining and Stepwise Prompt. Large language models (LLMs) can improve summary quality by following a human-like iterative process of critique and refinement. The study evaluates the two approaches on the InstruSum dataset, which contains 100 article-requirement pairs for instruction-controllable text summarization. Prompt Chaining issues three separate prompts for drafting, critiquing, and refining, while Stepwise Prompt integrates the three phases into a single prompt. The results show that Prompt Chaining produces higher-quality summaries: whereas Stepwise Prompt may only simulate refinement by deliberately introducing errors that it then corrects, Prompt Chaining consistently yields better outputs. Prompt Chaining is also more stable and reliable across different evaluation models and human assessments, and the critiques it generates are of higher quality, as measured by METACRITIQUE scores. The findings suggest that Prompt Chaining is the more effective approach for text summarization and may extend to other NLP tasks, underscoring the importance of iterative refinement in improving LLM performance and offering insights for the broader development of large language models.
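The contrast between the two prompting strategies is easiest to see as code. The following is a minimal Python sketch, assuming a generic `llm(prompt)` helper that stands in for whatever chat-completion API is used; the prompt wording and function names are illustrative, not the paper's exact prompts.

```python
def llm(prompt: str) -> str:
    """Placeholder for a single LLM call; swap in the API of your choice."""
    raise NotImplementedError


def prompt_chaining(article: str, requirement: str) -> str:
    # Phase 1: draft the summary in its own call.
    draft = llm(
        "Summarize the article to satisfy the requirement.\n"
        f"Requirement: {requirement}\nArticle: {article}"
    )
    # Phase 2: critique the draft in a separate call.
    critique = llm(
        "Critique this summary against the requirement.\n"
        f"Requirement: {requirement}\nArticle: {article}\nSummary: {draft}"
    )
    # Phase 3: refine the draft using the critique in a third call.
    return llm(
        "Rewrite the summary to address the critique.\n"
        f"Requirement: {requirement}\nArticle: {article}\n"
        f"Summary: {draft}\nCritique: {critique}"
    )


def stepwise_prompt(article: str, requirement: str) -> str:
    # All three phases are requested in a single prompt; the model emits
    # the draft, its own critique, and the refined summary in one generation.
    return llm(
        "Summarize the article per the requirement, then critique your "
        "summary, then output a refined summary.\n"
        f"Requirement: {requirement}\nArticle: {article}"
    )
```

In the chained variant each phase sees only finalized output from the previous call, while the stepwise variant leaves the model free to shape all three phases within a single generation, which is where the paper argues simulated (rather than genuine) refinement can creep in.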