Planning Like Human: A Dual-process Framework for Dialogue Planning

8 Jun 2024 | Tao He1*, Lizi Liao2, Yixin Cao3, Yuanxing Liu1, Ming Liu1,4†, Zerui Chen1, Bing Qin1,4
The paper "Planning Like Human: A Dual-process Framework for Dialogue Planning" introduces a novel framework called Dual-Process Dialogue Planning (DPDP) to enhance the proactive dialogue capabilities of Large Language Models (LLMs). Inspired by the dual-process theory in psychology, which identifies two distinct modes of thinking—intuitive (fast) and analytical (slow)—DPDP integrates two complementary planning systems: a neural policy LM model for quick, instinctive responses to familiar contexts, and a Monte Carlo Tree Search (MCTS) mechanism for complex, novel scenarios. This dual strategy is further coupled with a two-stage training regimen: offline Reinforcement Learning for robust initial policy model formation followed by MCTS-enhanced on-the-fly learning. The empirical evaluations across diverse dialogue tasks demonstrate that DPDP outperforms existing methods in achieving both high-quality dialogues and operational efficiency. The framework's effectiveness is validated through comprehensive experiments on various proactive dialogue datasets, including ESCov, CIMA, and CraigslistBargain, showing superior performance in goal completion efficiency and success rates. The paper also discusses the trade-offs between the two planning systems, the impact of MCTS on efficiency, and the influence of the policy LM on MCTS planning. Finally, the authors highlight the potential for further improvements in dialogue planning towards more nuanced, human-like interactions.The paper "Planning Like Human: A Dual-process Framework for Dialogue Planning" introduces a novel framework called Dual-Process Dialogue Planning (DPDP) to enhance the proactive dialogue capabilities of Large Language Models (LLMs). Inspired by the dual-process theory in psychology, which identifies two distinct modes of thinking—intuitive (fast) and analytical (slow)—DPDP integrates two complementary planning systems: a neural policy LM model for quick, instinctive responses to familiar contexts, and a Monte Carlo Tree Search (MCTS) mechanism for complex, novel scenarios. This dual strategy is further coupled with a two-stage training regimen: offline Reinforcement Learning for robust initial policy model formation followed by MCTS-enhanced on-the-fly learning. The empirical evaluations across diverse dialogue tasks demonstrate that DPDP outperforms existing methods in achieving both high-quality dialogues and operational efficiency. The framework's effectiveness is validated through comprehensive experiments on various proactive dialogue datasets, including ESCov, CIMA, and CraigslistBargain, showing superior performance in goal completion efficiency and success rates. The paper also discusses the trade-offs between the two planning systems, the impact of MCTS on efficiency, and the influence of the policy LM on MCTS planning. Finally, the authors highlight the potential for further improvements in dialogue planning towards more nuanced, human-like interactions.