Planning Like Human: A Dual-process Framework for Dialogue Planning

8 Jun 2024 | Tao He, Lizi Liao, Yixin Cao, Yuanxing Liu, Ming Liu, Zerui Chen, Bing Qin
The paper introduces the Dual-Process Dialogue Planning (DPDP) framework, which integrates intuitive and analytical planning systems to enhance dialogue planning in large language models (LLMs). Inspired by dual-process theory in psychology, which distinguishes intuitive (fast) and analytical (slow) modes of thinking, DPDP employs an intuitive policy model for familiar contexts and a deliberative Monte Carlo Tree Search (MCTS) mechanism for complex, novel scenarios. The framework dynamically switches between the two systems based on the policy model's uncertainty, balancing efficiency against strategic depth. Training proceeds in two stages: offline reinforcement learning to form the initial policy model, followed by MCTS-enhanced on-the-fly learning. Empirical evaluations across diverse dialogue tasks show that DPDP outperforms existing methods in both dialogue quality and operational efficiency. Key contributions include the dual-system approach to dialogue planning, a novel two-stage training method for the policy model, and experimental results validating the framework's effectiveness. The study highlights the importance of integrating intuitive and analytical planning for more effective and efficient dialogue systems.
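The switching mechanism described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: the function names, the entropy-based uncertainty measure, and the threshold value are all assumptions chosen for clarity. The idea is that the fast policy (System 1) proposes an action distribution, and only when that distribution is high-entropy does the planner fall back to the slower MCTS search (System 2).

```python
import math

def entropy(probs):
    """Shannon entropy of an action distribution (a proxy for policy uncertainty)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def plan_action(policy_probs, mcts_search, actions, threshold=1.0):
    """Hypothetical System 1 / System 2 switch: act on the intuitive policy
    when it is confident, otherwise defer to deliberative MCTS search.

    policy_probs: action probabilities from the fast policy model
    mcts_search:  callable that runs a (slow) tree search over the actions
    threshold:    illustrative uncertainty cutoff, not a value from the paper
    """
    if entropy(policy_probs) <= threshold:
        # Confident: take the policy's top-ranked action directly (fast path).
        return actions[max(range(len(actions)), key=lambda i: policy_probs[i])]
    # Uncertain: invoke the deliberative search (slow path).
    return mcts_search(actions)
```

Under this sketch, a peaked distribution like `[0.9, 0.05, 0.05]` is handled by the fast path, while a near-uniform one triggers MCTS; in DPDP the slow path's results also feed back into training the policy, so the fast path handles more cases over time.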