24 Apr 2017 | Tsung-Hsien Wen, David Vandyke, Nikola Mrkšić, Milica Gašić, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young
This paper introduces a neural network-based, end-to-end trainable goal-oriented dialogue system designed to help users complete specific tasks, such as finding a restaurant. The system is trained on dialogue data collected with a novel pipelined Wizard-of-Oz framework, which makes data collection efficient and low-cost. The model combines sequence-to-sequence learning with belief tracking to interpret user inputs and generate appropriate responses, and it incorporates a database operator to query and retrieve relevant information. The system is assessed through both corpus-based evaluation and human assessment, showing competitive performance in terms of BLEU score, entity matching rate, and task success rate. The experimental results demonstrate that the learned model can interact naturally with human subjects and complete tasks effectively. The paper also discusses limitations and future work, including the need to handle noisy speech inputs and to expand the domain of application.
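To make the role of the database operator more concrete, the following is a minimal sketch of how a belief state might be turned into a database query. It is not the authors' implementation: the slot names, the toy restaurant entries, and the helper functions `form_query` and `match_entities` are all hypothetical, and it assumes the belief tracker outputs one probability distribution per slot and that the most probable value of each slot is used as a query constraint.

```python
from typing import Dict, List

# Hypothetical toy database; the entries are illustrative only.
RESTAURANTS: List[Dict[str, str]] = [
    {"name": "Roma House", "food": "italian", "area": "north"},
    {"name": "Golden Wok", "food": "chinese", "area": "north"},
    {"name": "Pasta Bar", "food": "italian", "area": "south"},
]


def form_query(belief_state: Dict[str, Dict[str, float]]) -> Dict[str, str]:
    """Pick the most probable value per slot (assumed strategy).

    Slots whose best value is "dontcare" add no constraint.
    """
    query = {}
    for slot, distribution in belief_state.items():
        best_value = max(distribution, key=distribution.get)
        if best_value != "dontcare":
            query[slot] = best_value
    return query


def match_entities(query: Dict[str, str], database: List[Dict[str, str]]) -> List[int]:
    """Return a binary vector: 1 where an entity satisfies every constraint."""
    return [
        int(all(entity.get(slot) == value for slot, value in query.items()))
        for entity in database
    ]


# Assumed belief tracker output after a turn such as "I want Italian food".
belief = {
    "food": {"italian": 0.8, "chinese": 0.1, "dontcare": 0.1},
    "area": {"north": 0.2, "south": 0.1, "dontcare": 0.7},
}

query = form_query(belief)
matches = match_entities(query, RESTAURANTS)
print(query)
print(matches)
```

In this sketch the match vector marks which entities are consistent with the user's constraints so far; a response generator could condition on it when deciding whether to offer a venue or ask for more information.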