Understanding A Network-based End-to-End Trainable Task-oriented Dialogue System

This paper introduces a neural network-based, end-to-end trainable task-oriented dialogue system, along with a novel data collection method based on a pipeline Wizard-of-Oz framework. The goal is to make dialogue systems easy to develop without making many assumptions about the task. The model uses a sequence-to-sequence architecture augmented with dialogue history and database search outcomes, and it consists of four components: an intent network, belief trackers, a policy network, and a generation network. The intent network encodes the user input into a distributed vector, while the belief trackers maintain a probability distribution over slot-value pairs. The policy network combines the intent representation, the belief state, and the database truth value into a system action vector, which conditions the response generation network. The generation network produces a skeletal sentence, and the final response is formed by substituting actual values from the database into it. Delexicalisation and weight tying reduce the amount of training data required.

The human-human dialogue corpus is collected with a novel crowdsourcing-based Wizard-of-Oz method, which makes data collection efficient and low-cost.

The system is tested on a restaurant search domain, where it successfully completes tasks by interacting with users, and it performs well across several metrics even when trained on a small dataset. Evaluation uses corpus-based metrics: BLEU score, entity matching rate, and task success rate. The system outperforms a handcrafted modular system in terms of task success and naturalness. Because it is end-to-end trainable, it can be extended to larger domains. The paper concludes that, to the authors' knowledge, this is the first end-to-end neural network-based model for task-oriented dialogue systems.
Future work includes scaling the model to larger domains and improving its ability to handle noisy speech inputs.
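
To make the data flow described above more concrete, the following is a minimal Python sketch of three steps: delexicalising the user input, combining the intent, belief, and database vectors into an action vector, and filling a skeletal response with database values. It is an illustration under stated assumptions, not the authors' implementation. The slot vocabulary, the `<v.slot>` token format, the function names, and the weights are all hypothetical, and the tanh-of-summed-projections form of the policy step is assumed here because the summary does not give the formula.

```python
import math

# Hypothetical slot values for a toy restaurant domain (not the paper's data).
SLOT_VALUES = {
    "food": ["chinese", "italian", "thai"],
    "area": ["north", "south", "centre"],
}


def delexicalise(utterance):
    """Replace known slot values with generic tokens such as <v.food>.

    Matching is done on whitespace-separated, lower-cased words, so
    punctuation attached to a word would prevent a match.
    """
    tokens = []
    for word in utterance.lower().split():
        for slot, values in SLOT_VALUES.items():
            if word in values:
                word = f"<v.{slot}>"
                break
        tokens.append(word)
    return " ".join(tokens)


def policy_action(intent, belief, db_truth, weights):
    """Combine three feature vectors into one action vector.

    Assumed form: element-wise tanh of the sum of three linear projections.
    `weights` is a tuple of three matrices (lists of rows), one per input.
    """
    w_z, w_p, w_x = weights
    action = []
    for i in range(len(w_z)):
        total = (
            sum(w * v for w, v in zip(w_z[i], intent))
            + sum(w * v for w, v in zip(w_p[i], belief))
            + sum(w * v for w, v in zip(w_x[i], db_truth))
        )
        action.append(math.tanh(total))
    return action


def lexicalise(skeleton, entity):
    """Fill a skeletal response with the fields of a database entity."""
    for field, value in entity.items():
        skeleton = skeleton.replace(f"<v.{field}>", value)
    return skeleton


if __name__ == "__main__":
    print(delexicalise("I want Chinese food in the north"))
    entity = {"name": "Golden Wok", "food": "chinese", "area": "north"}
    print(lexicalise("<v.name> serves <v.food> food in the <v.area> .", entity))
```

In the actual system the skeletal sentence would come from the generation network conditioned on the action vector; here it is a fixed string so that the substitution step can be seen in isolation.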

A Network-based End-to-End Trainable Task-oriented Dialogue System

24 Apr 2017 | Tsung-Hsien Wen, David Vandyke, Nikola Mrkšić, Milica Gašić, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young