MultiWOZ - A Large-Scale Multi-Domain Wizard-of-Oz Dataset for Task-Oriented Dialogue Modelling

20 Apr 2020 | Paweł Budzianowski, Tsung-Hsien Wen, Bo-Hsiang Tseng, Iñigo Casanueva, Stefan Ultes, Osman Ramadan and Milica Gašić
MultiWOZ is a large-scale, multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling. It consists of 10,438 dialogues, each annotated with dialogue states and dialogue acts, which makes it at least an order of magnitude larger than previous annotated task-oriented dialogue corpora. The dataset was collected entirely through crowd-sourcing, without hiring professional annotators, and covers domains such as hotel booking, restaurant reservations, and train and taxi travel. To collect the data, the authors created task templates that span multiple domains, and crowd workers followed these templates to produce natural, human-like conversations.
The dialogue act annotations capture the intent behind each utterance, which dialogue systems rely on to interpret user input and generate appropriate responses. With these annotations, MultiWOZ serves as a benchmark for three tasks: dialogue state tracking, dialogue-act-to-text generation, and end-to-end (dialogue-context-to-text) response generation. The dataset is freely available online, together with detailed annotations and statistics, and gives researchers and developers a diverse resource for training and evaluating task-oriented dialogue models.
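As a rough illustration of what it means for a multi-domain dialogue to be annotated with dialogue states and dialogue acts, the sketch below models a conversation as a list of turns and accumulates the state turn by turn. The `Turn` class, its field names, the act labels, and the example utterances are all hypothetical and chosen for readability; they do not reflect the actual MultiWOZ file schema.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One utterance plus its (hypothetical) annotations."""
    speaker: str  # "user" or "system"
    text: str
    # Dialogue acts as (act_type, slot, value) triples; labels are illustrative.
    acts: list = field(default_factory=list)
    # Slot values introduced at this turn, keyed by (domain, slot).
    state_update: dict = field(default_factory=dict)


def accumulate_state(turns):
    """Return the cumulative dialogue state after each turn."""
    state = {}
    history = []
    for turn in turns:
        # Later values overwrite earlier ones for the same (domain, slot).
        state = {**state, **turn.state_update}
        history.append(state)
    return history


# A made-up dialogue that moves from the hotel domain to the restaurant domain.
dialogue = [
    Turn("user", "I need a cheap hotel in the north.",
         acts=[("inform", "price", "cheap"), ("inform", "area", "north")],
         state_update={("hotel", "price"): "cheap", ("hotel", "area"): "north"}),
    Turn("system", "I found one. Would you like to book it?",
         acts=[("offer_booking", None, None)]),
    Turn("user", "Yes, and I also want an expensive restaurant.",
         acts=[("inform", "price", "expensive")],
         state_update={("restaurant", "price"): "expensive"}),
]

states = accumulate_state(dialogue)
```

Here `states[-1]` holds slots from both the hotel and restaurant domains, which is the kind of cross-domain state a tracker evaluated on a multi-domain corpus would need to maintain.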