DailyDialog: A Manually Labelled Multi-turn Dialogue Dataset


11 Oct 2017 | Yanran Li, Hui Su, Xiaoyu Shen, Wenjie Li, Ziqiang Cao, Shuzi Niu
This paper introduces DailyDialog, a high-quality, manually labelled multi-turn dialogue dataset that captures natural daily conversations. The dataset contains 13,118 multi-turn dialogues, with an average of about 8 turns per dialogue, each annotated with communication intention and emotion information. The dialogues are human-written, comparatively noise-free, and cover a wide range of everyday topics such as relationships, daily life, and work. Its moderate size also makes it suitable for training compact conversational models.

Each utterance is labelled with one of four intention (dialogue-act) classes (Inform, Questions, Directives, Commissive) and one of seven emotion classes based on the Big Six theory of basic emotions, plus a "no emotion" class. The data was collected by crawling websites used for English-language practice, so the conversations follow natural, real-life communication patterns. Three experts performed the annotation, achieving an inter-annotator agreement of 78.9%. The dataset is also rich in emotion: more than 3,675 dialogues end with positive emotions, making it suitable for research on conversational agents that steer a conversation toward a happy ending.

To benchmark the dataset, the paper evaluates existing approaches on DailyDialog, including retrieval-based methods (embedding-based similarity, feature-based similarity, and reranking-enhanced approaches) and generation-based methods (Seq2Seq, attention-based Seq2Seq, and the hierarchical recurrent encoder-decoder, HRED). Attention-based Seq2Seq and HRED achieve the highest BLEU scores, indicating better performance at generating coherent and meaningful responses. Pre-training on out-of-domain corpora such as OpenSubtitles, however, lowers BLEU scores, suggesting that domain adaptation is important for effective dialogue generation.
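The annotation schema described above can be written down as two small lookup tables. The class names come from the paper; the integer ids below follow the convention commonly used with the released data, but they are an illustrative assumption, not something stated in this summary:

```python
# Four intention (dialogue-act) classes from the DailyDialog paper.
# NOTE: the integer ids are an assumed encoding for illustration.
DIALOG_ACTS = {
    1: "Inform",
    2: "Questions",
    3: "Directives",
    4: "Commissive",
}

# Seven emotion classes: Ekman's six basic emotions plus "no emotion".
EMOTIONS = {
    0: "no emotion",
    1: "anger",
    2: "disgust",
    3: "fear",
    4: "happiness",
    5: "sadness",
    6: "surprise",
}

def describe_utterance(act_id: int, emotion_id: int) -> str:
    """Render one utterance's labels in human-readable form."""
    return f"{DIALOG_ACTS[act_id]} / {EMOTIONS[emotion_id]}"
```

For example, `describe_utterance(2, 4)` would render as `"Questions / happiness"` under this assumed encoding.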
The paper concludes that DailyDialog is a valuable resource for research in dialogue systems, offering a realistic and diverse set of dialogues that can be used to train and evaluate conversational agents. The dataset is available online and is expected to benefit future research in this field.
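For readers who want to reproduce the kind of BLEU comparison reported above, here is a minimal, self-contained sketch of sentence-level BLEU with add-one smoothing and a brevity penalty. It is a simplified stand-in, not the exact evaluation script used in the paper:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(reference, candidate, max_n=2):
    """Simplified sentence-level BLEU: uniform weights over n-gram
    precisions up to max_n, add-one smoothing to avoid log(0), and
    the standard brevity penalty for short candidates."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clipped overlap: each candidate n-gram counts at most as
        # often as it appears in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append((overlap + 1) / (total + 1))  # add-one smoothing
    log_avg = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: punish candidates shorter than the reference.
    if len(candidate) > len(reference):
        bp = 1.0
    else:
        bp = math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * math.exp(log_avg)
```

A perfect match scores 1.0, and partial overlaps score strictly between 0 and 1, which is enough to rank generated responses against references the way the paper's comparison does.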