Towards Empathetic Open-domain Conversation Models: a New Benchmark and Dataset

July 28 – August 2, 2019 (ACL 2019) | Hannah Rashkin, Eric Michael Smith, Margaret Li, Y-Lan Boureau
This paper introduces EMPATHETICDIALOGUES, a new benchmark and dataset of 25,000 conversations grounded in emotional situations, designed to evaluate and improve empathetic dialogue generation. The dataset was collected through crowdsourced one-on-one conversations covering a wide and balanced range of emotions. Each dialogue is grounded in a situation in which a speaker feels a given emotion and a listener responds. The dataset is larger and covers a more extensive set of emotions than most comparable emotion prediction datasets.

The paper investigates how dialogue models can be adapted to generate more empathetic responses by leveraging EMPATHETICDIALOGUES. Experiments show that dialogue models trained on this dataset are rated as more empathetic by human evaluators than models trained only on large-scale internet conversation data. The paper proposes two simple ways to leverage the dataset: using utterances from the training data as candidate responses in a retrieval model at inference time, and fine-tuning the model on the task. It also explores ways to combine information from related tasks, using external predictors to incorporate supervised information into the model, which can improve performance on empathy and relevance metrics. Both ideas are sketched below.
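As a concrete illustration of the first method, here is a minimal Python sketch of retrieval with training-set utterances as the candidate pool. The encoder below is a toy hashed bag-of-words stand-in (the paper trains learned neural encoders), and all names are illustrative rather than the paper's code.

    import numpy as np

    DIM = 512

    def encode(text: str) -> np.ndarray:
        """Toy stand-in sentence encoder: hashed bag-of-words, L2-normalized."""
        vec = np.zeros(DIM)
        for token in text.lower().split():
            vec[hash(token) % DIM] += 1.0
        norm = np.linalg.norm(vec)
        return vec / norm if norm > 0 else vec

    def retrieve(context: str, candidates: list[str]) -> str:
        """Return the candidate utterance whose embedding best matches the context."""
        ctx = encode(context)
        scores = [float(ctx @ encode(c)) for c in candidates]
        return candidates[int(np.argmax(scores))]

    # Candidate pool drawn from training-data utterances (toy examples).
    pool = [
        "That sounds really stressful. I'm sorry you're going through that.",
        "Congratulations! You must be so proud.",
        "Oh no, I hope your dog feels better soon.",
    ]
    print(retrieve("My dog has been sick all week and I'm worried.", pool))

The external-predictor idea can be sketched just as simply. One plausible mechanism, assumed here for illustration, is to prepend the label predicted by a supervised emotion classifier to the dialogue context before the response model encodes it; the keyword classifier below is a stand-in for a model trained on labeled data.

    # A few of the dataset's emotion labels (it covers a much larger set).
    EMOTIONS = ("afraid", "grateful", "proud", "sad")

    def predict_emotion(context: str) -> str:
        """Stand-in emotion classifier: keyword match instead of a trained model."""
        text = context.lower()
        if "worried" in text or "scared" in text:
            return "afraid"
        if "thank" in text:
            return "grateful"
        if "passed" in text or "won" in text:
            return "proud"
        return "sad"

    def prepend_emotion(context: str) -> str:
        """Add the predicted label as an extra leading token of the model input."""
        return f"{predict_emotion(context)} {context}"

    print(prepend_emotion("I'm worried about my exam tomorrow."))
    # -> "afraid I'm worried about my exam tomorrow."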
The models are evaluated on their ability to reproduce the listener's portion of the conversation, using both automated metrics and human evaluation. Human evaluation matters because automated metrics do not always correlate with human judgments of dialogue quality. The results show that models trained on EMPATHETICDIALOGUES score better on empathy metrics than models trained on other datasets. Fine-tuning on the dataset also improves performance on automated metrics, although it may come at the expense of performance on other corpora. The paper concludes that EMPATHETICDIALOGUES can be used to improve the empathy of dialogue systems, and that future work should focus on integrating empathetic responding into more general dialogue systems.
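For a concrete picture of what an automated metric for the retrieval setting can look like, here is a minimal sketch, assuming a precision-at-1-style measure in which the model must rank the actual listener response above sampled distractor responses. The function names and candidate-batch size are illustrative assumptions, not the paper's evaluation code.

    import numpy as np

    def precision_at_1(score_fn, contexts, gold_responses, distractor_pool,
                       n_candidates=100, seed=0):
        """Fraction of test contexts where the gold response outranks all distractors."""
        rng = np.random.default_rng(seed)
        hits = 0
        for context, gold in zip(contexts, gold_responses):
            # Sample n_candidates - 1 distractors; the gold response sits at index 0.
            distractors = rng.choice(distractor_pool, size=n_candidates - 1, replace=False)
            scores = [score_fn(context, c) for c in [gold, *distractors]]
            hits += int(np.argmax(scores) == 0)
        return hits / len(contexts)

    # Toy usage with a trivial word-overlap scorer.
    overlap = lambda ctx, cand: len(set(ctx.lower().split()) & set(cand.lower().split()))
    pool = [f"distractor response number {i}" for i in range(99)]
    print(precision_at_1(overlap, ["I feel great today"],
                         ["great to hear you feel that way"], pool))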