8 Jun 2016 | Jiwei Li, Michel Galley, Chris Brockett, Georgios P. Spithourakis, Jianfeng Gao, Bill Dolan
This paper introduces two persona-based neural conversation models that address the issue of speaker consistency in neural response generation. The first, the Speaker Model, encodes speaker characteristics such as background information and speaking style into distributed embeddings. The second, the Speaker-Addressee Model, captures interaction patterns between two interlocutors. Both models reduce perplexity and improve BLEU scores relative to baseline sequence-to-sequence models, with corresponding gains in speaker consistency as measured by human judges.
The Speaker Model integrates a speaker-level vector representation into the target part of the sequence-to-sequence model. The Speaker-Addressee Model encodes interaction patterns by combining individual embeddings of the speaker and addressee. These persona vectors are trained on human-human conversation data and used during testing to generate personalized responses. Experiments on Twitter conversations and TV series scripts show that leveraging persona vectors can improve BLEU scores by up to 20% and perplexity by 12%, with corresponding gains in consistency as judged by human annotators.
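The two architectures described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the dimensions, random parameter tables, and function names (`speaker_model_step`, `addressee_vector`) are hypothetical choices for clarity. The Speaker Model appends a learned speaker embedding to each decoder input; the Speaker-Addressee Model instead feeds in a combined vector built from the speaker and addressee embeddings via two linear projections and a tanh.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, much smaller than in the paper).
EMBED_DIM, PERSONA_DIM, HIDDEN_DIM = 8, 4, 16
VOCAB, NUM_SPEAKERS = 20, 5

# Hypothetical learned parameter tables, randomly initialized here.
word_emb = rng.standard_normal((VOCAB, EMBED_DIM))
speaker_emb = rng.standard_normal((NUM_SPEAKERS, PERSONA_DIM))

# One LSTM cell over the concatenated [word; persona] input.
IN_DIM = EMBED_DIM + PERSONA_DIM
W = rng.standard_normal((4 * HIDDEN_DIM, IN_DIM + HIDDEN_DIM)) * 0.1
b = np.zeros(4 * HIDDEN_DIM)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c):
    """Single LSTM step; x already carries the persona vector."""
    z = W @ np.concatenate([x, h]) + b
    i, f, o, g = np.split(z, 4)
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h_new = sigmoid(o) * np.tanh(c_new)
    return h_new, c_new

def speaker_model_step(word_id, speaker_id, h, c):
    # Speaker Model: append the speaker's embedding to every decoder input.
    x = np.concatenate([word_emb[word_id], speaker_emb[speaker_id]])
    return lstm_step(x, h, c)

# Speaker-Addressee Model: combine the two interlocutors' embeddings.
W1 = rng.standard_normal((PERSONA_DIM, PERSONA_DIM)) * 0.1
W2 = rng.standard_normal((PERSONA_DIM, PERSONA_DIM)) * 0.1

def addressee_vector(speaker_id, addressee_id):
    return np.tanh(W1 @ speaker_emb[speaker_id] + W2 @ speaker_emb[addressee_id])

def speaker_addressee_step(word_id, speaker_id, addressee_id, h, c):
    x = np.concatenate([word_emb[word_id],
                        addressee_vector(speaker_id, addressee_id)])
    return lstm_step(x, h, c)

# Same word and same initial state, two different speakers:
h0, c0 = np.zeros(HIDDEN_DIM), np.zeros(HIDDEN_DIM)
h_a, _ = speaker_model_step(3, 0, h0, c0)
h_b, _ = speaker_model_step(3, 1, h0, c0)
# Different persona vectors steer the decoder to different states,
# which is how the model conditions responses on the responder's identity.
```

Because the persona vector enters every decoding step, two speakers given the same context produce different decoder trajectories, and training these embeddings jointly with the response data is what lets them absorb style and background information.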
The paper also discusses related work, including previous approaches to conversational dialog generation and the use of neural language models. It presents the sequence-to-sequence model architecture, including the LSTM mechanism and decoding process. The Speaker Model and Speaker-Addressee Model are evaluated on various datasets, including the Twitter Persona Dataset and TV series transcripts. Results show that the Speaker Model significantly improves performance in terms of perplexity and BLEU scores compared to the standard sequence-to-sequence model. The Speaker-Addressee Model also shows improvements, though the gains are less pronounced.
Qualitative analysis of the models' outputs shows that they generate responses that are more consistent with the speaker's persona. Human evaluation of the models' outputs confirms that the persona-based models produce more consistent responses than the baseline model. The paper concludes that encoding personas in distributed representations allows the models to capture personal characteristics such as speaking style and background information, leading to more natural and consistent conversational responses. The models are shown to be effective in generating responses that accurately emulate an individual's persona in terms of linguistic behavior and other characteristics.