July 14–18, 2024 | Hideaki Joko, Shubham Chatterjee, Andrew Ramsay, Arjen P. de Vries, Jeff Dalton, Faegheh Hasibi
This paper introduces LAPS, an LLM-Augmented Personalized Self-Dialogue method for collecting large-scale, multi-session, and multi-domain conversations with actual user preferences. LAPS uses large language models (LLMs) to guide human workers in generating personalized dialogues, enabling the collection of diverse and high-quality conversations. The method involves four key components: dialogue act classification, guidance generation, utterance composition, and preference extraction. Dialogue act classification determines the next action for the assistant, while guidance generation provides personalized instructions for the human worker. Utterance composition involves the human worker generating responses as both user and assistant, and preference extraction identifies and stores user preferences in a preference memory. The collected dataset includes 1,406 multi-domain, multi-session dialogues paired with 11,215 preferences. LAPS-generated conversations are compared to existing datasets and show higher diversity and quality. The preference memory enhances the effective utilization of user preferences in recommendations, leading to more accurate and explainable recommendations. The results demonstrate that LAPS is a scalable and effective method for collecting personalized conversational data.
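The four components described above form a per-turn loop. The following is a minimal illustrative sketch of that loop; all function names, the rule-based placeholders, and the `PreferenceMemory` structure are assumptions for exposition (in LAPS, classification, guidance generation, and preference extraction are LLM-driven, and utterance composition is done by a human worker).

```python
# Hypothetical sketch of one LAPS cycle; names and logic are illustrative
# assumptions, not the authors' implementation.
from dataclasses import dataclass, field


@dataclass
class PreferenceMemory:
    """Stores extracted user preferences across sessions."""
    preferences: list = field(default_factory=list)

    def add(self, prefs):
        # Keep each preference once, preserving insertion order.
        self.preferences.extend(p for p in prefs if p not in self.preferences)


def classify_dialogue_act(history):
    # Placeholder for the LLM-based classifier that decides the
    # assistant's next action from the dialogue so far.
    return "elicit_preference" if len(history) < 2 else "recommend"


def generate_guidance(act, memory):
    # Placeholder for personalized instructions shown to the human worker.
    if act == "elicit_preference":
        return "Ask the user about their tastes."
    return f"Recommend an item consistent with: {memory.preferences}"


def extract_preferences(user_utterance):
    # Placeholder extractor: pull simple preference statements from a turn.
    return [s.strip() for s in user_utterance.split(",") if "like" in s]


def laps_turn(history, memory, user_utterance):
    """One cycle: classify the act, generate worker guidance, record the
    (human-composed) utterance, and update the preference memory."""
    act = classify_dialogue_act(history)
    guidance = generate_guidance(act, memory)
    history.append(user_utterance)  # utterance composition (by the worker)
    memory.add(extract_preferences(user_utterance))
    return act, guidance


memory = PreferenceMemory()
history = []
act, guidance = laps_turn(history, memory, "I like spicy food, I like pasta")
```

Across sessions, the accumulating `PreferenceMemory` is what lets later guidance (and, downstream, recommendations) be conditioned on preferences stated in earlier dialogues.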