15 Oct 2024 | Ishita Kumar, Snigdha Viswanathan, Sushrita Yerra, Alireza Salemi, Ryan A. Rossi, Franck Dernoncourt, Hanieh Deilamsalehy, Xiang Chen, Ruiyi Zhang, Shubham Agarwal, Nedim Lipka, Chien Van Nguyen, Thien Huu Nguyen, Hamed Zamani
The paper introduces the LongLaMP benchmark, a comprehensive and diverse evaluation framework for personalized long-text generation. Personalized long-text generation is crucial for applications such as email writing, review writing, and content creation, where coherent and contextually relevant long-form text is required. The benchmark consists of four tasks: Personalized Email Completion, Personalized Abstract Generation, Personalized Review Writing, and Personalized Topic Writing. Each task is designed to evaluate different aspects of personalization, including audience, purpose, writing style, content type, credibility, and structural elements.
The paper evaluates LongLaMP with a retrieval-augmented generation (RAG) framework, in which a retrieval model selects relevant entries from a user's profile and integrates them into the language model's input prompts. This approach enhances the personalization of the generated text while maintaining computational efficiency. The benchmark is evaluated using ROUGE-1, ROUGE-L, and METEOR metrics, and the results demonstrate significant improvements in personalized long-text generation, with an average improvement of 30.21% in ROUGE-1 and 47.5% in ROUGE-L across all tasks.
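As a rough illustration of this setup, the sketch below retrieves the profile entries most relevant to the task input and prepends them to the prompt. It is a minimal sketch under stated assumptions: it uses the `rank_bm25` package as the retriever (one of several retrievers one could plug in), and `build_personalized_prompt` and the final `generate` call are hypothetical names standing in for the paper's actual prompt template and language model.

```python
# Minimal sketch of retrieval-augmented personalized generation.
# Assumes the `rank_bm25` package; the prompt template and the LM call
# are hypothetical placeholders, not the paper's exact implementation.
from rank_bm25 import BM25Okapi

def build_personalized_prompt(task_input, user_profile, k=4):
    """Retrieve the k profile entries most relevant to the task input
    and prepend them to the prompt as personalization context."""
    tokenized_profile = [doc.lower().split() for doc in user_profile]
    retriever = BM25Okapi(tokenized_profile)
    retrieved = retriever.get_top_n(task_input.lower().split(), user_profile, n=k)

    context = "\n".join(f"- {doc}" for doc in retrieved)
    return (
        "Examples of this user's past writing:\n"
        f"{context}\n\n"
        f"Task: {task_input}\n"
        "Write the output in this user's style."
    )

# Usage (hypothetical LM call):
#   prompt = build_personalized_prompt("Write a review of ...", past_reviews)
#   output = language_model.generate(prompt)
```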
The paper also discusses the impact of varying the number of retrieved profiles and of different retrieval methods on the generated output. The experiments show that the proposed framework consistently improves performance across all benchmarks, with optimal results achieved by retrieving a moderate number of profiles. The LongLaMP benchmark and findings provide a valuable resource for further research in personalized long-text generation, enhancing user experiences and tailoring language generation to specific individuals.
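A simple way to reproduce this kind of ablation is to sweep the retrieval depth k and score each setting's outputs. The sketch below assumes the `rouge_score` package, reuses the `build_personalized_prompt` helper from the earlier sketch, and takes a hypothetical `generate_fn` in place of the actual language model; it is an illustrative harness, not the paper's evaluation code.

```python
# Sketch of the "number of retrieved profiles" ablation: sweep k,
# generate with each prompt, and report average ROUGE-1 / ROUGE-L F1.
# Assumes the `rouge_score` package; `generate_fn` is a hypothetical LM wrapper.
from rouge_score import rouge_scorer

def sweep_retrieval_depth(examples, generate_fn, ks=(1, 2, 4, 8)):
    """examples: list of (task_input, user_profile, reference_text) tuples."""
    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
    results = {}
    for k in ks:
        r1_scores, rl_scores = [], []
        for task_input, profile, reference in examples:
            prompt = build_personalized_prompt(task_input, profile, k=k)
            prediction = generate_fn(prompt)
            scores = scorer.score(reference, prediction)
            r1_scores.append(scores["rouge1"].fmeasure)
            rl_scores.append(scores["rougeL"].fmeasure)
        results[k] = (sum(r1_scores) / len(r1_scores),
                      sum(rl_scores) / len(rl_scores))
    return results  # inspect which k gives the best average scores
```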