BASES: Large-scale Web Search User Simulation with Large Language Model based Agents

27 Feb 2024 | Ruiyang Ren, Peng Qiu, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Hua Wu, Ji-Rong Wen, Haifeng Wang
This paper introduces BASES, a framework for large-scale web search user simulation built on large language model (LLM)-based agents. BASES constructs user profiles with a synergistic synthesis method that aims for both efficiency and diversity, so that the simulated users cover a wide range of web search behaviors. Query and click behavior prompting strategies are then used to generate accurate and personalized user behaviors. The authors validate the framework on two human benchmarks, in Chinese and English, and report that BASES can simulate human-like search behaviors at scale.
Using BASES, the authors construct WARRIORS, a large-scale dataset of simulated web search user behaviors in Chinese and English. Both the framework and the dataset are evaluated on two classical information retrieval (IR) tasks: session search and click prediction. According to the reported results, models trained on BASES-generated data improve significantly over models trained on other behavior datasets, particularly in low-resource scenarios, where the generated data can be used to improve IR models. The study concludes that BASES is a promising approach to simulating large-scale web search user behaviors, contributing to the advancement of search technologies and user experience optimization.
The authors also highlight the ethical considerations of using real user data and emphasize the importance of simulating user behaviors to respect user privacy and contribute to the development of information retrieval technologies.
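To make the profile-then-prompt idea in the summary more concrete, the following is a minimal sketch of how a profile-conditioned agent might produce one query and one click. It is not the authors' implementation: the `UserProfile` fields, the prompt wording, and the `llm` and `search` callables are all assumptions introduced for illustration, and the summary does not specify the actual profile schema or prompts.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class UserProfile:
    """Hypothetical profile fields; the real BASES schema is not given in the summary."""
    occupation: str
    interests: List[str]
    language: str = "en"


def build_query_prompt(profile: UserProfile, history: List[str]) -> str:
    """Assumed prompt wording: ask the agent for its next query given its profile."""
    past = "; ".join(history) if history else "none"
    return (
        f"You are a {profile.occupation} interested in {', '.join(profile.interests)}. "
        f"Previous queries in this session: {past}. "
        f"Write your next web search query in language '{profile.language}'."
    )


def build_click_prompt(profile: UserProfile, query: str, results: List[str]) -> str:
    """Assumed prompt wording: ask the agent which ranked result it would click."""
    listing = "\n".join(f"{i}. {title}" for i, title in enumerate(results, start=1))
    return (
        f"You are a {profile.occupation} who searched for '{query}'.\n"
        f"Results:\n{listing}\n"
        "Reply with the number of the result you would click, or 0 for none."
    )


def simulate_turn(
    profile: UserProfile,
    history: List[str],
    llm: Callable[[str], str],           # any text-in, text-out model (assumption)
    search: Callable[[str], List[str]],  # returns ranked result titles (assumption)
) -> Tuple[str, Optional[int]]:
    """Run one query step and one click step; return (query, clicked rank or None)."""
    query = llm(build_query_prompt(profile, history)).strip()
    results = search(query)
    reply = llm(build_click_prompt(profile, query, results)).strip()
    rank = int(reply) if reply.isdecimal() else 0
    clicked = rank if 1 <= rank <= len(results) else None
    return query, clicked
```

Passing the model and the search backend in as plain callables keeps the sketch independent of any particular LLM API or search engine; a full simulation would loop over many profiles and turns and log the resulting sessions.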