27 Feb 2024 | Ruiyang Ren, Peng Qiu, Yingqi Qu, Jing Liu, Wayne Xin Zhao, Hua Wu, Ji-Rong Wen, Haifeng Wang
The paper introduces BASES, a novel framework for simulating large-scale web search user behaviors with LLM-based agents. BASES addresses the scarcity of real user data and the privacy concerns around it by generating diverse, personalized user profiles. The framework uses synergistic synthesis to predefine attribute values, which keeps profile generation both efficient and diverse. LLM-based agents equipped with these profiles then simulate user behavior through query and click prompting strategies.
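As a rough illustration of profile-driven prompting, the sketch below samples profiles from predefined attribute pools and renders one into a query prompt. It is a minimal sketch under stated assumptions, not the paper's implementation: the attribute names, their values, the helper names `sample_profiles` and `query_prompt`, and the prompt wording are all hypothetical, and plain random sampling stands in for the synergistic synthesis step, whose details are not given here.

```python
import random

# Hypothetical attribute pools (assumption): the paper's actual attributes
# and values are not specified in this summary.
ATTRIBUTE_POOLS = {
    "age_group": ["18-24", "25-34", "35-49", "50+"],
    "occupation": ["student", "engineer", "teacher", "nurse"],
    "interest": ["travel", "cooking", "finance", "sports"],
}


def sample_profiles(n, seed=0):
    """Draw n profiles by picking one predefined value per attribute."""
    rng = random.Random(seed)  # seeded so the same call reproduces the same profiles
    return [
        {name: rng.choice(values) for name, values in ATTRIBUTE_POOLS.items()}
        for _ in range(n)
    ]


def query_prompt(profile):
    """Render a profile into an illustrative query-generation prompt."""
    traits = ", ".join(f"{name}: {value}" for name, value in profile.items())
    return (
        f"You are a web search user ({traits}). "
        "Write one search query you would issue."
    )


if __name__ == "__main__":
    for profile in sample_profiles(3, seed=1):
        print(query_prompt(profile))
```

Predefining the value pools means each new profile costs only a cheap sampling step, and the LLM is called only when the agent acts.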
Extensive experiments show that BASES simulates search behavior effectively, with both its queries and its clicks closely resembling those of human users. Its effectiveness is further validated on Chinese and English benchmarks, where it yields significant NDCG@1 improvements over other datasets. BASES also shows potential in low-resource scenarios, where augmenting scarce data with simulated behaviors improves performance. To support future research, the authors develop WARRIORS, a large-scale dataset of web search user behaviors available in both Chinese and English. The dataset is designed to address the limitations of existing datasets and to provide realistic user interaction behaviors. The paper concludes by highlighting the contributions of BASES and WARRIORS to information retrieval and user behavior simulation.
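For readers unfamiliar with the NDCG@1 metric mentioned above, the following minimal sketch computes NDCG@k from a list of graded relevance labels in ranked order. It assumes the common exponential-gain formulation; the exact variant used in the paper is not stated in this summary, and the function names are illustrative.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k results (exponential gain)."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)  # rank is 0-based, so the discount is log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0
```

With k = 1 the metric depends only on the top-ranked result: it is 1.0 when that result carries the highest relevance label in the list and lower otherwise.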