Virtual Personas for Language Models via an Anthology of Backstories

9 Jul 2024 | Suhong Moon*, Marwa Abdulhai*, Minwoo Kang*, Joseph Suh*, Widyadewi Soedarmadji, Eran Kohen Behar, David M. Chan
This paper introduces "Anthology," a method for conditioning large language models (LLMs) to specific virtual personas using open-ended life narratives, or "backstories." The authors argue that although LLMs are trained on vast repositories of diverse human-authored text, they often fail to accurately represent individual human users, especially in behavioral studies. Anthology addresses this issue by generating a diverse set of backstories, which are then used to condition LLMs to particular personas.

The method involves four stages: (1) generating backstories, (2) performing demographic surveys on the backstory-conditioned models, (3) selecting a representative set of virtual personas, and (4) matching these personas to target human populations. The authors demonstrate that Anthology significantly improves the accuracy and consistency of LLM responses over baseline methods, achieving up to an 18% improvement in Wasserstein distance and a 27% improvement in consistency metrics, with detailed experimental results drawn from three nationally representative Pew Research Center surveys. The paper also releases an open-source anthology of approximately 10,000 backstories. The authors close by discussing the limitations and ethical considerations of their approach, emphasizing the need for further research to refine and expand its applications.
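As a rough sketch of how this kind of conditioning might look in code (this is not the authors' implementation; the model choice, prompts, and sampling settings are illustrative assumptions), one can sample a backstory with an unrestricted prompt and then prepend it as a persona prefix before posing a survey question:

```python
# Hypothetical sketch of Anthology-style persona conditioning.
# The model, prompts, and sampling parameters below are illustrative
# assumptions, not the authors' exact setup.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model

# Stage 1: elicit an open-ended life narrative ("backstory") with an
# unrestricted prompt, sampled for diversity.
backstory_prompt = "Tell me about yourself."
backstory = generator(
    backstory_prompt,
    max_new_tokens=200,
    do_sample=True,
    temperature=1.0,
)[0]["generated_text"]  # includes the prompt prefix, which is fine here

# After demographic surveying and persona matching, condition the model
# on the backstory before posing a survey question to the virtual persona.
survey_question = (
    "Question: How often do you follow the news?\n"
    "A) Often  B) Sometimes  C) Rarely\n"
    "Answer:"
)
response = generator(
    backstory + "\n\n" + survey_question,
    max_new_tokens=5,
    do_sample=True,
)[0]["generated_text"]
print(response)
```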
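The Wasserstein distance reported above measures how far the distribution of virtual-persona responses sits from the human response distribution. A minimal sketch of that metric using scipy, with fabricated response tallies standing in for the actual Pew survey data:

```python
# Minimal sketch of the Wasserstein distance metric over ordinal survey
# responses. The counts below are fabricated for illustration only.
from scipy.stats import wasserstein_distance

# Ordinal answer choices encoded as 1..4 (e.g., "never" .. "often").
choices = [1, 2, 3, 4]
human_counts = [120, 340, 410, 130]   # hypothetical human respondent tallies
model_counts = [90, 300, 460, 150]    # hypothetical virtual-persona tallies

# Weights are normalized internally, so raw counts can be passed directly.
dist = wasserstein_distance(
    choices, choices,
    u_weights=human_counts,
    v_weights=model_counts,
)
print(f"Wasserstein distance: {dist:.4f}")
```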