Supporting Sensemaking of Large Language Model Outputs at Scale

2024 | KATY ILONKA GERO, CHELSE SWOOPES, ZIWEI GU, JONATHAN K. KUMMERFELD, ELENA L. GLASSMAN
This paper presents an exploratory interface for supporting sensemaking of large language model (LLM) outputs at scale. The interface instantiates five combinations of text analysis and rendering techniques that help users scale up the number of LLM outputs they can reason about, for tasks such as ideation, model comparison, and output selection. Its features include identifying exact matches, highlighting unique words, and a novel algorithm called Positional Diction Clustering (PDC), which highlights positionally consistent, analogical text across LLM responses. These features support a wide variety of sensemaking tasks and make some tasks previously considered too difficult tractable.

The interface is designed to let users inspect LLM responses at the mesoscale: the scale between the single-output inspection common in chat interfaces and the many thousands of outputs typically involved in annotation studies. Its design is grounded in theories of human cognition, including Variation Theory and Analogical Learning Theory, which suggest that exposure to variation and consistency within prescribed structures helps humans form more robust mental models of a phenomenon. The paper also discusses related work in skimming support, cross-document comparison, and sensemaking interfaces for generative AI.

The paper reports on a controlled user study (n=24) and eight case studies evaluating these features and how they support users across different tasks. The findings suggest that the features support a wide range of sensemaking tasks, including email (re)writing, model comparison, and identifying social bias, and that users can more easily identify patterns, differences, and similarities in LLM responses. The paper concludes that the features are useful for both system designers and end users, and it presents design guidelines for future explorations of new LLM interfaces.
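To make the exact-match and unique-word features concrete, here is a minimal Python sketch of both ideas. This is not the paper's implementation (it omits PDC entirely, and the tokenizer, function names, and example responses are assumptions for this illustration); it only shows the underlying set operations.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer; simplistic, for illustration only."""
    return re.findall(r"[a-z']+", text.lower())

def exact_matches(responses: list[str]) -> set[str]:
    """Words appearing in every response: a crude stand-in for
    the interface's exact-match highlighting."""
    token_sets = [set(tokenize(r)) for r in responses]
    return set.intersection(*token_sets) if token_sets else set()

def unique_words(responses: list[str]) -> list[set[str]]:
    """For each response, the words that appear in no other response."""
    token_sets = [set(tokenize(r)) for r in responses]
    unique = []
    for i, tokens in enumerate(token_sets):
        # Union of all tokens from the other responses.
        others = set().union(*(t for j, t in enumerate(token_sets) if j != i))
        unique.append(tokens - others)
    return unique

# Hypothetical LLM responses to the same email-writing prompt.
responses = [
    "Thank you for your email. I will review the report today.",
    "Thank you for reaching out. I will read the report tomorrow.",
    "Thanks for your email. I plan to review it this week.",
]
print("Shared by all:", sorted(exact_matches(responses)))
for i, uniq in enumerate(unique_words(responses), start=1):
    print(f"Unique to response {i}:", sorted(uniq))
```

A real interface would presumably operate on character spans rather than lowercase word sets, so that matches and unique words can be highlighted in place within each rendered response.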