Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM

May 11–16, 2024 | Michelle S. Lam, Janice Teoh, James A. Landay, Jeffrey Heer, Michael S. Bernstein
The paper introduces LLooM, a concept induction algorithm that transforms unstructured text data into high-level, human-interpretable concepts. Unlike traditional topic modeling, which often produces low-level keywords, LLooM generates concepts defined by explicit inclusion criteria. The algorithm leverages large language models (LLMs) to iteratively synthesize sampled text and propose concepts of increasing generality. The LLooM Workbench, a mixed-initiative text analysis tool, visualizes datasets in terms of these high-level concepts, enabling analysts to shift their focus from interpreting topics to engaging in theory-driven analysis.
The paper presents four analysis scenarios and technical evaluations demonstrating LLooM's effectiveness in uncovering nuanced insights from various datasets, including toxic online content, political social media feeds, academic paper abstracts, and anticipated consequences of AI research. LLooM outperforms state-of-the-art topic models in terms of concept quality and data coverage, and expert case studies show that it helps researchers uncover new insights even in familiar datasets.
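The iterative idea described above (sample text, synthesize concepts, then generalize over those concepts) can be illustrated with a minimal sketch. This is not LLooM's actual implementation or API: the names `Concept`, `induce_concepts`, and `toy_synthesize` are hypothetical, and the LLM call is replaced by a trivial word-frequency stand-in so the example runs without any external service.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Concept:
    """A concept paired with an explicit inclusion criterion (hypothetical type)."""
    name: str
    criterion: str


# Assumption: in a real system this callable would wrap an LLM prompt.
Synthesizer = Callable[[List[str]], List[Concept]]


def induce_concepts(texts: List[str], synthesize: Synthesizer,
                    rounds: int = 2, sample_size: int = 4,
                    seed: int = 0) -> List[Concept]:
    """Repeatedly sample items and synthesize concepts from the sample.

    After each round the concept descriptions become the next round's input,
    so later rounds operate on more general material.
    """
    rng = random.Random(seed)
    items = list(texts)
    concepts: List[Concept] = []
    for _ in range(rounds):
        sample = rng.sample(items, min(sample_size, len(items)))
        proposed = synthesize(sample)
        if not proposed:
            break  # nothing new proposed; keep the last non-empty result
        concepts = proposed
        items = [f"{c.name}: {c.criterion}" for c in concepts]
    return concepts


def toy_synthesize(sample: List[str]) -> List[Concept]:
    """Stand-in for an LLM: propose the most frequent long word as a concept."""
    words = [w.strip(".,:;!?'\"").lower() for text in sample for w in text.split()]
    counts = Counter(w for w in words if len(w) > 4)
    if not counts:
        return []
    # Highest count wins; ties are broken alphabetically for determinism.
    top = min(counts, key=lambda w: (-counts[w], w))
    return [Concept(name=top, criterion=f"Does the text mention '{top}'?")]


if __name__ == "__main__":
    docs = ["Election fraud claims spread",
            "New election polls released",
            "Cats are great"]
    for concept in induce_concepts(docs, toy_synthesize, sample_size=3):
        print(concept.name, "|", concept.criterion)
```

The point of the sketch is the shape of the output: each concept carries a criterion that can be checked against any document, which is what distinguishes it from a bare keyword list. Swapping `toy_synthesize` for a function that prompts an LLM would be the natural next step, under whatever prompt design the analyst chooses.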