Hypothesis Generation with Large Language Models

23 Aug 2024 | Yangqiaoyu Zhou, Haokun Liu, Tejes Srivastava, Hongyuan Mei & Chenhao Tan
This paper presents HypoGeniC, a novel algorithm for generating hypotheses with large language models (LLMs). Given a small set of labeled examples, HypoGeniC generates natural-language hypotheses and iteratively refines them, using a reward function inspired by multi-armed bandits to balance exploration of new hypotheses against exploitation of the best ones found so far. The resulting hypotheses improve predictive performance on classification tasks while remaining interpretable.

The authors evaluate HypoGeniC on four tasks: SHOE SALES (synthetic), DECEPTIVE REVIEWS, HEADLINE POPULARITY, and TWEET POPULARITY. HypoGeniC achieves significant gains in accuracy over few-shot prompting, supervised learning, and even oracle baselines, and the generated hypotheses remain robust when transferred across different LLMs and to out-of-distribution datasets. Beyond predictive accuracy, the hypotheses corroborate existing theories and offer new insights into the tasks.

The paper also discusses limitations, including the potential risks of automated hypothesis generation and the computational cost of the method. The authors conclude that HypoGeniC is a promising approach for hypothesis generation on real-world tasks and encourage further research in this area.
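To make the bandit-inspired reward concrete, below is a minimal sketch of one exploration/exploitation iteration in the spirit of HypoGeniC. The UCB-style reward, the constants `alpha` and `k_top`, and the `predict`/`generate` helpers are illustrative assumptions, not the paper's exact implementation.

```python
import math
from dataclasses import dataclass

@dataclass
class Hypothesis:
    """A natural-language hypothesis with running accuracy statistics."""
    text: str
    n_evaluated: int = 0  # examples this hypothesis has been scored on
    n_correct: int = 0    # how many of those it classified correctly

def ucb_reward(h: Hypothesis, total_steps: int, alpha: float = 0.5) -> float:
    """Empirical accuracy plus an exploration bonus (UCB-style).

    Rarely tested hypotheses get a larger bonus, so the loop keeps
    exploring them rather than committing to early leaders; this is the
    exploration/exploitation trade-off from multi-armed bandits.
    """
    if h.n_evaluated == 0:
        return float("inf")  # always evaluate an untested hypothesis once
    accuracy = h.n_correct / h.n_evaluated
    return accuracy + alpha * math.sqrt(math.log(total_steps) / h.n_evaluated)

def update_pool(pool, example, label, predict, generate, k_top=5):
    """One iteration over a single labeled example.

    `predict(hypothesis, example)` and `generate(examples)` stand in for
    LLM calls: the first classifies an example under a hypothesis, the
    second proposes a new hypothesis from misclassified examples. Both
    are hypothetical placeholders, not APIs from the paper.
    """
    t = sum(h.n_evaluated for h in pool) + 1
    ranked = sorted(pool, key=lambda h: ucb_reward(h, t), reverse=True)
    n_wrong = 0
    for h in ranked[:k_top]:  # evaluate only the top-reward hypotheses
        h.n_evaluated += 1
        if predict(h, example) == label:
            h.n_correct += 1
        else:
            n_wrong += 1
    # If every top hypothesis failed on this example, grow the pool.
    if n_wrong == min(k_top, len(ranked)):
        pool.append(Hypothesis(text=generate([example])))
    return pool
```

The key design point the sketch illustrates is that the reward is not accuracy alone: the exploration bonus shrinks as a hypothesis accumulates evaluations, so well-tested hypotheses must earn their rank through accuracy, while fresh ones get a fair chance to be tested.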