LLM-SR: Scientific Equation Discovery via Programming with Large Language Models

2025 | Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, Chandan K. Reddy
LLM-SR is a novel approach to discovering scientific equations with Large Language Models (LLMs). It leverages the scientific knowledge and code-generation capabilities of LLMs to find equations from data, treating equations as programs and combining LLM-generated hypotheses with evolutionary search. The method iteratively proposes equation skeletons, optimizes their parameters against the data, and refines them using a dynamic experience buffer.

LLM-SR outperforms state-of-the-art symbolic regression methods, particularly in out-of-domain test settings, by efficiently exploring the equation search space and incorporating scientific priors. The framework was evaluated on four benchmark problems spanning physics, biology, and materials science, designed to simulate real-world discovery and to prevent the LLM from simply reciting known equations. Results show that LLM-SR discovers physically accurate equations with better fit and generalization than a range of baselines, including evolutionary and deep-learning-based symbolic regression methods, while being more efficient. Ablations further demonstrate the importance of data-driven feedback, iterative refinement, and the program representation of equations. The study highlights the potential of LLMs in scientific discovery and the need for specialized methods that effectively integrate prior scientific knowledge into equation discovery.
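The iterative loop summarized above (propose equation skeletons, fit their free parameters to data, rank candidates in an experience buffer) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hand-written skeleton list stands in for the LLM proposal step, and the random-search fitter stands in for the numerical optimizer (e.g. BFGS) used in practice.

```python
import random

# Hypothetical equation skeletons: each maps (x, params) -> prediction.
# In LLM-SR these function bodies would be generated by the LLM as code.
SKELETONS = [
    ("c0 * x + c1",        lambda x, p: p[0] * x + p[1]),
    ("c0 * x**2 + c1 * x", lambda x, p: p[0] * x**2 + p[1] * x),
    ("c0 * x**2 + c1",     lambda x, p: p[0] * x**2 + p[1]),
]

def fit_params(f, xs, ys, trials=2000, seed=0):
    """Fit a skeleton's free parameters to data by random search
    (a stand-in for the gradient-based optimizer used in the paper)."""
    rng = random.Random(seed)
    best_p, best_err = None, float("inf")
    for _ in range(trials):
        p = [rng.uniform(-5, 5), rng.uniform(-5, 5)]
        err = sum((f(x, p) - y) ** 2 for x, y in zip(xs, ys))
        if err < best_err:
            best_p, best_err = p, err
    return best_p, best_err

def llm_sr_step(xs, ys):
    """Score every proposed skeleton and keep them in an experience buffer,
    sorted by data fit; the top entries would seed the next LLM prompt."""
    buffer = []
    for expr, f in SKELETONS:
        _, err = fit_params(f, xs, ys)
        buffer.append((err, expr))
    buffer.sort()
    return buffer

# Synthetic data from the ground truth y = 2*x**2 + 1, which the
# search should recover as the best-fitting skeleton.
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2 * x**2 + 1 for x in xs]
buffer = llm_sr_step(xs, ys)
print(buffer[0][1])  # best-fitting skeleton expression
```

In the full method this step repeats: high-scoring buffer entries are fed back into the LLM prompt so the next round of skeletons refines the best hypotheses found so far.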