20 Mar 2025 | Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, Chandan K. Reddy
The paper introduces LLM-SR (Large Language Models for Scientific Equation Discovery), a novel approach that leverages the scientific knowledge and code generation capabilities of Large Language Models (LLMs) to discover scientific equations from data. LLM-SR treats equations as programs with mathematical operators and combines LLMs' scientific priors with evolutionary search over equation programs. The LLM iteratively proposes new equation skeleton hypotheses, which are then optimized against data to estimate parameters. The method is evaluated on four benchmark problems across diverse scientific domains, demonstrating superior performance in discovering physically accurate equations, especially in out-of-domain test settings. The results show that LLM-SR outperforms state-of-the-art symbolic regression baselines and explores the equation search space more efficiently by incorporating scientific priors.
The paper also includes a comprehensive ablation study to highlight the crucial components of LLM-SR's success.
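
To make the idea of "equations as programs" concrete, the following is a minimal sketch, not the authors' implementation. It assumes an equation skeleton can be written as a Python function with a placeholder parameter vector, fits those parameters to data, and ranks the skeletons by fit error. The two skeletons, the synthetic data, and every name here (`linear_skeleton`, `cubic_skeleton`, `fit_params`) are hypothetical stand-ins for hypotheses an LLM might propose. The optimizer is a simple compass search chosen only to keep the example dependency-free; the summary above does not say which optimizer LLM-SR uses, and no LLM call or evolutionary loop is shown.

```python
# Hypothetical skeletons: the functional form is fixed, the params are free.
def linear_skeleton(x, params):
    return params[0] * x

def cubic_skeleton(x, params):
    return params[0] * x + params[1] * x ** 3

def mse(skeleton, params, xs, ys):
    """Mean squared error of a skeleton with given params on the data."""
    return sum((skeleton(x, params) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_params(skeleton, n_params, xs, ys, tol=1e-6, max_evals=200_000):
    """Estimate params by compass search (an assumption, not the paper's optimizer).

    Try +/- step along each coordinate, keep any move that lowers the error,
    and halve the step when no move helps.
    """
    params = [1.0] * n_params
    best = mse(skeleton, params, xs, ys)
    step, evals = 1.0, 0
    while step > tol and evals < max_evals:
        improved = False
        for i in range(n_params):
            for delta in (step, -step):
                trial = list(params)
                trial[i] += delta
                loss = mse(skeleton, trial, xs, ys)
                evals += 1
                if loss < best:
                    params, best, improved = trial, loss, True
                    break
        if not improved:
            step /= 2
    return params, best

# Synthetic, noise-free data from a made-up nonlinear law: y = -1.5*x - 0.5*x^3.
xs = [-2 + 0.2 * i for i in range(21)]
ys = [-1.5 * x - 0.5 * x ** 3 for x in xs]

# Each candidate is (skeleton function, number of free parameters).
candidates = {"linear": (linear_skeleton, 1), "cubic": (cubic_skeleton, 2)}
results = {name: fit_params(fn, n, xs, ys) for name, (fn, n) in candidates.items()}
best_name = min(results, key=lambda name: results[name][1])

for name, (params, loss) in results.items():
    print(f"{name}: params={params}, mse={loss:.3g}")
print("best skeleton:", best_name)
```

In this toy setting the fit error is what separates the candidates: the skeleton whose form matches the data-generating law can drive the error close to zero, while the other cannot. In the approach described above, such scores would presumably feed back into the search that decides which hypotheses the LLM refines next.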