Automated Statistical Model Discovery with Language Models

2024 | Michael Y. Li, Emily B. Fox, Noah D. Goodman
This paper introduces a method for automated statistical model discovery with large language models (LLMs). The approach, called BoxLM, draws on the domain knowledge and programming ability of LMs to iteratively propose and refine probabilistic models. The procedure is framed within Box's Loop: an LM acting as both modeler and domain expert writes candidate probabilistic programs, the programs are fitted and evaluated against data, and a critic LM assesses the results and suggests revisions for the next round. Because the LM writes programs directly, the method avoids domain-specific languages and hand-crafted search procedures, enabling more flexible and open-ended model discovery.

The method is evaluated in three settings: searching within a restricted space of models, searching over an open-ended space, and improving expert models under natural-language constraints. BoxLM identifies models on par with those of human experts and extends classic models in interpretable ways. It is particularly useful when modeling constraints are difficult to formalize but easy to express in natural language, such as requiring that a model remain interpretable to ecologists. In experiments, BoxLM outperforms existing methods on tasks such as Gaussian process kernel discovery and improves on classic models of predator-prey dynamics. The system is robust to variations in dataset metadata and adapts to different modeling constraints.

The discovered models balance flexibility against interpretability, highlighting the potential of LMs to automate statistical model discovery and, more broadly, to accelerate and democratize scientific research.
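To make the propose-fit-criticize cycle concrete, the sketch below runs Box's Loop over a restricted Gaussian process kernel space, the first of the three evaluation settings. It is a minimal illustration, not the paper's implementation: the functions propose_candidates and fit_and_score are hypothetical stand-ins, a simple grammar expansion replaces the proposer LM, scikit-learn's GP regressor supplies the fitting step, and the log marginal likelihood stands in for the quantitative part of criticism. In BoxLM both proposal and criticism are produced by language models, and the critic also returns natural-language feedback that conditions the next round.

```python
"""Minimal sketch of Box's Loop over a restricted GP kernel space.

Illustrative only: propose_candidates / fit_and_score are placeholders for
the LM proposer and the model-criticism step described in the paper.
"""
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (
    RBF, RationalQuadratic, ExpSineSquared, WhiteKernel,
)

BASE_KERNELS = [RBF(), RationalQuadratic(), ExpSineSquared()]


def propose_candidates(current_kernel):
    """Stand-in for the LM proposer: extend the current kernel by adding or
    multiplying a base kernel (the restricted search space)."""
    if current_kernel is None:
        return [k + WhiteKernel() for k in BASE_KERNELS]
    candidates = []
    for k in BASE_KERNELS:
        candidates.append(current_kernel + k)
        candidates.append(current_kernel * k)
    return candidates


def fit_and_score(kernel, X, y):
    """Fit the GP and return its log marginal likelihood as the criticism signal."""
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
    gp.fit(X, y)
    return gp.log_marginal_likelihood_value_


def boxs_loop(X, y, rounds=3):
    """Greedy propose-fit-criticize loop: keep the best-scoring revision each round."""
    best_kernel, best_score = None, -np.inf
    for _ in range(rounds):
        for cand in propose_candidates(best_kernel):
            score = fit_and_score(cand, X, y)
            # In BoxLM a critic LM would also return natural-language feedback
            # (e.g. "residuals show unmodeled periodicity") to guide proposals.
            if score > best_score:
                best_kernel, best_score = cand, score
    return best_kernel, best_score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.linspace(0, 10, 100).reshape(-1, 1)
    y = np.sin(X).ravel() + 0.1 * rng.standard_normal(100)
    kernel, score = boxs_loop(X, y)
    print(kernel, score)
```

The point of the sketch is the structure of the loop rather than the search strategy: replacing the grammar expansion with an LM that reads the data description and the critic's feedback is what removes the need for a hand-crafted search procedure.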