DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning

2024 | Siyuan Guo, Cheng Deng, Ying Wen, Hechang Chen, Yi Chang, Jun Wang
DS-Agent is a framework that combines large language models (LLMs) with case-based reasoning (CBR) to automate data science tasks. It operates in two stages: a development stage, in which it iteratively refines experiment plans using CBR and expert knowledge drawn from Kaggle, and a deployment stage, in which it generates code directly from past successful solutions.
In the development stage, DS-Agent uses CBR to retrieve and reuse relevant cases, adjusts the experiment plan over several iterations, and improves performance using execution feedback. In the deployment stage, it uses a simplified CBR paradigm to adapt past successful solutions for direct code generation, which reduces the demands placed on the underlying LLM's capabilities. Empirically, DS-Agent with GPT-4 achieves a 100% success rate in the development stage, and in the deployment stage DS-Agent yields an average 36% improvement in one-pass rate across alternative LLMs. DS-Agent with GPT-4 achieves the best performance in both stages, at a cost of $1.60 and $0.13 per run, respectively. The framework is open-sourced, and its performance and low per-run cost make it a practical candidate for real-world deployment. Overall, the results suggest that integrating CBR strengthens the problem-solving capabilities of LLMs on data science tasks, yielding consistent gains through iterative refinement and feedback, and that combining LLMs with CBR can reduce the specialized expertise needed to develop and deploy data science models.
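The development-stage loop can be pictured as a classical CBR cycle. The following is a minimal sketch under stated assumptions, not DS-Agent's implementation: the bag-of-words cosine retriever stands in for whatever retriever the paper uses, `revise_plan` and `run_experiment` are hypothetical placeholders for an LLM call and an experiment executor, and the final "retain" step follows the textbook CBR cycle rather than anything stated above.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable


@dataclass
class Case:
    """One case-bank entry: a task description and the plan that worked for it."""
    task: str
    plan: str


def similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (a stand-in for a real retriever)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def retrieve(bank: list[Case], task: str) -> Case:
    """Retrieve: the stored case whose task description is closest to the new task."""
    return max(bank, key=lambda c: similarity(c.task, task))


def develop(
    task: str,
    bank: list[Case],
    revise_plan: Callable[[str, str, str], str],  # hypothetical LLM call: (task, plan, feedback) -> plan
    run_experiment: Callable[[str], float],       # hypothetical executor: plan -> validation score
    max_steps: int = 5,
) -> tuple[str, float]:
    """Retrieve, reuse, revise with execution feedback, then retain the best plan."""
    best_plan = retrieve(bank, task).plan          # reuse the closest past plan as a starting point
    best_score = run_experiment(best_plan)
    feedback = f"score={best_score:.3f}"
    for _ in range(max_steps):
        plan = revise_plan(task, best_plan, feedback)  # revise the plan using the latest feedback
        score = run_experiment(plan)
        feedback = f"score={score:.3f}"
        if score > best_score:                         # keep only plans that improve the score
            best_plan, best_score = plan, score
    bank.append(Case(task, best_plan))                 # retain the solved case for later reuse
    return best_plan, best_score


# Toy usage with stub "LLM" and "executor" functions.
bank = [
    Case("tabular regression house prices", "gradient boosting"),
    Case("text classification sentiment", "fine-tune a transformer"),
]
SCORES = {"gradient boosting": 0.80, "gradient boosting + feature engineering": 0.85}


def toy_revise(task: str, plan: str, feedback: str) -> str:
    return plan if plan.endswith("+ feature engineering") else plan + " + feature engineering"


def toy_run(plan: str) -> float:
    return SCORES.get(plan, 0.0)


best, score = develop("tabular regression crab age", bank, toy_revise, toy_run, max_steps=3)
print(best, score)  # gradient boosting + feature engineering 0.85
```

In this sketch, only improving revisions replace the current best plan, so the returned score never drops below that of the reused starting plan.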
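The deployment stage's simplified CBR can likewise be sketched as a single retrieve-and-adapt pass with no revision loop. Assumptions: the case bank maps task descriptions to working code produced in the development stage, Jaccard word overlap stands in for the actual retriever, and `adapt` is a hypothetical placeholder for the LLM call that rewrites the retrieved code for the new task.

```python
from typing import Callable


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two task descriptions (a stand-in retriever)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def deploy(
    task: str,
    solved: dict[str, str],                 # development-stage output: task description -> working code
    adapt: Callable[[str, str, str], str],  # hypothetical LLM call: (new task, old task, old code) -> code
) -> str:
    """One-pass simplified CBR: retrieve the closest solved task, then adapt its code once."""
    old_task = max(solved, key=lambda t: jaccard(t, task))
    return adapt(task, old_task, solved[old_task])


# Toy usage: the stub "LLM" swaps the dataset name taken from the last word of each task.
solved = {
    "tabular regression house_prices": "model = fit_gbm('house_prices.csv')",
    "text classification sentiment": "model = fit_bert('reviews.csv')",
}


def toy_adapt(new_task: str, old_task: str, code: str) -> str:
    return code.replace(old_task.split()[-1], new_task.split()[-1])


new_code = deploy("tabular regression crab_age", solved, toy_adapt)
print(new_code)  # model = fit_gbm('crab_age.csv')
```

Because this sketch has no feedback loop, the quality of its output depends mainly on retrieving a sufficiently similar past case and on a single adaptation step.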