ARKS: Active Retrieval in Knowledge Soup for Code Generation

19 Feb 2024 | Hongjin Su, Shuyang Jiang, Yuhang Lai, Haoyuan Wu, Boao Shi, Che Liu, Qian Liu, Tao Yu
The paper introduces ARKS (Active Retrieval in Knowledge Soup), a strategy for improving large language models (LLMs) on code generation. Unlike traditional retrieval-augmented generation (RAG), which relies on a single knowledge source, ARKS combines web search results, documentation, execution feedback, and evolved code snippets into a "knowledge soup." Its active retrieval strategy iteratively refines the query and updates the knowledge soup, which improves the quality of the generated code. The authors evaluate ARKS on a new benchmark of realistic coding problems involving frequently updated libraries and long-tail programming languages. Experiments with ChatGPT and CodeLlama show substantial gains in average execution accuracy. The analysis confirms the benefits of both a diverse knowledge soup and active retrieval, and offers insights into building effective retrieval-augmented code generation (RACG) pipelines. The model, code, and data are available at <https://arks-codegen.github.io>.
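The iterative loop described above (retrieve, generate, execute, then refine the query and update the soup) can be illustrated with a small toy. This is a minimal sketch under stated assumptions, not the authors' implementation: the `KnowledgeSoup` class, the token-overlap `retrieve` function, and the `toy_generate`/`toy_execute` helpers are all hypothetical stand-ins for a real retriever, an LLM such as ChatGPT or CodeLlama, and a sandboxed executor. How ARKS actually builds its queries and decides what to store may differ.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class KnowledgeSoup:
    """Hypothetical pool of the four source types named in the summary."""
    web: List[str] = field(default_factory=list)
    docs: List[str] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)
    snippets: List[str] = field(default_factory=list)

    def items(self) -> List[str]:
        # Concatenation order also breaks score ties in retrieve().
        return self.web + self.docs + self.feedback + self.snippets


def tokens(text: str) -> set:
    # Toy tokenizer: lowercase words and identifiers only (digits/punctuation dropped).
    return set(re.findall(r"[a-z_]+", text.lower()))


def retrieve(query: str, soup: KnowledgeSoup, k: int = 2) -> List[str]:
    # Toy lexical retriever (hypothetical stand-in): rank items by shared-token count.
    q = tokens(query)
    ranked = sorted(soup.items(), key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def active_retrieval(
    question: str,
    soup: KnowledgeSoup,
    generate: Callable[[str, List[str]], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    query, code = question, ""
    for round_no in range(1, max_rounds + 1):
        context = retrieve(query, soup)
        code = generate(question, context)
        ok, message = execute(code)
        if ok:
            soup.snippets.append(code)  # keep the verified program as a reusable snippet
            return code, round_no
        soup.feedback.append(message)  # add execution feedback to the soup
        query = f"{question} {code} {message}"  # refine the query with program + error
    return code, max_rounds


# --- hypothetical toy stand-ins for an LLM and a sandboxed executor ---
def toy_generate(question: str, context: List[str]) -> str:
    # Pretends the model writes correct code only once a hint about sorted() is retrieved.
    if any("sorted" in c for c in context):
        return "result = sorted([3, 1, 2])"
    return "result = [3, 1, 2].sort_list()"


def toy_execute(code: str) -> Tuple[bool, str]:
    env: dict = {}
    try:
        exec(code, env)  # never exec untrusted model output outside a real sandbox
    except Exception as exc:
        return False, f"{type(exc).__name__}: {exc}"
    if env.get("result") != [1, 2, 3]:
        return False, "AssertionError: unexpected result"
    return True, "passed"


soup = KnowledgeSoup(
    web=["python list tutorial", "python list basics"],
    docs=["AttributeError: 'list' object has no attribute 'sort_list'; use sorted(values) instead"],
)
code, rounds = active_retrieval("sort a python list", soup, toy_generate, toy_execute)
print(rounds, code)  # prints: 2 result = sorted([3, 1, 2])
```

In the first round the plain question retrieves only the two generic web entries, so the toy model produces a broken call. The resulting `AttributeError` is added to the soup and folded into the next query, which now matches the documentation entry, so the second round succeeds. This mirrors, in miniature, why refining the query with execution feedback can surface knowledge that the original question alone does not reach.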