19 Feb 2024 | Hongjin Su, Shuyang Jiang, Yuhang Lai, Haoyuan Wu, Boao Shi, Che Liu, Qian Liu, Tao Yu
ARKS: Active Retrieval in Knowledge Soup for Code Generation
This paper introduces ARKS, a strategy for generalizing large language models (LLMs) to code generation. Unlike traditional retrieval-augmented methods that rely on a single source of knowledge, ARKS constructs a "knowledge soup" that integrates web search results, documentation, execution feedback, and evolved code snippets. An active retrieval strategy then iteratively refines the query and updates the knowledge soup, so that both the retrieval context and the generated code improve over successive rounds.
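To make the loop concrete, here is a minimal Python sketch of an ARKS-style active retrieval cycle. The function names and the query-refinement heuristic are illustrative assumptions, not the paper's implementation; the retriever, generator, and executor are injected as callables.

```python
from typing import Callable, List, Tuple

def arks_loop(
    problem: str,
    soup: List[str],
    retrieve: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, List[str]], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_iters: int = 3,
) -> str:
    """Retrieve-generate-execute-refine loop over a growing knowledge soup."""
    query = problem
    code = ""
    for _ in range(max_iters):
        context = retrieve(query, soup)      # fetch relevant knowledge
        code = generate(problem, context)    # LLM generation conditioned on context
        ok, feedback = execute(code)         # run the code, collect feedback
        if ok:
            break
        soup.append(feedback)                # execution feedback joins the soup
        query = problem + "\n" + feedback    # refine the query with the failure signal
    return code
```

The key design choice is that the soup grows and the query changes on every failed attempt, so later retrieval rounds see information that did not exist at the start.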
The effectiveness of ARKS is evaluated using a new benchmark comprising realistic coding problems associated with frequently updated libraries and long-tail programming languages. Experimental results on ChatGPT and CodeLlama demonstrate significant improvements in average execution accuracy. The analysis confirms the effectiveness of the knowledge soup and active retrieval strategies, offering insights into the construction of effective retrieval-augmented code generation (RACG) pipelines.
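Execution accuracy here means the fraction of problems for which the generated program passes all associated tests. Below is a hedged sketch of such an evaluator, assuming each problem ships with runnable Python test code; the benchmark's actual harness may differ.

```python
import os
import subprocess
import tempfile
from typing import Iterable, Tuple

def passes_tests(code: str, test_code: str, timeout: int = 10) -> bool:
    """Run generated code plus its unit tests in a subprocess; True iff exit code 0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test_code)
        path = f.name
    try:
        proc = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.remove(path)

def execution_accuracy(samples: Iterable[Tuple[str, str]]) -> float:
    """Average over (code, tests) pairs: fraction passing all tests."""
    results = [passes_tests(code, tests) for code, tests in samples]
    return sum(results) / len(results) if results else 0.0
```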
The paper presents a detailed description of the ARKS framework: query formulation, knowledge soup construction, the active retrieval mechanism, and the benchmark datasets. The soup aggregates web search content, documentation, execution feedback, and code snippets into a single retrievable pool, sketched below.
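One simple way to represent the soup is as source-tagged entries in one pool. The `KnowledgeItem` schema below is an assumption made for illustration; the paper does not prescribe a data format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class KnowledgeItem:
    source: str  # one of: "web", "documentation", "feedback", "snippet"
    text: str

def build_soup(web: List[str], docs: List[str],
               feedback: List[str], snippets: List[str]) -> List[KnowledgeItem]:
    """Flatten the four knowledge sources into one retrievable pool."""
    soup: List[KnowledgeItem] = []
    for source, items in (("web", web), ("documentation", docs),
                          ("feedback", feedback), ("snippet", snippets)):
        soup.extend(KnowledgeItem(source, text) for text in items)
    return soup
```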
The paper also discusses the impact of different retrieval models and the effectiveness of various query formulations. The results show that the use of code snippets and documentation significantly improves the performance of LLMs in code generation, especially for less common programming languages. The study also highlights the importance of retrieval accuracy and the effectiveness of dense retrievers in improving generalization performance.
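To illustrate what a dense retriever contributes in this pipeline, here is a small embedding-similarity sketch using the sentence-transformers library. The model choice `all-MiniLM-L6-v2` is an assumption for illustration and not necessarily one of the retrievers the paper compares.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from typing import List

def dense_retrieve(query: str, corpus: List[str], top_k: int = 5) -> List[str]:
    """Return the top_k corpus entries most similar to the query."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    # With normalized embeddings, the dot product equals cosine similarity.
    q = model.encode([query], normalize_embeddings=True)[0]
    docs = model.encode(corpus, normalize_embeddings=True)
    scores = docs @ q
    best = np.argsort(-scores)[:top_k]
    return [corpus[i] for i in best]
```

Unlike lexical matching, the embedding space lets a natural-language query surface documentation or snippets that share no keywords with it, which is why retrieval accuracy matters so much for generalization.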
The paper concludes that ARKS substantially improves code generation by leveraging a diverse knowledge soup together with active retrieval. The findings suggest that integrating diverse knowledge sources and iteratively refining queries yields more effective code generation, and that ARKS is a promising approach for retrieval-augmented methods.