19 Feb 2024 | Hongjin Su, Shuyang Jiang, Yuhang Lai, Haoyuan Wu, Boao Shi, Che Liu, Qian Liu, Tao Yu
ARKS: Active Retrieval in Knowledge Soup for Code Generation
This paper introduces ARKS, a strategy for generalizing large language models (LLMs) to code generation. Unlike traditional retrieval-augmented methods that rely on a single source of knowledge, ARKS constructs a "knowledge soup" that integrates web search results, documentation, execution feedback, and evolved code snippets. An active retrieval strategy then iteratively refines the query and updates the knowledge soup, so that both the retrieval context and the generated code improve over successive rounds.
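To make the loop concrete, here is a minimal Python sketch of an ARKS-style active retrieval cycle. The function names and the query-refinement heuristic are illustrative assumptions, not the paper's implementation; the retriever, generator, and executor are injected as callables.

```python
from typing import Callable, List, Tuple

def arks_loop(
    problem: str,
    soup: List[str],
    retrieve: Callable[[str, List[str]], List[str]],
    generate: Callable[[str, List[str]], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_iters: int = 3,
) -> str:
    """Retrieve-generate-execute-refine loop over a growing knowledge soup."""
    query = problem
    code = ""
    for _ in range(max_iters):
        context = retrieve(query, soup)      # fetch relevant knowledge
        code = generate(problem, context)    # LLM generation conditioned on context
        ok, feedback = execute(code)         # run the code, collect feedback
        if ok:
            break
        soup.append(feedback)                # execution feedback joins the soup
        query = problem + "\n" + feedback    # refine the query with the failure signal
    return code
```

The key design choice is that the soup grows and the query changes on every failed attempt, so later retrieval rounds see information that did not exist at the start.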
The effectiveness of ARKS is evaluated using a new benchmark comprising realistic coding problems associated with frequently updated libraries and long-tail programming languages. Experimental results on ChatGPT and CodeLlama demonstrate significant improvements in average execution accuracy. The analysis confirms the effectiveness of the knowledge soup and active retrieval strategies, offering insights into the construction of effective retrieval-augmented code generation (RACG) pipelines.
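Execution accuracy here means the fraction of problems for which the generated program passes all associated tests. Below is a hedged sketch of such an evaluator, assuming each problem ships with runnable Python test code; the benchmark's actual harness may differ.

```python
import os
import subprocess
import tempfile
from typing import Iterable, Tuple

def passes_tests(code: str, test_code: str, timeout: int = 10) -> bool:
    """Run generated code plus its unit tests in a subprocess; True iff exit code 0."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code + "\n" + test_code)
        path = f.name
    try:
        proc = subprocess.run(["python", path], capture_output=True, timeout=timeout)
        return proc.returncode == 0
    except subprocess.TimeoutExpired:
        return False
    finally:
        os.remove(path)

def execution_accuracy(samples: Iterable[Tuple[str, str]]) -> float:
    """Average over (code, tests) pairs: fraction passing all tests."""
    results = [passes_tests(code, tests) for code, tests in samples]
    return sum(results) / len(results) if results else 0.0
```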
The paper presents a detailed description of the ARKS framework: query formulation, knowledge soup construction, the active retrieval mechanism, and the benchmark datasets. The soup aggregates web search content, documentation, execution feedback, and code snippets into a single retrievable pool, sketched below.
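One simple way to represent the soup is as source-tagged entries in one pool. The `KnowledgeItem` schema below is an assumption made for illustration; the paper does not prescribe a data format.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class KnowledgeItem:
    source: str  # one of: "web", "documentation", "feedback", "snippet"
    text: str

def build_soup(web: List[str], docs: List[str],
               feedback: List[str], snippets: List[str]) -> List[KnowledgeItem]:
    """Flatten the four knowledge sources into one retrievable pool."""
    soup: List[KnowledgeItem] = []
    for source, items in (("web", web), ("documentation", docs),
                          ("feedback", feedback), ("snippet", snippets)):
        soup.extend(KnowledgeItem(source, text) for text in items)
    return soup
```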
The paper also discusses the impact of different retrieval models and the effectiveness of various query formulations. The results show that the use of code snippets and documentation significantly improves the performance of LLMs in code generation, especially for less common programming languages. The study also highlights the importance of retrieval accuracy and the effectiveness of dense retrievers in improving generalization performance.
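To illustrate what a dense retriever contributes in this pipeline, here is a small embedding-similarity sketch using the sentence-transformers library. The model choice `all-MiniLM-L6-v2` is an assumption for illustration and not necessarily one of the retrievers the paper compares.

```python
import numpy as np
from sentence_transformers import SentenceTransformer
from typing import List

def dense_retrieve(query: str, corpus: List[str], top_k: int = 5) -> List[str]:
    """Return the top_k corpus entries most similar to the query."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    # With normalized embeddings, the dot product equals cosine similarity.
    q = model.encode([query], normalize_embeddings=True)[0]
    docs = model.encode(corpus, normalize_embeddings=True)
    scores = docs @ q
    best = np.argsort(-scores)[:top_k]
    return [corpus[i] for i in best]
```

Unlike lexical matching, the embedding space lets a natural-language query surface documentation or snippets that share no keywords with it, which is why retrieval accuracy matters so much for generalization.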
The paper concludes that ARKS substantially improves code generation by leveraging a diverse knowledge soup together with active retrieval. The findings suggest that integrating diverse knowledge sources and iteratively refining queries yields more effective code generation, and that ARKS is a promising approach for retrieval-augmented methods.