[slides and audio] STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis

This paper explores the integration of static analysis into LLM-based repository-level code completion, a challenging task that involves generating code based on complex contexts from multiple files in a repository. The authors propose a framework called STALL+, which supports the customizable integration of multiple static analysis strategies into the complete pipeline of LLM-based repository-level code completion. Extensive experiments using three state-of-the-art code LLMs (DeepSeekCoder-6.7B, StarCoderBase-7B, and CodeLlama-7B) on the CrossCodeEval benchmark reveal several key findings:

1. **Effectiveness and Efficiency**: Integrating static analysis in any phase of code completion improves performance, with the prompting phase performing the best and the post-processing phase performing the worst.
2. **Complementary Strategies**: Combining multiple integration strategies can further enhance performance, but the effectiveness varies depending on the specific combination.
3. **Performance with RAG**: Static analysis integration outperforms RAG in repository-level code completion, and combining both techniques achieves the best accuracy.
4. **Efficiency**: Integrating static analysis in the prompting phase is the most efficient, while combining RAG with prompting-phase static analysis is the best option for cost-effectiveness.

The study also highlights the importance of addressing the limitations of static analysis, particularly in dynamic languages, and suggests future directions for more flexible and efficient integration strategies.
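To make the idea of prompting-phase integration more concrete, here is a minimal sketch of one way static analysis could enrich a completion prompt. It is not the STALL+ implementation. It assumes a Python repository and uses the standard `ast` module to collect the signatures of definitions in modules that the unfinished file imports. The helper names (`extract_signatures`, `imported_modules`, `build_prompt`) and the `repo_modules` mapping from module name to source text are hypothetical, introduced only for illustration.

```python
import ast


def extract_signatures(source: str) -> list[str]:
    """Return one-line signatures for top-level functions and classes."""
    signatures = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # ast.unparse (Python 3.9+) renders the argument list as source text.
            signatures.append(f"def {node.name}({ast.unparse(node.args)})")
        elif isinstance(node, ast.ClassDef):
            signatures.append(f"class {node.name}")
    return signatures


def imported_modules(unfinished_code: str) -> set[str]:
    """Collect module names from single-line import statements.

    The unfinished file usually does not parse as a whole, so each
    import line is parsed on its own (a simplification: multi-line
    imports are skipped).
    """
    modules = set()
    for line in unfinished_code.splitlines():
        stripped = line.strip()
        if not stripped.startswith(("import ", "from ")):
            continue
        try:
            statement = ast.parse(stripped).body[0]
        except SyntaxError:
            continue
        if isinstance(statement, ast.Import):
            modules.update(alias.name for alias in statement.names)
        elif isinstance(statement, ast.ImportFrom) and statement.module:
            modules.add(statement.module)
    return modules


def build_prompt(unfinished_code: str, repo_modules: dict[str, str]) -> str:
    """Prepend signatures of imported in-repository modules as comments."""
    context_lines = []
    for module in sorted(imported_modules(unfinished_code)):
        if module in repo_modules:  # ignore modules outside the repository
            context_lines.append(f"# From module {module}:")
            for signature in extract_signatures(repo_modules[module]):
                context_lines.append(f"#   {signature}")
    if not context_lines:
        return unfinished_code
    return "\n".join(context_lines) + "\n" + unfinished_code


# Usage: the prompt sent to the code LLM now carries cross-file context.
repo_modules = {"utils": "def add(a, b=1):\n    return a + b\n\nclass Box:\n    pass\n"}
unfinished = "from utils import add\n\nresult = add("
print(build_prompt(unfinished, repo_modules))
```

In this sketch the model sees which arguments `add` expects before it completes the call. That is the kind of cross-file information a prompting-phase strategy aims to supply; strategies for the decoding and post-processing phases would instead use analysis results to constrain or filter the generated tokens.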

STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis

14 Jun 2024 | Junwei Liu, Yixuan Chen, Mingwei Liu, Xin Peng, Yiling Lou