REPOFUSE: Repository-Level Code Completion with Fused Dual Context

23 Feb 2024 | Ming Liang, Xiaoheng Xie, Gehao Zhang, Xunjin Zheng, Peng Di, Wei Jiang, Hongwei Chen, Chengpeng Wang, Gang Fan
**Abstract:** The success of language models in code assistance has led to the proposal of repository-level code completion, which improves prediction accuracy by drawing on context from the entire codebase. However, this larger context can lengthen inference times, potentially undermining developer experience and tool adoption. This paper introduces REPOFUSE, a solution that enhances repository-level code completion without the latency trade-off. REPOFUSE fuses two types of context: *analogy context*, rooted in code analogies, and *rationale context*, encompassing in-depth semantic relationships. It introduces a *rank truncated generation (RTG)* technique that condenses these contexts into prompts of restricted size, enabling precise code completions while maintaining inference efficiency. In tests with the CrossCodeEval suite, REPOFUSE shows a significant improvement over existing models, achieving a 40.90% to 59.75% increase in exact match (EM) accuracy for code completions and a 26.8% enhancement in inference speed. REPOFUSE has been integrated into a large enterprise's workflow, streamlining daily software development tasks.

**Introduction:** Language models (LMs) have shown exceptional skill in various programming tasks, particularly code completion, and offer significant potential to enhance developer efficiency. Repository-level code completion extends beyond in-file context: it aims to synthesize unfinished code within the context of the entire codebase. This involves leveraging *cross-file context*, which encapsulates high-level abstractions of various code constructs. The obstacle is the *Context-Latency Conundrum*: richer context improves predictions, but the longer prompts it requires increase inference time. REPOFUSE is designed to resolve this trade-off.

**Methodology:** REPOFUSE's approach mirrors how human programmers work, attending both to the tokens already written and to similar code elsewhere in the repository. It introduces two key concepts: *rationale context* and *analogy context*. The rationale context is derived from import statements, while the analogy context is built from similar code chunks. The RTG technique merges these contexts into a prompt of fixed size, preserving completion accuracy while keeping inference efficient.

**Experiments and Results:** REPOFUSE was evaluated on the CrossCodeEval benchmark, where it outperformed state-of-the-art techniques. It achieved a 40.90% to 59.75% increase in EM accuracy and a 26.8% enhancement in inference speed. The approach has been successfully integrated into a large organization's workflow, streamlining daily software development tasks.

**Conclusion and Future Work:** REPOFUSE advances code completion by integrating cross-file context, making code generation tools more useful without the broader negative impacts of increased energy consumption or greenhouse gas emissions. Future work includes extending REPOFUSE to handle larger Code...
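
The Methodology describes RTG as merging rationale context (from imports) and analogy context (from similar code chunks) into a prompt of fixed size. The Python sketch below illustrates one way such a rank-then-truncate step could look. It is not the authors' implementation: the summary does not name the scoring function or tokenizer, so Jaccard overlap of identifiers and whitespace-split word counts are stand-in assumptions, and the names `Candidate`, `rank_truncate`, and `build_prompt` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    kind: str  # "rationale" (import-derived) or "analogy" (similar chunk)
    text: str


def tokens(code):
    """Identifier-level tokens; an assumed stand-in for a real tokenizer."""
    return set(re.findall(r"[A-Za-z_]\w*", code))


def jaccard(a, b):
    """Assumed relevance score: overlap of identifier sets."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def rank_truncate(candidates, unfinished_code, budget):
    """Rank both context types together, then keep the top-ranked
    prefix whose total size fits within the budget."""
    ranked = sorted(
        candidates,
        key=lambda c: jaccard(c.text, unfinished_code),
        reverse=True,
    )
    kept, used = [], 0
    for cand in ranked:
        cost = len(cand.text.split())  # crude size estimate (assumption)
        if used + cost > budget:
            break  # truncate: drop this and all lower-ranked candidates
        kept.append(cand)
        used += cost
    return kept


def build_prompt(candidates, unfinished_code, budget):
    """Place the surviving context ahead of the code to be completed."""
    kept = rank_truncate(candidates, unfinished_code, budget)
    context = "\n".join(f"# [{c.kind}]\n{c.text}" for c in kept)
    return f"{context}\n{unfinished_code}" if context else unfinished_code
```

Because the budget caps the context regardless of repository size, prompt length (and with it inference cost) stays bounded while the most relevant candidates of either type are retained. A production system would presumably measure the budget in model tokens and use a stronger retriever.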