2024 | Di Wu, Wasi Uddin Ahmad, Dejiao Zhang, Murali Krishna Ramanathan, Xiaofei Ma
REPOFORMER: Selective Retrieval for Repository-Level Code Completion
Recent advances in retrieval-augmented generation (RAG) have initiated a new era in repository-level code completion. However, the invariable use of retrieval in existing methods exposes issues in both efficiency and robustness, with a large proportion of the retrieved contexts proving unhelpful or harmful to code language models (code LMs). In this paper, we propose a selective RAG framework that avoids retrieval when it is unnecessary. To power this framework, we design a self-supervised learning approach that enables a code LM to accurately self-evaluate whether retrieval can improve its output quality and to robustly leverage potentially noisy retrieved contexts. Using this LM as both the selective RAG policy and the generation model, our framework achieves state-of-the-art repository-level code completion performance on diverse benchmarks including RepoEval, CrossCodeEval, and CrossCodeLongEval, a new long-form code completion benchmark. Meanwhile, our analyses show that selective retrieval yields up to a 70% inference speedup in the online serving setting without harming performance. We further demonstrate that our framework accommodates different generation models, retrievers, and programming languages. These advancements position our framework as an important step towards more accurate and efficient repository-level code completion.
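The abstract's self-supervised learning idea, teaching the LM to judge whether retrieval improves its output quality, can be grounded in a simple labeling recipe: generate a completion with and without retrieved context, score both against the ground truth, and label the example by whether retrieval helped. The sketch below is illustrative only (the function names and the exact scoring recipe are assumptions, not the paper's actual implementation), using character-level edit similarity as a stand-in quality metric:

```python
# Illustrative sketch of self-supervised "does retrieval help?" labeling.
# The helper names and margin-based rule are assumptions for illustration,
# not the paper's exact recipe.
from difflib import SequenceMatcher


def edit_similarity(hyp: str, ref: str) -> float:
    """Character-level similarity in [0, 1], a common proxy for edit similarity."""
    return SequenceMatcher(None, hyp, ref).ratio()


def retrieval_helps_label(gen_plain: str, gen_rag: str, reference: str,
                          margin: float = 0.0) -> bool:
    """Label an example True when the retrieval-augmented generation is
    closer to the reference than the plain generation by more than `margin`."""
    gain = edit_similarity(gen_rag, reference) - edit_similarity(gen_plain, reference)
    return gain > margin
```

Labels produced this way could then supervise the LM's self-evaluation of retrieval need, with the margin controlling how conservative the "retrieve" label is.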
The paper introduces REPOFORMER, a code LM fine-tuned for robust code completion with self-triggered retrieval augmentation. REPOFORMER reflects three core principles: performance-oriented self-evaluation, robustness to retrieved contexts, and generalizability. The framework uses a self-supervised learning approach to enable the code LM to accurately self-evaluate the need for retrieval and to robustly complete code with optional retrieval augmentation. It achieves strong performance on various repository-level code completion tasks, outperforming an always-retrieve baseline with the same-sized StarCoderBase by more than 3 absolute points of edit similarity across multiple tasks. The 3B REPOFORMER performs on par with always retrieving using the 16B StarCoder, and the 16B REPOFORMER achieves state-of-the-art performance across all tasks. The framework also allows up to a 70% inference speedup without harming accuracy. The paper further demonstrates that REPOFORMER can accelerate RAG with larger black-box LMs as a plug-and-play selective RAG policy, improving performance while reducing the latency of line and API completion to 75%. Comprehensive analyses of REPOFORMER's generalization ability show that it makes precise retrieval abstention decisions, is robust to retrieved contexts, and performs well when tested in other languages or with other retrievers. The paper will release its implementation and the CrossCodeLongEval benchmark at https://repoformer.github.io/.
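The selective RAG serving loop described above, in which a single LM acts as both the retrieval policy and the generator, can be sketched as follows. This is a minimal illustration under stated assumptions: the method names (`retrieval_need_prob`, `retrieve`, `generate`) and the probability-threshold framing are hypothetical stand-ins, not the paper's actual API; the toy classes exist only to make the sketch runnable.

```python
# Minimal sketch of a selective RAG serving loop: the LM first
# self-evaluates whether retrieval would help, and the (potentially
# slow, potentially noisy) retriever is invoked only when it would.
# All class and method names here are illustrative assumptions.

def selective_rag_complete(lm, retriever, prompt: str,
                           threshold: float = 0.5) -> str:
    """Complete `prompt`, retrieving cross-file context only when the
    LM's self-evaluated probability of benefit exceeds `threshold`."""
    if lm.retrieval_need_prob(prompt) >= threshold:
        context = retriever.retrieve(prompt)   # cross-file context, may be noisy
        return lm.generate(context + prompt)   # retrieval-augmented completion
    return lm.generate(prompt)                 # skip retrieval entirely


class ToyLM:
    """Toy stand-in for a code LM with a self-evaluation head."""
    def __init__(self, need_prob: float):
        self.need_prob = need_prob

    def retrieval_need_prob(self, prompt: str) -> float:
        return self.need_prob

    def generate(self, prompt: str) -> str:
        return f"completion({prompt})"


class ToyRetriever:
    """Toy stand-in for a cross-file retriever."""
    def retrieve(self, prompt: str) -> str:
        return "ctx|"
```

Because the retrieval call is skipped whenever the self-evaluation falls below the threshold, latency savings scale with the fraction of abstentions, which is the mechanism behind the reported speedups.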