June 2024 | FANG LIU, ZHIYI FU, GE LI, ZHI JIN, HUI LIU, YIYANG HAO, LI ZHANG
This paper proposes SANAR, a syntax-aware non-autoregressive model for line-level code completion. The authors argue that tokens within a code statement can be predicted concurrently, unlike in autoregressive models, which generate one token at a time. Through empirical studies, they show that dependencies among code tokens are weaker than those in natural-language tasks, making parallel generation feasible. SANAR uses a syntax-aware sampling strategy to improve generation quality and can produce an entire line of code in a single decoding pass. Evaluated on Python and Java benchmark datasets, SANAR outperforms state-of-the-art code completion approaches in accuracy while achieving significant inference speedups over autoregressive baselines, up to 9× for long target sequences. The results demonstrate that parallel generation is not only possible but also effective for code completion, highlighting the promise of non-autoregressive generation for improving both the efficiency and the effectiveness of code completion.
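To make the contrast concrete, here is a minimal toy sketch of the decoding difference the summary describes: an autoregressive decoder calls the model once per token, while a non-autoregressive decoder predicts every position in one pass. Everything here is illustrative — the lookup-table "model", the vocabulary, and the function names are invented for this sketch; SANAR itself is a Transformer, not this table.

```python
# Toy contrast: autoregressive vs. non-autoregressive (parallel) decoding.
# The "model" is a stand-in lookup table, not the paper's architecture.

VOCAB = ["print", "(", "x", ")", "<eos>"]

def toy_logits(prefix, length=4):
    """Stand-in for a model forward pass: returns a score table of shape
    [length][vocab]. Here each position simply prefers the token that
    belongs at that position in a fixed target line."""
    return [[1.0 if v == VOCAB[i] else 0.0 for v in VOCAB]
            for i in range(length)]

def decode_autoregressive(length=4):
    """One model call per token: position i conditions on tokens < i."""
    out = []
    for i in range(length):
        scores = toy_logits(out)[i]          # re-run the model each step
        out.append(VOCAB[max(range(len(VOCAB)), key=scores.__getitem__)])
    return out

def decode_parallel(length=4):
    """A single model call: all positions predicted at once (the NAR idea)."""
    table = toy_logits([])                   # one forward pass total
    return [VOCAB[max(range(len(VOCAB)), key=row.__getitem__)]
            for row in table]

print(decode_autoregressive())  # 4 model calls
print(decode_parallel())        # 1 model call, same line of code
```

The speedup SANAR reports comes from exactly this difference in the number of sequential model invocations: the autoregressive loop makes one call per output token, while the parallel decoder makes one call regardless of line length, which is why the gain grows with target-sequence length.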