June 2024 | FANG LIU, ZHIYI FU, GE LI, ZHI JIN, HUI LIU, YIYANG HAO, LI ZHANG
This paper proposes SANAR, a syntax-aware non-autoregressive model for line-level code completion. The authors argue that tokens within a code statement can be predicted concurrently, unlike in autoregressive models, which generate one token at a time. Through empirical studies, they show that dependencies among code tokens are weaker than those in natural-language tasks, making parallel generation feasible. SANAR uses a syntax-aware sampling strategy to improve generation quality and can produce an entire line of code in a single decoding pass. Evaluated on Python and Java benchmark datasets, SANAR outperforms state-of-the-art code completion approaches in accuracy while achieving significant inference speedups over autoregressive baselines, up to 9× for long target sequences. The results demonstrate that parallel generation is not only possible but also effective for code completion, highlighting the promise of non-autoregressive generation for improving both the efficiency and the effectiveness of code completion.
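To make the contrast concrete, here is a minimal toy sketch of the decoding difference the summary describes: an autoregressive decoder calls the model once per token, while a non-autoregressive decoder predicts every position in one pass. Everything here is illustrative — the lookup-table "model", the vocabulary, and the function names are invented for this sketch; SANAR itself is a Transformer, not this table.

```python
# Toy contrast: autoregressive vs. non-autoregressive (parallel) decoding.
# The "model" is a stand-in lookup table, not the paper's architecture.

VOCAB = ["print", "(", "x", ")", "<eos>"]

def toy_logits(prefix, length=4):
    """Stand-in for a model forward pass: returns a score table of shape
    [length][vocab]. Here each position simply prefers the token that
    belongs at that position in a fixed target line."""
    return [[1.0 if v == VOCAB[i] else 0.0 for v in VOCAB]
            for i in range(length)]

def decode_autoregressive(length=4):
    """One model call per token: position i conditions on tokens < i."""
    out = []
    for i in range(length):
        scores = toy_logits(out)[i]          # re-run the model each step
        out.append(VOCAB[max(range(len(VOCAB)), key=scores.__getitem__)])
    return out

def decode_parallel(length=4):
    """A single model call: all positions predicted at once (the NAR idea)."""
    table = toy_logits([])                   # one forward pass total
    return [VOCAB[max(range(len(VOCAB)), key=row.__getitem__)]
            for row in table]

print(decode_autoregressive())  # 4 model calls
print(decode_parallel())        # 1 model call, same line of code
```

The speedup SANAR reports comes from exactly this difference in the number of sequential model invocations: the autoregressive loop makes one call per output token, while the parallel decoder makes one call regardless of line length, which is why the gain grows with target-sequence length.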