6 Feb 2024 | Omer Dunay, Daniel Cheng, Adam Tait, Parth Thakkar, Peter C. Rigby, Andy Chiu, Imad Ahmad, Arun Ganesan, Chandra Maddila, Vijayaraghavan Murali, Ali Tayyebi, Nachi Nagappan
CodeCompose is an AI-assisted code authoring tool powered by large language models (LLMs) that provides inline suggestions to tens of thousands of developers at Meta. This paper presents how the product was scaled from single-line to multi-line suggestions, addressing key challenges in usability and performance.
The main challenges included: (1) minimizing the "jarring" effect of multi-line suggestions, which can disrupt user workflow by moving existing code; (2) reducing latency for generating long multi-line suggestions; and (3) rolling out and evaluating the impact of multi-line suggestions across thousands of developers.
To address the jarring effect, multi-line suggestions are only triggered when the cursor is at the end of the current scope, and suggestions are shown until the end of the current block. This minimizes disruption to the user's workflow.
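The triggering rule can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes an indentation-delimited language such as Python, uses indentation as a stand-in for real scope analysis, and the function names (`should_trigger_multiline`, `truncate_to_block`) are hypothetical.

```python
def indent_of(line: str) -> int:
    """Number of leading spaces (tabs are not handled in this sketch)."""
    return len(line) - len(line.lstrip(" "))


def should_trigger_multiline(lines: list[str], cursor_row: int, cursor_col: int) -> bool:
    """Assumed heuristic: allow a multi-line suggestion only if no code sits to
    the right of the cursor and the next non-blank line is dedented, so the
    cursor is at the end of its scope and no existing code gets pushed down."""
    current = lines[cursor_row]
    if current[cursor_col:].strip():
        return False  # code after the cursor on the same line would be displaced
    # On a blank line, treat the cursor column as the intended indentation.
    cur_indent = indent_of(current) if current.strip() else cursor_col
    for nxt in lines[cursor_row + 1:]:
        if nxt.strip():
            return indent_of(nxt) < cur_indent
    return True  # end of file: nothing below the cursor to displace


def truncate_to_block(suggestion: str, block_indent: int) -> str:
    """Keep the suggestion only up to the end of the current block, i.e. stop
    at the first non-blank line indented less than the block itself."""
    kept = []
    for i, line in enumerate(suggestion.split("\n")):
        # The first line continues at the cursor, so its indentation is ignored.
        if i > 0 and line.strip() and indent_of(line) < block_indent:
            break
        kept.append(line)
    return "\n".join(kept).rstrip()
```

With the cursor at the end of the last statement of a function body, `should_trigger_multiline` returns `True`; with another statement of the same scope below the cursor, it returns `False`, and the tool would fall back to a single-line suggestion.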
To reduce latency, the team implemented several optimizations, including Flash Attention, CUDA graphs, and streaming with early cancellation. These improvements reduced the median latency of multi-line suggestions from 2000ms to 750ms, significantly increasing the display rate and user acceptance.
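Streaming with early cancellation can be sketched as follows. This is an illustration under stated assumptions, not the system described in the paper: `fake_token_stream` is a hypothetical stand-in for a model server that emits one token at a time, and `cancelled` stands for whatever signal the editor raises when the user keeps typing or moves the cursor.

```python
import asyncio
from typing import AsyncGenerator


async def fake_token_stream(tokens: list[str], delay: float = 0.0) -> AsyncGenerator[str, None]:
    """Hypothetical model server: yields tokens one by one with a small delay."""
    for tok in tokens:
        await asyncio.sleep(delay)
        yield tok


async def stream_suggestion(stream: AsyncGenerator[str, None], cancelled: asyncio.Event) -> str:
    """Accumulate streamed tokens, stopping as soon as the request is cancelled."""
    parts: list[str] = []
    try:
        async for tok in stream:
            if cancelled.is_set():
                break  # the suggestion is stale, so stop consuming tokens
            parts.append(tok)
    finally:
        # Closing the generator lets the producer stop work it no longer needs.
        await stream.aclose()
    return "".join(parts)


async def demo() -> tuple[str, str]:
    tokens = ["for ", "x ", "in ", "xs:", "\n", "    total += x"]
    never_cancelled = asyncio.Event()
    full = await stream_suggestion(fake_token_stream(tokens), never_cancelled)
    already_cancelled = asyncio.Event()
    already_cancelled.set()
    empty = await stream_suggestion(fake_token_stream(tokens), already_cancelled)
    return full, empty
```

The point of the design is that the client does not wait for the whole multi-line completion: partial text is available as tokens arrive, and a request whose result can no longer be shown stops consuming generation capacity.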
Experiments showed that multi-line suggestions account for 42% of total characters accepted, despite only accounting for 16% of displayed suggestions. Multi-line suggestions also increased the percentage of keystrokes saved from 9% to 17%. Multi-line suggestions were rolled out to all engineers at Meta, with less than 1% opting out.
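The figures above imply that a displayed multi-line suggestion yields far more accepted characters than a displayed single-line one. The short calculation below makes that explicit; it assumes every suggestion is either single-line or multi-line, so the single-line shares are the complements (58% of characters, 84% of displays). The function name is illustrative.

```python
def per_suggestion_yield_ratio(multi_char_share: float, multi_display_share: float) -> float:
    """Accepted characters per displayed multi-line suggestion, relative to
    single-line, assuming the two kinds together cover all suggestions."""
    multi = multi_char_share / multi_display_share
    single = (1 - multi_char_share) / (1 - multi_display_share)
    return multi / single


# Shares reported in the text: 42% of accepted characters, 16% of displays.
ratio = per_suggestion_yield_ratio(0.42, 0.16)

# Keystrokes saved rose from 9% to 17%.
keystroke_gain_points = 17 - 9   # absolute change in percentage points
keystroke_gain_factor = 17 / 9   # relative change

print(f"yield ratio: {ratio:.1f}x, keystrokes saved: "
      f"+{keystroke_gain_points} points ({keystroke_gain_factor:.1f}x)")
```

Under that assumption, each displayed multi-line suggestion contributes roughly 3.8 times as many accepted characters as a displayed single-line suggestion, and the share of keystrokes saved rises by 8 percentage points, a little under double.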
The study highlights the importance of balancing usability and performance in AI-assisted code authoring. While multi-line suggestions are more complex and have higher latency, they provide significant benefits in terms of productivity and code quality. The results demonstrate that multi-line suggestions are effective and widely adopted, with positive user feedback and high acceptance rates. The study also underscores the need for further research on the impact of multi-line suggestions in different contexts and environments.