2 Oct 2024 | Sarah Fakhoury, Aaditya Naik, Georgios Sakkas, Saikat Chakraborty and Shuvendu K. Lahiri
The paper introduces TiCODER, an interactive workflow that uses tests to clarify user intent and improve the accuracy of code generated by large language models (LLMs). The workflow first generates candidate code and tests, then uses user feedback on the tests to prune and rank the code suggestions. TiCODER is evaluated through a mixed-methods user study with 15 participants and a large-scale evaluation on two Python benchmarks. Results show that participants using TiCODER are significantly more likely to correctly evaluate AI-generated code and report lower cognitive load. The workflow also improves code generation accuracy, with an average absolute improvement of 45.97% in pass@1 across datasets and LLMs within 5 user interactions, and it is effective for both open- and closed-source LLMs. The study highlights the role of user feedback in refining code suggestions and reducing the risk of subtle bugs in AI-generated code, and concludes that TiCODER improves generation accuracy while reducing the cognitive burden on developers who evaluate AI-generated code.
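The generate-then-prune loop described above can be sketched in a few lines. This is a minimal, hypothetical illustration of the idea (candidate implementations filtered and ranked by user-approved input/output tests), not the paper's actual implementation; all names here (`prune_and_rank`, `user_approves`, etc.) are invented for the sketch.

```python
# Hypothetical sketch of a TiCODER-style test-driven pruning loop.
# Candidates are plain callables; tests are (input, expected_output) pairs
# that the user approves or rejects as reflecting their intent.

def run(candidate, test_input):
    """Run one code candidate on a test input, returning None on failure."""
    try:
        return candidate(test_input)
    except Exception:
        return None

def prune_and_rank(candidates, tests, user_approves):
    """Keep candidates consistent with user-approved tests, ranked by passes."""
    # Ask the user about each generated test to clarify intent.
    approved = [(x, y) for (x, y) in tests if user_approves(x, y)]
    # Prune: drop any candidate that fails an approved test.
    surviving = [c for c in candidates
                 if all(run(c, x) == y for (x, y) in approved)]
    # Rank: order survivors by agreement with the full generated test set.
    surviving.sort(key=lambda c: -sum(run(c, x) == y for (x, y) in tests))
    return surviving

# Example: three candidate implementations of "absolute value"; the
# approved test (-3, 3) eliminates the two subtly wrong ones.
candidates = [lambda x: abs(x), lambda x: x, lambda x: -x]
tests = [(-3, 3), (2, 2)]
ranked = prune_and_rank(candidates, tests, lambda x, y: True)
```

The example shows how even a single user-approved test can discriminate between superficially similar candidates, which is the mechanism behind the pass@1 improvements the paper reports.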