April 2024 | JIALU ZHANG, JOSÉ PABLO CAMBRONERO, SUMIT GULWANI, VU LE, RUZICA PISKAC, GUSTAVO SOARES, GUST VERBRUGGEN
PyDex is a system that uses large language models (LLMs) to automatically repair bugs in introductory Python programming assignments. It combines multi-modal prompts, iterative querying, test-case-based few-shot selection, and program chunking to address both syntactic and semantic errors. The system was evaluated on 286 real student programs from an introductory Python course in India. PyDex outperformed three baselines (BIFI, Refactory, and GenProg) in both repair rate and patch size. It repaired 86.71% of programs without few-shot learning and 96.5% with few-shot learning. The average token edit distance of PyDex patches was significantly smaller than that of the baselines, meaning its repairs change fewer tokens of the student's original code. PyDex also handled complex repairs better and preserved more of the original program's structure. By leveraging the strengths of LLMs to generate effective, minimal repairs, PyDex is a promising tool for automated program repair in education.
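The patch-size comparison rests on token edit distance between a buggy program and its repair. The following is a minimal sketch of one way to compute such a metric. It assumes Python's standard `tokenize` lexer and a plain Levenshtein distance over token strings. The paper's exact tokenization may differ, and the helper names `python_tokens` and `token_edit_distance` are hypothetical.

```python
import io
import tokenize

# Token types treated as non-content and skipped (an assumption about how
# tokens are counted; the original evaluation may count them differently).
_SKIP = {tokenize.ENCODING, tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER}


def python_tokens(source: str) -> list[str]:
    """Lex Python source into a list of token strings (hypothetical helper)."""
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type not in _SKIP
    ]


def token_edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance over token sequences: the minimum number of
    token insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ta in enumerate(a, start=1):
        curr = [i]
        for j, tb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ta
                curr[j - 1] + 1,           # insert tb
                prev[j - 1] + (ta != tb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    buggy = "def add(a, b):\n    return a - b\n"
    fixed = "def add(a, b):\n    return a + b\n"
    # A one-operator semantic fix touches exactly one token.
    print(token_edit_distance(python_tokens(buggy), python_tokens(fixed)))
```

Measuring distance over tokens rather than characters means that renaming a variable or swapping an operator counts as a single edit, which better reflects how far a repair strays from the student's code. Note that `tokenize` can raise `tokenize.TokenError` on some malformed inputs (for example, an unclosed bracket at end of file), so a real evaluation would need a more forgiving lexer for syntactically broken submissions.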