July 2024 | MALINDA DILHARA, ABHIRAM BELLUR, TIMOFEY BRYKSIN, DANNY DIG
Unprecedented Code Change Automation: The Fusion of LLMs and Transformation by Example
Malinda Dilhara, Abhiram Bellur, Timofey Bryksin, and Danny Dig present a novel approach to automating code change patterns (CPATs) using Large Language Models (LLMs). CPATs are repetitive code changes that developers often perform. Current Transformation by Example (TBE) techniques struggle with variations that are semantically similar but differ in syntax or data/control flow. LLMs, pre-trained on extensive code datasets, can generate semantically equivalent, previously unseen variants of CPATs, which makes TBE more effective.
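To make the idea of a CPAT and its unseen variants concrete, the following is a minimal sketch. The pattern shown (replacing an explicit accumulation loop with the built-in `sum`) and all function names are illustrative assumptions, not examples taken from the paper.

```python
# Hypothetical CPAT: "replace an explicit accumulation loop with sum()".
# All names below are illustrative assumptions, not taken from the paper.

def total_before(values):
    # "Before" side of the example a developer might supply to a TBE tool.
    total = 0
    for v in values:
        total += v
    return total


def total_after(values):
    # "After" side of the same example.
    return sum(values)


def total_unseen_variant(values):
    # A semantically equivalent "before" variant that differs in syntax and
    # control flow (while loop, index access). A TBE tool that has only seen
    # total_before would typically not match this form.
    total = 0
    i = 0
    while i < len(values):
        total = total + values[i]
        i += 1
    return total
```

All three functions compute the same result, yet only the first matches the original example syntactically. Variants like the third are the ones the summary says LLMs can supply.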
The authors developed PyCRAFT, a tool that combines static and dynamic analysis with LLM capabilities to generate code variations. PyCRAFT uses chain-of-thought reasoning to generate variations and test cases, achieving an F-measure of 96.6%. On average it expands the input examples 58x, infers transformation rules from the expanded set, and automates the changes, yielding up to a 39x increase in target code compared to previous tools. Of 86 CPAT instances submitted as patches to projects such as Microsoft/DeepSpeed and IBM/inFairness, 83% were accepted.
PyCRAFT generates variants that meet three criteria: correctness (semantic equivalence), usefulness (developer typicality), and applicability (structural intent). It validates variants through syntax, type, import, and semantic checks, and uses dynamic analysis with test cases to confirm that each variant behaves like the original. PyCRAFT's parameters are tuned empirically: higher temperatures are effective for generating dynamic test cases, and intermediate temperatures for reducing non-useful variants.
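The validation steps described above can be sketched roughly as follows. This is not PyCRAFT's implementation: the function names are hypothetical, the type check is omitted because it would need an external checker, and the import rule (a variant may not import modules the original did not) is a simplifying assumption.

```python
import ast


def is_syntactically_valid(source):
    # Syntax check: the variant must parse as Python.
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


def imported_modules(source):
    # Collect the top-level module names imported by a piece of source code.
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return modules


def load_function(source, name):
    # Execute the source in a fresh namespace and return the named function.
    namespace = {}
    exec(source, namespace)
    return namespace[name]


def behaves_like(original_src, variant_src, name, test_inputs):
    # Dynamic check: both versions must agree on every test input.
    original = load_function(original_src, name)
    variant = load_function(variant_src, name)
    return all(original(*args) == variant(*args) for args in test_inputs)


def validate_variant(original_src, variant_src, name, test_inputs):
    # Cheap static checks first, then the dynamic comparison.
    if not is_syntactically_valid(variant_src):
        return False
    # Assumed import rule: no modules beyond those the original imports.
    if not imported_modules(variant_src) <= imported_modules(original_src):
        return False
    return behaves_like(original_src, variant_src, name, test_inputs)


ORIGINAL = "def total(values):\n    t = 0\n    for v in values:\n        t += v\n    return t\n"
GOOD_VARIANT = (
    "def total(values):\n    t = 0\n    i = 0\n"
    "    while i < len(values):\n        t = t + values[i]\n        i += 1\n    return t\n"
)
WRONG_VARIANT = "def total(values):\n    t = 1\n    for v in values:\n        t *= v\n    return t\n"
TEST_INPUTS = [([1, 2, 3],), ([],), ([4, 5],)]
```

With these inputs, `validate_variant(ORIGINAL, GOOD_VARIANT, "total", TEST_INPUTS)` accepts the while-loop variant, and the same call with `WRONG_VARIANT` rejects it because the outputs diverge on the empty list. Running the static checks first means malformed variants are rejected before any code is executed.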
PyCRAFT's evaluation shows that it generates up to 584 variants per CPAT, of which 58% are applicable. It outperforms PyEvolve, generating 14x more transformations. PyCRAFT's test cases are validated through mutation testing and detect 100% of the mutants. Among the models compared, GPT-4 generates the fewest erroneous test cases, with a 19% error rate.
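Mutation testing, mentioned above as the way the generated test cases are validated, can be illustrated with a small sketch. The single mutation operator (turning `+` into `-`) and every name here are assumptions for illustration only; the summary does not say which operators or tooling the authors used.

```python
import ast


class AddToSub(ast.NodeTransformer):
    # Hypothetical mutation operator: turn every binary `+` into `-`.
    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load(tree_or_source, name):
    # compile() accepts either source text or an AST module.
    namespace = {}
    exec(compile(tree_or_source, "<generated>", "exec"), namespace)
    return namespace[name]


def suite_passes(func, cases):
    # A test suite here is a list of (args, expected_result) pairs.
    return all(func(*args) == expected for args, expected in cases)


def mutant_is_killed(source, name, cases):
    # A mutant is "killed" when at least one test case fails on it.
    mutant_tree = AddToSub().visit(ast.parse(source))
    ast.fix_missing_locations(mutant_tree)
    return not suite_passes(load(mutant_tree, name), cases)


SOURCE = "def add(a, b):\n    return a + b\n"
STRONG_SUITE = [((2, 3), 5), ((0, 0), 0)]
WEAK_SUITE = [((0, 0), 0)]  # 0 - 0 == 0 + 0, so the mutant survives
```

Here `mutant_is_killed(SOURCE, "add", STRONG_SUITE)` is true, while the weak suite lets the mutant survive. A suite that kills every mutant, as the summary reports for PyCRAFT's test cases, gives some assurance that it can tell a correct variant from a subtly wrong one.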
PyCRAFT's approach leverages LLMs to generate unseen variants, ensuring correctness, usefulness, and applicability. It combines static and dynamic analysis, using test cases to validate transformations. The tool's parameters are optimized for variant generation, with empirical studies showing optimal settings. PyCRAFT's effectiveness is demonstrated through real-world applications, with 83% of CPAT instances accepted by developers. The tool is open-source, enabling reuse by others.