Towards AI-Assisted Synthesis of Verified Dafny Methods

Towards AI-Assisted Synthesis of Verified Dafny Methods

July 2024 | MD RAKIB HOSSAIN MISU*, CRISTINA V. LOPES, IRIS MA, JAMES NOBLE†
This paper explores the potential of large language models (LLMs) in synthesizing verified Dafny methods, a verification-aware programming language. The authors investigate the effectiveness of LLMs in generating correct Dafny code, including formal specifications and validation conditions, using the Dafny verifier. They conduct an empirical study with two contemporary models, GPT-4 and PaLM-2, on 178 problems from the MBPP dataset. The study employs three types of prompts: Contextless, Signature, and Chain of Thought (CoT). The results show that GPT-4 outperforms PaLM-2, with both models performing best with the CoT prompt. GPT-4 generated verified Dafny methods for 58% of the problems, while PaLM-2 managed only 6.74%. The paper also introduces the MBPP-DFY-153 dataset, containing 153 verified Dafny solutions, and discusses the benefits of combining LLMs with program verification techniques. The findings suggest that LLMs can contribute to formal software development, particularly in generating verifiable programs and improving the correctness of synthesized code.This paper explores the potential of large language models (LLMs) in synthesizing verified Dafny methods, a verification-aware programming language. The authors investigate the effectiveness of LLMs in generating correct Dafny code, including formal specifications and validation conditions, using the Dafny verifier. They conduct an empirical study with two contemporary models, GPT-4 and PaLM-2, on 178 problems from the MBPP dataset. The study employs three types of prompts: Contextless, Signature, and Chain of Thought (CoT). The results show that GPT-4 outperforms PaLM-2, with both models performing best with the CoT prompt. GPT-4 generated verified Dafny methods for 58% of the problems, while PaLM-2 managed only 6.74%. The paper also introduces the MBPP-DFY-153 dataset, containing 153 verified Dafny solutions, and discusses the benefits of combining LLMs with program verification techniques. The findings suggest that LLMs can contribute to formal software development, particularly in generating verifiable programs and improving the correctness of synthesized code.
Reach us at info@study.space