15 Jul 2024 | George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, Swarat Chaudhuri
PUTNAMBENCH is a new multilingual benchmark for evaluating neural theorem-provers on the Putnam Mathematical Competition. It includes 1697 formalizations of 640 theorems from the competition, with formalizations in Lean 4, Isabelle, and Coq. The theorems require significant problem-solving ability and knowledge from undergraduate mathematics. The benchmark is used to evaluate several neural and symbolic theorem-provers, which can only solve a handful of the problems, highlighting the difficulty of the task. PUTNAMBENCH is available at https://github.com/trishullab/PutnamBench. The benchmark includes the original problem statements with permission from the Mathematical Association of America. It is multilingual, supporting all three formal proof languages, and includes a wide range of mathematical topics. The benchmark is designed to challenge current neural theorem-proving methods, as they struggle to solve even a small number of the problems. The results show that current methods are not yet capable of solving most of the problems in PUTNAMBENCH, indicating the need for further research in this area.PUTNAMBENCH is a new multilingual benchmark for evaluating neural theorem-provers on the Putnam Mathematical Competition. It includes 1697 formalizations of 640 theorems from the competition, with formalizations in Lean 4, Isabelle, and Coq. The theorems require significant problem-solving ability and knowledge from undergraduate mathematics. The benchmark is used to evaluate several neural and symbolic theorem-provers, which can only solve a handful of the problems, highlighting the difficulty of the task. PUTNAMBENCH is available at https://github.com/trishullab/PutnamBench. The benchmark includes the original problem statements with permission from the Mathematical Association of America. It is multilingual, supporting all three formal proof languages, and includes a wide range of mathematical topics. The benchmark is designed to challenge current neural theorem-proving methods, as they struggle to solve even a small number of the problems. The results show that current methods are not yet capable of solving most of the problems in PUTNAMBENCH, indicating the need for further research in this area.