2022-3-16 | Yujia Li*, David Choi*, Junyoung Chung*, Nate Kushman*, Julian Schrittwieser*, Rémi Leblond*, Tom Eccles*, James Keeling*, Felix Gimeno*, Agustin Dal Lago*, Thomas Hubert*, Peter Choy*, Cyprien de Masson d'Autume*, Igor Babuschkin, Xinyun Chen, Po-Sen Huang, Johannes Welbl, Sven Gowal, Alexey Cherepanov, James Molloy, Daniel J. Mankowitz, Esme Sutherland Robson, Pushmeet Kohli, Nando de Freitas, Koray Kavukcuoglu and Oriol Vinyals
AlphaCode is a system for generating code that can create novel solutions to competitive programming problems requiring deeper reasoning. Three components are critical to its performance: (1) an extensive, clean competitive programming dataset for training and evaluation; (2) large transformer-based architectures that are efficient to sample from; and (3) large-scale model sampling followed by filtering based on program behavior.

AlphaCode uses large transformer language models pre-trained on GitHub code and fine-tuned on a curated set of competitive programming problems. For each unseen problem, it generates a large set of program samples, filters them using execution results on the example tests, and clusters the remaining samples by program behavior to obtain a small set of candidate submissions. The system uses an encoder-decoder transformer with an asymmetric design (a shallower encoder and a deeper decoder), and employs tempering, value conditioning and prediction, and GOLD during fine-tuning to improve solve rates. Submissions are rigorously evaluated, and evaluation problems are guaranteed to be unseen during training.

A new training and evaluation dataset, CodeContests, is released; its additional generated test cases reduce the false positive rate from 30-60% to 4%. AlphaCode's best model solves 34.2% of held-out competitive programming problems in this dataset. In simulated programming competitions hosted on the Codeforces platform, each with over 5,000 participants, AlphaCode achieves an average ranking in the top 54.3%. Based on these results, AlphaCode is estimated to have a Codeforces rating of 1238, which places it within the top 28% of users who have participated in a contest in the last 6 months.
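The sample-filter-cluster pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: candidate programs are represented as plain Python callables, and the function name, signature, and cluster-selection details are all assumptions made for the example. The key ideas it shows are (1) discarding samples that fail the example tests and (2) grouping the survivors by identical behavior on extra inputs so that only one representative per behavioral cluster is submitted.

```python
from collections import defaultdict

def select_candidates(samples, example_tests, extra_inputs, k=10):
    """Filter sampled programs on the example tests, then cluster the
    survivors by their outputs on extra inputs and keep one program
    from each of the k largest clusters. (Illustrative sketch only.)"""
    # 1. Filtering: keep only samples that pass every example test.
    survivors = [p for p in samples
                 if all(p(x) == y for x, y in example_tests)]
    # 2. Clustering: programs with identical outputs on the extra
    #    inputs are treated as behaviorally equivalent.
    clusters = defaultdict(list)
    for p in survivors:
        signature = tuple(p(x) for x in extra_inputs)
        clusters[signature].append(p)
    # 3. Take one representative per cluster, largest clusters first.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:k]]
```

For example, given samples `[lambda x: x * 2, lambda x: x + x, lambda x: x ** 2, lambda x: 0]` and the single example test `(2, 4)`, the constant program is filtered out; probing with the extra input `3` separates the doubling programs (output 6) from the squaring one (output 9), yielding two behavioral clusters.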
Evaluated on both the Codeforces platform and the CodeContests dataset, AlphaCode solves a significant proportion of competitive programming problems, demonstrating that it is genuinely competitive in programming contests.
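Of the fine-tuning techniques named above, tempering is the simplest to illustrate: the logits are divided by a temperature before the softmax inside the training loss, sharpening (for T < 1) the distribution the model is trained to match. The sketch below is an assumption-laden toy in plain Python, not the paper's implementation; the function name and the particular temperature value are illustrative.

```python
import math

def tempered_nll(logits, target_index, T=0.5):
    """Negative log-likelihood with tempering: logits are divided by a
    temperature T before the softmax during training. T < 1 sharpens
    the training distribution. (Illustrative sketch only.)"""
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for a numerically stable softmax
    log_partition = m + math.log(sum(math.exp(s - m) for s in scaled))
    # NLL of the target token under the tempered distribution.
    return log_partition - scaled[target_index]
```

With T = 1 this reduces to the standard cross-entropy; with T < 1, a correct token that already has the highest logit incurs a smaller loss, since the tempered distribution concentrates more mass on it.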