8 Jun 2020 | Hamel Husain, Ho-Hsiang Wu, Tiferet Gazit, Miltiadis Allamanis, Marc Brockschmidt
The CodeSearchNet Challenge aims to evaluate the state of semantic code search, a task that involves retrieving relevant code from natural language queries. The challenge is supported by the CodeSearchNet Corpus, which contains about 6 million functions from open-source code across six programming languages (Go, Java, JavaScript, PHP, Python, and Ruby). The corpus includes 2 million function-documentation pairs and 4,026 expert annotations for 99 natural language queries. The challenge provides a realistic dataset for training high-capacity deep neural models and includes a leaderboard to track progress. The authors describe the methodology for obtaining the corpus and expert labels, as well as baseline solutions using various neural sequence processing techniques. They also discuss the limitations of the dataset and the challenges in code search, such as query ambiguity and low-quality results. The paper concludes with a discussion of related work and open challenges in the field.
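To make the retrieval task concrete, the following is a minimal sketch of ranking functions against a natural language query. It is not the authors' baseline: the summary above describes neural sequence encoders, whereas this sketch swaps in a toy bag-of-tokens count vector so that it runs with the standard library alone. The names `tokenize`, `embed`, `cosine`, `search`, and the three-function `CORPUS` are all hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic sub-tokens (handles snake_case and camelCase)."""
    # Insert a space at camelCase boundaries, then keep alphabetic runs only.
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z]+", spaced.lower())


def embed(text):
    """Toy stand-in for a learned encoder: a sparse token-count vector."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, functions, top_k=2):
    """Return the names of the top_k functions most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(code)), name) for name, code in functions.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical miniature corpus; the real corpus holds millions of functions.
CORPUS = {
    "read_json": "def read_json_file(path):\n    with open(path) as f:\n        return json.load(f)",
    "sort_list": "def sort_numbers(values):\n    return sorted(values)",
    "http_get": "def http_get(url):\n    return urllib.request.urlopen(url).read()",
}

results = search("read json file", CORPUS)
print(results)
```

A lexical scorer like this only matches shared tokens, so a query phrased with different vocabulary than the code would score zero. That gap between query wording and code wording is the kind of difficulty that motivates learned encoders for this task.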