Open Graph Benchmark: Datasets for Machine Learning on Graphs

The Open Graph Benchmark (OGB) is a set of benchmark datasets designed to support scalable, robust, and reproducible graph machine learning (ML) research. The datasets are large-scale and cover a range of domains, including social networks, biological networks, molecular graphs, source code ASTs, and knowledge graphs. Each dataset comes with a unified evaluation protocol that fixes meaningful data splits and metrics. OGB also reports extensive benchmark experiments, which highlight open challenges such as scalability to large graphs and out-of-distribution generalization. The datasets, along with data loaders, evaluation scripts, and baseline code, are publicly available at https://ogb.stanford.edu. OGB is regularly updated and welcomes input from the community.

OGB includes 15 graph datasets across three task categories: node property prediction, link property prediction, and graph property prediction. The datasets are categorized by task, domain, and scale, and range in size from small to large. Examples include an Amazon product co-purchasing network, protein-protein association networks, paper citation networks, and a heterogeneous academic graph. OGB is meant to address shortcomings of existing benchmarks: small dataset sizes, unrealistic splits, and the lack of standardized protocols. Its data splits are realistic and domain-specific, so that models are evaluated in a setting that is both challenging and close to real applications.
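To make the idea of a realistic, domain-specific split more concrete, here is a minimal Python sketch of one possible scheme: splitting the nodes of a citation graph by publication year, so that a model trains on older papers and is tested on newer ones. The toy years, the cutoff values, and the function name `split_by_year` are illustrative assumptions and are not taken from OGB's code.

```python
from typing import Dict, List


def split_by_year(
    years: List[int], train_until: int, valid_until: int
) -> Dict[str, List[int]]:
    """Assign node indices to train/valid/test by publication year.

    Nodes with year <= train_until go to train, nodes with
    train_until < year <= valid_until go to valid, and all later
    nodes go to test.
    """
    if train_until >= valid_until:
        raise ValueError("train_until must be earlier than valid_until")
    split: Dict[str, List[int]] = {"train": [], "valid": [], "test": []}
    for idx, year in enumerate(years):
        if year <= train_until:
            split["train"].append(idx)
        elif year <= valid_until:
            split["valid"].append(idx)
        else:
            split["test"].append(idx)
    return split


# Hypothetical toy data: publication year of six papers (node i -> year).
paper_years = [2015, 2019, 2017, 2018, 2020, 2016]
split = split_by_year(paper_years, train_until=2017, valid_until=2018)
print(split)  # {'train': [0, 2, 5], 'valid': [3], 'test': [1, 4]}
```

Unlike a uniformly random split, this one forces the model to generalize to nodes drawn from a later period than anything seen in training, which is the kind of out-of-distribution setting the summary refers to.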
OGB also provides an automated end-to-end pipeline for graph ML that simplifies data loading, experimental setup, and model evaluation. The OGB website hosts documentation, example scripts, and public leaderboards for tracking progress in graph ML research. OGB is an ongoing open-source initiative, with plans to release new datasets and methods and to keep the leaderboards up to date. The datasets are available for research and development, with a focus on improving scalability, generalization, and performance in graph ML.
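The three pipeline stages named above (data loading, experimental setup, model evaluation) can be sketched in a few lines of Python. This is only an illustration of the overall shape under stated assumptions: `ToyDataset`, `AccuracyEvaluator`, and `majority_class` are hypothetical names invented here, the labels are made up, and none of this is the API of the OGB package.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyDataset:
    """Hypothetical stand-in for a benchmark dataset with a fixed split."""
    labels: List[int]
    split_idx: Dict[str, List[int]]


class AccuracyEvaluator:
    """Hypothetical evaluator: one fixed metric plus input-format checks."""

    def eval(self, y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
        if len(y_true) != len(y_pred):
            raise ValueError("y_true and y_pred must have the same length")
        if not y_true:
            raise ValueError("cannot evaluate on an empty set")
        correct = sum(t == p for t, p in zip(y_true, y_pred))
        return {"acc": correct / len(y_true)}


def majority_class(labels: List[int]) -> int:
    """Trivial baseline 'model': predict the most common training label."""
    return max(set(labels), key=labels.count)


# 1) Data loading: toy labels shipped together with a standardized split.
dataset = ToyDataset(
    labels=[0, 1, 1, 1, 0, 1, 0, 1],
    split_idx={"train": [0, 1, 2, 3], "valid": [4, 5], "test": [6, 7]},
)

# 2) Experimental setup: fit the baseline on the training nodes only.
train_labels = [dataset.labels[i] for i in dataset.split_idx["train"]]
prediction = majority_class(train_labels)

# 3) Model evaluation: score every split with the same evaluator.
evaluator = AccuracyEvaluator()
results = {}
for name, idx in dataset.split_idx.items():
    y_true = [dataset.labels[i] for i in idx]
    results[name] = evaluator.eval(y_true, [prediction] * len(idx))
print(results)
```

Because the split and the metric travel with the dataset rather than with each experiment, two methods evaluated this way are directly comparable, which is the property a public leaderboard relies on.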


25 Feb 2021 | Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec