Open Graph Benchmark: Datasets for Machine Learning on Graphs

25 Feb 2021 | Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, Jure Leskovec
The Open Graph Benchmark (OGB) is a comprehensive set of benchmark datasets designed to facilitate scalable, robust, and reproducible research in graph machine learning (ML). OGB datasets are large-scale and cover multiple important graph ML tasks and diverse domains, including social and information networks, biological networks, molecular graphs, source code ASTs, and knowledge graphs. Each dataset comes with a unified evaluation protocol that uses meaningful application-specific data splits and metrics. OGB also provides an automated end-to-end graph ML pipeline that simplifies data loading, experimental setup, and model evaluation.
The benchmark aims to address the shortcomings of existing benchmarks, such as small dataset sizes and a lack of realistic data splits, by offering datasets that are orders of magnitude larger and more diverse. OGB datasets are publicly available at https://ogb.stanford.edu, and the community is encouraged to contribute and provide feedback. The paper discusses the challenges and opportunities presented by each dataset, particularly in scaling models to large graphs and improving out-of-distribution generalization under realistic data splits.
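To make the idea of a "realistic" split concrete, the following is a minimal sketch in plain Python. It is not the OGB library's API: the function names (`random_split`, `attribute_split`, `accuracy`), the toy records, and the choice of a year attribute as the split key are all hypothetical and chosen only for illustration. It contrasts a random split with a split on a metadata attribute, where test items come from a different part of the distribution than training items, and pairs the split with one fixed metric, as a unified evaluation protocol would.

```python
import random


def random_split(items, train_frac=0.8, seed=0):
    """Baseline split: shuffle and cut, ignoring any structure in the data."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def attribute_split(items, key, train_max):
    """Split on a metadata attribute: train where key(x) <= train_max, test on the rest.

    The test set then lies outside the training range of that attribute,
    which probes out-of-distribution generalization.
    """
    train = [x for x in items if key(x) <= train_max]
    test = [x for x in items if key(x) > train_max]
    return train, test


def accuracy(y_true, y_pred):
    """One fixed metric shared by every model evaluated on the dataset."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy records (node_id, year, label); every value here is made up.
records = [(i, 2015 + i % 5, i % 2) for i in range(20)]

rand_train, rand_test = random_split(records)
attr_train, attr_test = attribute_split(records, key=lambda r: r[1], train_max=2017)

# With the attribute split, no test year overlaps the training years.
train_years = {r[1] for r in attr_train}
test_years = {r[1] for r in attr_test}
```

Under the random split, training and test records are drawn from the same years, so a model can score well without generalizing. Under the attribute split, `train_years` and `test_years` are disjoint, which is the kind of gap the summary refers to when it mentions out-of-distribution generalization.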