18 Jun 2019 | Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, Stephan Günnemann
The paper "Pitfalls of Graph Neural Network Evaluation" by Oleksandr Shchur, Maximilian Mumme, Aleksandar Bojchevski, and Stephan Günnemann from the Technical University of Munich addresses shortcomings in the evaluation strategies used for Graph Neural Network (GNN) models. The authors highlight that reusing the same train/validation/test split across papers, combined with significant changes to the training procedure, can lead to unfair comparisons between GNN architectures. They perform a thorough empirical evaluation of four prominent GNN models (GCN, MoNet, GraphSAGE, and GAT) on four well-known citation network datasets and introduce four new datasets for the node classification task. The evaluation uses a standardized training and hyperparameter-selection procedure, with 100 random train/validation/test splits and 20 random weight initializations for each split. The results show that simpler GNN architectures can outperform more sophisticated ones when hyperparameters and training procedures are tuned fairly. The paper also demonstrates the importance of evaluating on multiple data splits to avoid misleading conclusions and to better assess the generalization performance of different models.
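To make the protocol concrete, here is a minimal sketch of the "many splits, many initializations" evaluation loop the paper advocates. Everything here is an illustrative stand-in rather than the authors' actual setup: the data is synthetic, a small scikit-learn MLP substitutes for a GNN, and the loop counts are reduced (the paper uses 100 splits with 20 initializations each on real graph datasets).

```python
# Sketch of a multi-split, multi-seed evaluation protocol.
# Assumptions (not from the paper's code): synthetic features and labels,
# an MLP stand-in for the GNN, and sklearn utilities for splitting.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))     # synthetic node features
y = rng.integers(0, 3, size=500)   # synthetic class labels

N_SPLITS, N_INITS = 10, 5          # paper uses 100 splits x 20 inits
accuracies = []
for split in range(N_SPLITS):
    # Draw a fresh random train/test split for every run,
    # instead of reusing one fixed split.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=split)
    for init in range(N_INITS):
        # Re-initialize the model weights for every split.
        model = MLPClassifier(hidden_layer_sizes=(32,),
                              max_iter=500, random_state=init)
        model.fit(X_train, y_train)
        accuracies.append(model.score(X_test, y_test))

# Report mean and spread over all runs, not a single-split number.
print(f"accuracy: {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}")
```

The key design point is in the last line: performance is reported as a mean and standard deviation over all split/seed combinations, which is what lets the paper show that single-split rankings of GNN architectures can be misleading.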