12 February 2024 | Jia Li, Jiangwei Li, Chenxu Wang, Fons J Verbeek, Tanja Schultz, and Hui Liu
This paper introduces MS2OD, an outlier detection method that combines a minimum spanning tree (MST) with medoid selection. The method first constructs a scaled MST and iteratively cuts the longest edge to form clusters. It then applies a novel medoid selection strategy to identify cluster centers and computes the outlierness of each point from its distance to the medoid. The method is evaluated on real-world datasets, including medical data and other semantically meaningful datasets, and is particularly effective at detecting outliers in datasets with varying densities. Compared with state-of-the-art outlier detection techniques, including ABOD, k-NN, LOF, and OCSVM, it achieves higher accuracy in most cases. The paper concludes that MS2OD is a promising approach for outlier detection in various applications, especially medical data analysis.
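The pipeline summarized above (build an MST, cut the longest edges, pick a medoid per cluster, score points by distance to the medoid) can be illustrated with a minimal sketch. This is not the MS2OD algorithm itself: the abstract does not specify the MST scaling, the stopping rule for edge cutting, or the novel medoid selection strategy. The sketch therefore assumes an unscaled Euclidean MST, a fixed number of cuts, and the standard medoid (the point with the smallest summed distance to its cluster). The function name `mst_medoid_scores` and the parameter `n_cuts` are hypothetical.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import cdist


def mst_medoid_scores(X, n_cuts=1):
    """Illustrative MST + medoid outlier scoring (a simplification, not MS2OD).

    Assumptions: Euclidean distance, no MST scaling, a fixed number of cuts,
    and the standard medoid definition. Points are assumed to be distinct,
    because zero-length edges are treated as missing by SciPy's MST routine.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    dist = cdist(X, X)  # dense pairwise Euclidean distances

    # Build the MST over the complete distance graph.
    mst = minimum_spanning_tree(dist).tocoo()

    # Drop the n_cuts longest MST edges; the remaining forest defines clusters.
    order = np.argsort(mst.data)
    keep = order[: max(len(order) - n_cuts, 0)]
    forest = coo_matrix(
        (mst.data[keep], (mst.row[keep], mst.col[keep])), shape=(n, n)
    )
    _, labels = connected_components(forest, directed=False)

    # Score each point by its distance to the medoid of its own cluster.
    scores = np.empty(n)
    for label in np.unique(labels):
        idx = np.flatnonzero(labels == label)
        within = dist[np.ix_(idx, idx)]
        medoid = idx[np.argmin(within.sum(axis=1))]
        scores[idx] = dist[idx, medoid]
    return labels, scores


# Two compact groups plus one point (index 5) that sits away from the first.
X = np.array([
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5],
    [4.0, 0.5],
    [20.0, 0.0], [20.0, 1.0], [21.0, 0.0], [21.0, 1.0], [20.5, 0.5],
])
labels, scores = mst_medoid_scores(X, n_cuts=1)
print(int(np.argmax(scores)))  # index of the highest-scoring point
```

One limitation of this simplification is that a point isolated into its own cluster becomes its own medoid and receives a score of zero, so the number of cuts has to be chosen with care. How the paper handles that case is not stated in the abstract.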