12 February 2024 | Jia Li, Jiangwei Li, Chenxu Wang, Fons J Verbeek, Tanja Schultz, and Hui Liu
The article presents an advanced method for outlier detection using a scaled minimum spanning tree (MST) and a novel medoid selection strategy. The method aims to improve the effectiveness of clustering-based outlier detection, particularly in medical applications. The key contributions include:
1. **Scaled MST Construction**: A scaled MST is constructed by using scaled distances to distinguish edges between clusters and outliers, addressing the limitations of traditional MSTs in handling datasets with different densities.
2. **Medoid Selection**: A new medoid selection method is introduced to mitigate the impact of noise on the selection of cluster centers, enhancing the quality of outlier identification.
3. **Clustering and Outlier Detection**: The method iteratively cuts the longest edge in the scaled MST to obtain clusters, and the outlierness degree of each point is determined by its distance to the medoid.
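The three steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the MS2OD implementation: the summary does not define the scaling or the medoid selection rule, so the sketch assumes a scaling that divides each distance by the mean k-th nearest-neighbor distance of its two endpoints, and it substitutes the ordinary medoid (the member with the smallest total distance to its cluster) for the paper's noise-robust selection. All function names and the parameters `n_clusters` and `k` are hypothetical.

```python
import math

def pairwise(points):
    """Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]

def scaled_distances(dist, k=2):
    """ASSUMED scaling (not from the article): divide each distance by the
    mean k-th nearest-neighbor distance of its two endpoints, so edges in
    sparse regions are shrunk relative to edges in dense regions."""
    n = len(dist)
    kth = [sorted(row)[min(k, n - 1)] for row in dist]  # index 0 is the point itself
    scaled = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            scale = (kth[i] + kth[j]) / 2
            scaled[i][j] = dist[i][j] / scale if scale > 0 else dist[i][j]
    return scaled

def mst_edges(w):
    """Prim's algorithm on a dense weight matrix; returns (weight, u, v) edges."""
    n = len(w)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((w[parent[u]][u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v], parent[v] = w[u][v], u
    return edges

def cut_into_clusters(n, edges, n_clusters):
    """Drop the (n_clusters - 1) longest MST edges, then label components."""
    kept = sorted(edges)[: len(edges) - (n_clusters - 1)]
    root = list(range(n))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path compression
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    return [find(i) for i in range(n)]

def outlier_scores(dist, labels):
    """Score each point by its raw distance to its cluster's medoid.
    The plain medoid is a stand-in for the article's selection method."""
    scores = [0.0] * len(labels)
    for lab in set(labels):
        members = [i for i, l in enumerate(labels) if l == lab]
        medoid = min(members, key=lambda i: sum(dist[i][j] for j in members))
        for i in members:
            scores[i] = dist[i][medoid]
    return scores

def mst_outlier_scores(points, n_clusters, k=2):
    dist = pairwise(points)
    edges = mst_edges(scaled_distances(dist, k))
    labels = cut_into_clusters(len(points), edges, n_clusters)
    return labels, outlier_scores(dist, labels)
```

Calling `mst_outlier_scores(points, n_clusters=2)` returns one cluster label and one score per point, with higher scores indicating points farther from their cluster's medoid. One caveat of this simplified version: a point that ends up alone in its own cluster is its own medoid and receives a score of 0, so handling of very small clusters would need extra logic that the summary does not describe.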
The proposed method, named MS2OD, is evaluated on both synthetic and real-world datasets, including medical corpora and other semantically meaningful datasets. Experimental results demonstrate the method's effectiveness and broad applicability, outperforming state-of-the-art methods in terms of accuracy and robustness. The article also discusses the limitations of existing methods and provides a comprehensive comparison with peer algorithms, highlighting the advantages of MS2OD in various scenarios.