August 24–27, 2008 | Hans-Peter Kriegel, Matthias Schubert, Arthur Zimek
This paper introduces a novel approach to outlier detection in high-dimensional data, named Angle-Based Outlier Detection (ABOD). ABOD assesses the variance in the angles between the difference vectors from a point to the other points, which mitigates the "curse of dimensionality" compared to purely distance-based methods. A main advantage of ABOD is that it does not rely on parameter selection, unlike many other outlier detection methods. The authors compare ABOD to the well-established distance-based method LOF on both artificial and real-world datasets, demonstrating that ABOD performs particularly well on high-dimensional data. The paper also discusses two variants of ABOD: FastABOD, which uses a sample of the database to approximate the angle-based outlier factor (ABOF), and LB-ABOD, a filter-refinement approach that uses a lower bound for ABOF to efficiently find the top outliers. Experimental results show that ABOD provides better precision and recall in ranking outliers, especially in high-dimensional data, while maintaining efficiency in terms of runtime.
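
To make the core idea concrete, the following is a minimal Python sketch, not the authors' implementation. It scores each point by the variance of the cosines of the angles between its difference vectors to all pairs of other points. The function name `angle_variance_scores` and the toy data are assumptions made for illustration, and the sketch uses plain cosines, leaving out any distance weighting the full ABOF definition may apply. A low score means the other points are all seen in similar directions, which is the signature of an outlier.

```python
import numpy as np


def angle_variance_scores(points):
    """Simplified angle-based score (hypothetical helper, not the paper's exact ABOF).

    For each point, compute the variance of the cosines of the angles between
    the difference vectors to every pair of other points. Assumes the input
    contains no duplicate points (a duplicate would give a zero-length
    difference vector and a division by zero).
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    scores = np.empty(n)
    for i in range(n):
        # Difference vectors from point i to all other points.
        diffs = np.delete(points, i, axis=0) - points[i]
        units = diffs / np.linalg.norm(diffs, axis=1)[:, None]
        # Cosines of the angles between all pairs of difference vectors.
        cosines = units @ units.T
        # Keep each unordered pair once (strict upper triangle).
        upper = np.triu_indices(n - 1, k=1)
        scores[i] = np.var(cosines[upper])
    return scores


# Toy data (an assumption for illustration): one cluster plus one distant point.
rng = np.random.default_rng(0)
cluster = rng.normal(size=(30, 5))
data = np.vstack([cluster, np.full((1, 5), 20.0)])

scores = angle_variance_scores(data)
# Rank points from most to least outlying (smallest variance first).
ranking = np.argsort(scores)
print(ranking[:3])
```

In this toy setup the distant point is expected to rank first, because every other point lies in nearly the same direction from it. The loop considers all pairs of other points for each point, so the cost grows cubically with the number of points, which is the kind of cost the approximate variants described above aim to reduce.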