[slides] Fast Outlier Detection in High Dimensional Spaces

The paper introduces a distance-based outlier detection method that defines the *weight* of a point as the sum of the distances to its $k$ nearest neighbors and treats the points with the largest weights as outliers. The algorithm finds these weights efficiently by linearizing the search space with the Hilbert space-filling curve, and each scan reduces the number of candidate points. It runs in two phases: an initial phase that provides an approximate solution with low time complexity, and a final phase that returns the exact solution. Experimental results show that the algorithm scales linearly with both the dimensionality and the size of the dataset, and that it consistently finds the exact solution within a few iterations. The method is particularly effective for high-dimensional datasets and outperforms existing distance-based methods in both efficiency and accuracy.
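To make the weight definition concrete, here is a minimal brute-force sketch in Python. It is not the paper's algorithm: it does not use the Hilbert space-filling curve or the two-phase candidate pruning, and it compares every pair of points, so it only illustrates what is being computed. The function names `knn_weights` and `top_outliers`, the parameter `n` (number of outliers to report), and the choice of Euclidean distance are assumptions made for this illustration.

```python
import math


def knn_weights(points, k):
    """Weight of each point: sum of distances to its k nearest neighbors.

    Brute-force baseline (quadratic in the number of points), assuming
    Euclidean distance. Hypothetical helper, not the paper's method.
    """
    weights = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        weights.append(sum(dists[:k]))
    return weights


def top_outliers(points, k, n):
    """Indices of the n points with the largest weights, largest first."""
    weights = knn_weights(points, k)
    order = sorted(range(len(points)), key=lambda i: weights[i], reverse=True)
    return order[:n]


# Four points on a unit square plus one point far away from them.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(top_outliers(points, k=2, n=1))  # prints [4]
```

Each corner of the square has weight 2.0 (two neighbors at distance 1), while the isolated point has a weight of roughly 26.2, so it ranks first. A baseline like this is the exact answer that a faster method would need to reproduce.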

2002 | Fabrizio Angiulli and Clara Pizzuti