Understanding Distance-based outliers: algorithms and applications

This paper addresses the challenge of identifying outliers in large, multidimensional datasets, a task that can yield valuable knowledge in areas such as electronic commerce, credit card fraud, and the performance analysis of professional athletes. The authors introduce *DB (distance-based)* outliers, parameterized by a fraction $p$ and a distance $D$: an object $O$ is a *DB*$(p, D)$-outlier if at least a fraction $p$ of the objects in the dataset lie at a distance greater than $D$ from $O$. They present two simple algorithms with a complexity of $O(kN^2)$, where $k$ is the dimensionality and $N$ is the number of objects, and an optimized cell-based algorithm with a complexity of $O(c^k + N)$, where $c$ is a constant. In other words, the cell-based algorithm is linear in $N$ but exponential in $k$, and it is particularly efficient for $k \leq 4$. The paper also discusses three real-life applications, including video surveillance, to demonstrate the broad applicability and meaningfulness of *DB* outliers. Its contributions thus include efficient outlier detection methods and a detailed exploration of the role outlier detection plays in knowledge discovery.
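To make the *DB*$(p, D)$ definition and the two algorithmic styles concrete, here is a minimal Python sketch. It assumes points are equal-length tuples of floats, Euclidean distance, $D > 0$, and memory-resident data, and it ignores floating-point ties at cell boundaries. The function names `nested_loop_outliers`, `cell_based_outliers`, and `_is_outlier` are hypothetical. `nested_loop_outliers` is a simplified in-memory double loop with early termination, not the paper's block-oriented nested-loop algorithm. `cell_based_outliers` reflects our reading of the cell-based idea rather than a quote from the text above: cells of side $D/(2\sqrt{k})$; layer-1 cells are the adjacent cells, whose objects are always within $D$; layer-2 cells are those up to $\lceil 2\sqrt{k} \rceil$ cells away, and anything beyond them is always farther than $D$. As a result, only layer-2 objects ever need explicit distance checks.

```python
import math
from collections import defaultdict
from itertools import product


def _is_outlier(near, n, p):
    """DB(p, D) test: at least a fraction p of the n objects lie farther than D.

    `near` counts objects within distance D of O, including O itself.
    """
    return n - near >= p * n


def nested_loop_outliers(points, p, D):
    """Simplified in-memory nested loop: O(N^2) distances, each O(k)."""
    n = len(points)
    result = []
    for i, a in enumerate(points):
        near = 0
        for b in points:
            if math.dist(a, b) <= D:
                near += 1
                if not _is_outlier(near, n, p):
                    break  # too many close objects: O cannot be an outlier
        if _is_outlier(near, n, p):
            result.append(i)
    return result


def cell_based_outliers(points, p, D):
    """Cell-based sketch for memory-resident data (assumes D > 0)."""
    n = len(points)
    if n == 0:
        return []
    k = len(points[0])
    side = D / (2 * math.sqrt(k))        # cell side D / (2 sqrt(k))
    reach = math.ceil(2 * math.sqrt(k))  # layer-2 reach, in cells

    # Bucket each object index by its integer cell coordinates.
    cells = defaultdict(list)
    for i, pt in enumerate(points):
        cells[tuple(math.floor(x / side) for x in pt)].append(i)

    # Offsets to layer-1 (Chebyshev distance 1) and layer-2 (2..reach) cells.
    offsets = [o for o in product(range(-reach, reach + 1), repeat=k) if any(o)]
    layer1 = [o for o in offsets if max(map(abs, o)) == 1]
    layer2 = [o for o in offsets if max(map(abs, o)) > 1]

    def members_at(cell, offset):
        # .get() does not insert empty cells into the defaultdict.
        return cells.get(tuple(c + d for c, d in zip(cell, offset)), [])

    outliers = []
    for cell, members in cells.items():
        # Objects in this cell and its layer-1 cells are always within D.
        count1 = len(members) + sum(len(members_at(cell, o)) for o in layer1)
        if not _is_outlier(count1, n, p):
            continue  # prune: no object in this cell is an outlier
        layer2_objs = [j for o in layer2 for j in members_at(cell, o)]
        if _is_outlier(count1 + len(layer2_objs), n, p):
            outliers.extend(members)  # prune: every object here is an outlier
            continue
        # Objects beyond layer 2 are always farther than D, so only
        # layer-2 objects need explicit distance checks.
        for i in members:
            near = count1
            for j in layer2_objs:
                if math.dist(points[i], points[j]) <= D:
                    near += 1
                    if not _is_outlier(near, n, p):
                        break
            if _is_outlier(near, n, p):
                outliers.append(i)
    return sorted(outliers)
```

For instance, with four points clustered near the origin and a fifth at $(5, 5)$, both functions report only index 4 as a *DB*$(0.75, 1.0)$-outlier. Enumerating the $(2\lceil 2\sqrt{k} \rceil + 1)^k$ neighboring cell offsets is where the exponential dependence on $k$ enters this sketch. This is consistent with the cell-based approach paying off mainly for small $k$.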

Distance-based outliers: algorithms and applications

Received February 15, 1999 / Accepted August 1, 1999 | Edwin M. Knorr¹, Raymond T. Ng¹, Vladimir Tucakov²