2000 | Edwin M. Knorr¹, Raymond T. Ng¹, Vladimir Tucakov²
This paper presents distance-based (DB) outlier detection methods for large, multidimensional datasets. The authors show that DB outlier detection can be carried out efficiently on large datasets and on k-dimensional data with large k (e.g., k ≥ 5). They propose two approaches: one with complexity O(kN²), where k is the dimensionality and N is the number of objects, and a cell-based one with complexity O(c^k + N), where c is a small constant. Because its cost is linear in N but exponential in k, the cell-based approach is suitable only for small k; experimental results show that it outperforms the O(kN²) approach for k ≤ 4. For disk-resident datasets, a version of the cell-based algorithm that makes at most three passes over the data is presented, and it is likewise the best choice for k ≤ 4. The paper discusses three real-life applications, including one on spatio-temporal data (e.g., video surveillance), confirming the relevance and broad applicability of DB outliers.
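As a rough illustration of the O(kN²) approach, the following Python sketch compares every object with every other object. It assumes the usual DB(p, D) definition (an object is an outlier if at least a fraction p of the dataset lies farther than distance D from it), Euclidean distance, and Python 3.8+ for `math.dist`. The function name `db_outliers` and the sample data are illustrative and are not taken from the paper.

```python
import math


def db_outliers(points, p, d):
    """Return indices of DB(p, d)-outliers using a pairwise O(k N^2) scan.

    points: sequence of equal-length coordinate tuples (k dimensions each)
    p:      minimum fraction of the dataset that must lie farther than d
    d:      distance threshold (Euclidean distance is an assumption here)
    """
    n = len(points)
    # An outlier may have at most this many objects (itself included)
    # within distance d of it.
    max_near = n * (1 - p)
    outliers = []
    for i, a in enumerate(points):
        near = 0
        for b in points:
            if math.dist(a, b) <= d:
                near += 1
                if near > max_near:
                    break  # too many close objects: not an outlier
        else:
            # The inner loop finished without exceeding max_near.
            outliers.append(i)
    return outliers


# Five clustered points and one isolated point (made-up data).
sample = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
print(db_outliers(sample, p=0.75, d=2.0))  # [5]
```

The early `break` stops scanning an object as soon as it has too many close neighbours, which reduces work in practice but leaves the worst case quadratic in N. A cell-based method avoids most of these pairwise comparisons by reasoning about whole grid cells at once, at the price of a cell count that grows exponentially with k.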
Outlier detection is a meaningful and important knowledge discovery task. Unlike traditional statistical methods, DB outliers are defined purely in terms of distances between objects and are therefore suitable for multidimensional data. The paper also reviews related work, including depth-based methods and clustering algorithms, which it finds unsuitable for high-dimensional data. Comparing their approach with these alternatives, the authors note that DB outliers are "intuitively surprising" and can be used in applications such as credit card fraud detection and surveillance. The paper further applies DB outliers to spatio-temporal data mining, which it presents as the first direct application of outliers to video data mining. The authors conclude that DB outliers are meaningful and broadly applicable in knowledge discovery tasks.