LOF: Identifying Density-Based Local Outliers

2000/05 | Markus M. Breunig, Hans-Peter Kriegel, Raymond T. Ng, Jörg Sander
This paper introduces the Local Outlier Factor (LOF), a method for identifying local outliers in multidimensional datasets. LOF assigns each object a degree of outlier-ness that reflects how isolated the object is with respect to its surrounding neighborhood. Unlike traditional approaches that treat being an outlier as a binary property, LOF is a continuous measure, which allows a more nuanced ranking of outliers. The LOF of an object is computed from its local reachability density, which is derived from the reachability distances between the object and its nearest neighbors, and compares that density with the densities of those neighbors. The key idea is that an object deep inside a cluster has a LOF close to 1, while an object whose local density is lower than that of its neighbors has a higher LOF, indicating that it is more likely to be an outlier.

The paper analyzes the formal properties of LOF, showing that it is local in nature and deriving lower and upper bounds on its value that are tight for certain classes of objects. It also discusses the impact of the MinPts parameter, which determines how many nearest neighbors are considered when computing LOF. Because LOF does not change monotonically as MinPts varies, the paper recommends computing it over a range of MinPts values rather than relying on a single value, and it gives guidelines for choosing that range. Experiments on synthetic and real-world datasets evaluate how well LOF identifies meaningful outliers. The results show that LOF can detect outliers that other methods miss and that it is computationally efficient. The paper concludes that LOF is a practical and effective approach for identifying local outliers in large, high-dimensional datasets.
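
The computation summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names lof_scores and max_lof_over_range are hypothetical, distances are Euclidean, all pairwise distances are computed by brute force, and the sketch assumes the data contains too few exact duplicates for a neighborhood to collapse to zero distance. Taking the per-point maximum is the way a range of MinPts values is combined here.

```python
import math


def lof_scores(points, min_pts):
    """Return one LOF value per point (brute-force sketch, Euclidean distance)."""
    n = len(points)
    if not 1 <= min_pts < n:
        raise ValueError("min_pts must be between 1 and len(points) - 1")

    # Pairwise distances, O(n^2): acceptable for a small illustration only.
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_distance = []  # distance from each point to its min_pts-th nearest neighbor
    neighbors = []   # indices within that distance (ties included, self excluded)
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kd = others[min_pts - 1]
        k_distance.append(kd)
        neighbors.append([j for j in range(n) if j != i and dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach_dist(p, o) = max(k_distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        total = sum(max(k_distance[j], dist[i][j]) for j in neighbors[i])
        if total == 0:
            # Assumption of this sketch: duplicate-heavy data is not handled.
            raise ValueError("too many duplicate points for this min_pts")
        lrd.append(len(neighbors[i]) / total)

    # LOF: mean ratio of each neighbor's density to the point's own density.
    return [
        sum(lrd[j] / lrd[i] for j in neighbors[i]) / len(neighbors[i])
        for i in range(n)
    ]


def max_lof_over_range(points, min_pts_lb, min_pts_ub):
    """Per point, take the maximum LOF over min_pts in [min_pts_lb, min_pts_ub]."""
    per_k = [lof_scores(points, k) for k in range(min_pts_lb, min_pts_ub + 1)]
    return [max(column) for column in zip(*per_k)]


# Usage: a 3x3 grid of clustered points plus one isolated point at (10, 10).
grid = [(float(x), float(y)) for x in range(3) for y in range(3)]
points = grid + [(10.0, 10.0)]
scores = lof_scores(points, min_pts=3)
combined = max_lof_over_range(points, 2, 4)
```

In this toy dataset the isolated point receives by far the largest score, while the grid points stay near 1. For real workloads the brute-force distance matrix would normally be replaced by an index-based nearest-neighbor search.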