2000 | Markus M. Breunig†, Hans-Peter Kriegel†, Raymond T. Ng†, Jörg Sander†
The paper introduces a novel method for identifying local outliers in multidimensional datasets, termed Local Outlier Factor (LOF). Unlike traditional binary outlier detection methods, LOF assigns a degree of outlierness to each object, making it more suitable for applications where rare events or deviations from the majority are of interest. The key idea is to measure the isolation of an object by comparing its reachability distance to those of its nearest neighbors. The authors provide a detailed formal analysis of LOF, showing that for most objects within a cluster, their LOF is approximately 1, indicating they are not outliers. For other objects, they establish lower and upper bounds on LOF, highlighting the local nature of the measure. The impact of the parameter MinPts, which defines the neighborhood size, is also analyzed, and guidelines are provided for choosing appropriate MinPts values. Experimental results on synthetic and real-world datasets demonstrate the effectiveness and practicality of LOF in identifying meaningful outliers, even in high-dimensional spaces.The paper introduces a novel method for identifying local outliers in multidimensional datasets, termed Local Outlier Factor (LOF). Unlike traditional binary outlier detection methods, LOF assigns a degree of outlierness to each object, making it more suitable for applications where rare events or deviations from the majority are of interest. The key idea is to measure the isolation of an object by comparing its reachability distance to those of its nearest neighbors. The authors provide a detailed formal analysis of LOF, showing that for most objects within a cluster, their LOF is approximately 1, indicating they are not outliers. For other objects, they establish lower and upper bounds on LOF, highlighting the local nature of the measure. The impact of the parameter MinPts, which defines the neighborhood size, is also analyzed, and guidelines are provided for choosing appropriate MinPts values. Experimental results on synthetic and real-world datasets demonstrate the effectiveness and practicality of LOF in identifying meaningful outliers, even in high-dimensional spaces.