Estimating the Support of a High-Dimensional Distribution

27 November 1999; revised 18 September 2000 | Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, Robert C. Williamson
This paper proposes a support vector (SV) algorithm, adapted to unlabeled data, for estimating the support of a high-dimensional distribution. The method estimates a function f that is positive on a region capturing most of the data (the estimated support) and negative elsewhere. The function is given by a kernel expansion over a subset of the training data, the support vectors, and is regularized by controlling the length of the weight vector in the associated feature space. The expansion coefficients are found by solving a quadratic programming problem, which is done by sequential optimization over pairs of input patterns. The algorithm is a natural extension of the SV algorithm to unlabeled data, and it remains applicable when the distribution has no well-defined density, for example in the presence of singular components.

The parameter ν governs the algorithm's behavior: it is an upper bound on the fraction of outliers and a lower bound on the fraction of support vectors. Choosing ν therefore trades off admitting outliers against minimizing the size of the estimated region. A theoretical analysis characterizes the statistical performance of the algorithm and shows that its generalization error can be bounded.

Experiments on artificial and real-world data demonstrate that the method can detect novelty and identify outliers. The paper also discusses computational efficiency, reporting that the algorithm scales well to large datasets, especially for small values of ν. Application domains mentioned include the modeling of parameter regimes for the control of walking robots and condition monitoring of jet engines.
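
For orientation, the optimization problem behind this description can be restated in the notation of the original paper. The symbols are not part of the summary itself: Φ is the feature map with kernel k(x, y) = ⟨Φ(x), Φ(y)⟩, ξ_i are slack variables, ρ is the offset, α_i are the expansion coefficients, and ℓ is the number of training points. This is a restatement, not a derivation.

```latex
% Primal problem: separate the data from the origin in feature space
\[
\min_{w,\,\xi,\,\rho}\;\; \tfrac{1}{2}\lVert w\rVert^{2}
  + \frac{1}{\nu\ell}\sum_{i=1}^{\ell}\xi_i - \rho
\quad\text{subject to}\quad
\langle w, \Phi(x_i)\rangle \ge \rho - \xi_i,\;\; \xi_i \ge 0 .
\]

% Decision function as a kernel expansion
\[
f(x) = \operatorname{sgn}\Bigl(\sum_{i=1}^{\ell}\alpha_i\,k(x_i, x) - \rho\Bigr)
\]

% Dual problem solved for the expansion coefficients
\[
\min_{\alpha}\;\; \tfrac{1}{2}\sum_{i,j=1}^{\ell}\alpha_i\alpha_j\,k(x_i, x_j)
\quad\text{subject to}\quad
0 \le \alpha_i \le \frac{1}{\nu\ell},\;\; \sum_{i=1}^{\ell}\alpha_i = 1 .
\]
```

The dual constraints show where the role of ν comes from: because the coefficients sum to 1 and each is at most 1/(νℓ), no more than νℓ of them can sit at the upper bound, and at least νℓ of them must be nonzero.
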
The paper concludes that the proposed method provides a practical and efficient approach to estimating the support of a high-dimensional distribution, with theoretical guarantees and empirical validation.
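
The pairwise optimization mentioned in the summary can be illustrated with a minimal sketch in plain Python. It assumes a Gaussian kernel, a uniform feasible starting point, and a simple "most violating pair" selection rule; these are illustrative choices, not the paper's exact heuristics, and the names `fit_one_class`, `decision`, `gamma`, and `tol` are hypothetical.

```python
import math
import random


def rbf(x, y, gamma):
    """Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def fit_one_class(X, nu, gamma, tol=1e-5, max_iter=100_000):
    """Pairwise solver (a sketch) for the dual problem:
    minimise 1/2 * sum_ij a_i a_j k(x_i, x_j)
    subject to 0 <= a_i <= 1/(nu * l) and sum_i a_i = 1.
    Returns (alpha, rho)."""
    l = len(X)
    C = 1.0 / (nu * l)      # upper bound on each coefficient
    eps = 1e-12             # tolerance for "sits at a bound"
    K = [[rbf(xi, xj, gamma) for xj in X] for xi in X]
    alpha = [1.0 / l] * l   # assumption: uniform start, feasible since nu <= 1
    # out[i] = sum_j alpha_j k(x_i, x_j): gradient of the objective w.r.t. alpha_i
    out = [sum(alpha[j] * K[i][j] for j in range(l)) for i in range(l)]
    rho = 0.0
    for _ in range(max_iter):
        can_give = [i for i in range(l) if alpha[i] > eps]      # may decrease
        can_take = [i for i in range(l) if alpha[i] < C - eps]  # may increase
        d = max(can_give, key=out.__getitem__)
        u = min(can_take, key=out.__getitem__) if can_take else d
        rho = 0.5 * (out[d] + out[u])   # offset estimate between the two outputs
        gap = out[d] - out[u]
        if gap < tol:                   # optimality conditions hold up to tol
            break
        eta = K[d][d] + K[u][u] - 2.0 * K[d][u]
        if eta <= eps:                  # degenerate pair (e.g. duplicate points)
            break
        # Move mass t from d to u; the sum of coefficients stays constant
        # and clipping keeps both coefficients inside [0, C].
        t = min(gap / eta, alpha[d], C - alpha[u])
        alpha[d] -= t
        alpha[u] += t
        for i in range(l):
            out[i] += t * (K[i][u] - K[i][d])
    return alpha, rho


def decision(x, X, alpha, rho, gamma):
    """Value of sum_i alpha_i k(x_i, x) - rho; negative means 'outside'."""
    return sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X) if a > 0.0) - rho


# Toy data: 40 points from a 2-D standard normal (illustrative only).
random.seed(0)
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(40)]
nu, gamma, tol = 0.2, 1.0, 1e-5

alpha, rho = fit_one_class(X, nu, gamma, tol=tol)
scores = [decision(x, X, alpha, rho, gamma) for x in X]
n_out = sum(s < -tol for s in scores)    # training points flagged as outliers
n_sv = sum(a > 1e-10 for a in alpha)     # points with nonzero coefficient

print("outlier fraction:", n_out / len(X), "(nu =", nu, ")")
print("support vector fraction:", n_sv / len(X))
print("score of a far-away point:", decision((100.0, 100.0), X, alpha, rho, gamma))
```

The two printed fractions can be compared against ν, which should sit between them. This sketch is meant only to make the mechanics concrete; for real work an established implementation such as scikit-learn's `OneClassSVM` is the more sensible choice.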