Support Vector Clustering

2001 | Asa Ben-Hur, David Horn, Hava T. Siegelmann, Vladimir Vapnik
This paper presents a novel clustering method based on support vector machines (SVMs). The approach maps data points into a high-dimensional feature space using a Gaussian kernel and searches for the minimal enclosing sphere there. Mapped back to data space, the sphere's surface forms a set of contours that enclose the data points, and these contours are interpreted as cluster boundaries. The width of the Gaussian kernel controls the scale at which the data is probed, while the soft-margin constant helps handle outliers and overlapping clusters. By varying these two parameters, the algorithm explores the structure of the dataset while keeping the number of support vectors low, which ensures smooth cluster boundaries.

The method is non-parametric: it assumes neither the number nor the shape of the clusters. It handles outliers by allowing the sphere in feature space to exclude some points. When clusters overlap strongly, the algorithm is related to scale-space clustering based on a Parzen-window estimate of the probability density.

The Support Vector Clustering (SVC) algorithm works in two stages. First, it computes a support vector description of the dataset by finding the smallest enclosing sphere in feature space; the contours this sphere induces in data space define the cluster boundaries. Second, points enclosed by the same contour are assigned to the same cluster. The width parameter of the Gaussian kernel controls the number of clusters, with larger values leading to more clusters. On examples without outliers, increasing the kernel width splits the data into more clusters. With outliers present, bounded support vectors (BSVs) are admitted, yielding smoother cluster boundaries. The algorithm also handles strongly overlapping clusters by identifying the dense cores of the underlying probability distribution.
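The first stage reduces to a quadratic program: with a Gaussian kernel (so K(x, x) = 1), finding the smallest enclosing sphere means minimizing βᵀKβ over the simplex {β : βᵢ ≥ 0, Σβᵢ = 1}, with the extra box constraint βᵢ ≤ C in the soft-margin case. The sketch below solves the hard-margin version with simple Frank-Wolfe iterations rather than the QP solver used in the paper; `q` is the kernel width and all function names are illustrative, not from the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, q):
    # K(x, y) = exp(-q * ||x - y||^2), the kernel used by SVC
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-q * d2)

def fit_sphere(X, q=1.0, n_iter=2000):
    """Hard-margin SVC dual: minimize beta^T K beta over the simplex,
    here via Frank-Wolfe (an assumption; the paper uses a QP solver)."""
    n = len(X)
    K = gaussian_kernel(X, X, q)
    beta = np.full(n, 1.0 / n)        # start at the simplex centre
    for t in range(n_iter):
        grad = 2.0 * K @ beta         # gradient of beta^T K beta
        i = np.argmin(grad)           # best simplex vertex
        step = 2.0 / (t + 2.0)        # standard Frank-Wolfe step size
        beta = (1.0 - step) * beta
        beta[i] += step               # move toward that vertex
    return beta, K

def radius2(x, X, beta, K, q):
    # squared feature-space distance of Phi(x) from the sphere centre
    kx = gaussian_kernel(x[None, :], X, q)[0]
    return 1.0 - 2.0 * kx @ beta + beta @ K @ beta

# toy data: two Gaussian blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (15, 2)),
               rng.normal(3, 0.3, (15, 2))])
beta, K = fit_sphere(X, q=1.0)
R2 = max(radius2(x, X, beta, K, 1.0) for x in X)  # squared sphere radius
```

Points whose squared distance exceeds `R2` fall outside the sphere's contours; a faraway test point is always outside, since its kernel values against the data vanish.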
The SVC algorithm is compared to other clustering methods, showing its effectiveness in various scenarios. It is computationally efficient, with a complexity of O(N²d), and can handle high-dimensional data by using PCA to reduce dimensionality. The algorithm is also useful for large datasets due to its low memory requirements and efficient implementation. Overall, the SVC method provides a flexible and effective approach to clustering, capable of handling a wide range of data structures and noise levels.
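The cluster-labeling stage follows the paper's geometric criterion: two points belong to the same cluster if every point sampled on the line segment between them lies inside the feature-space sphere, and clusters are the connected components of the resulting adjacency graph. A minimal sketch of that graph logic, where `in_sphere` stands in for the fitted-sphere test R(x) ≤ R (the version exercised below uses a hypothetical data-space proxy purely for illustration):

```python
import numpy as np

def adjacent(xi, xj, in_sphere, n_samples=10):
    # Paper's criterion: xi and xj are connected if every sampled
    # point on the segment between them maps inside the sphere.
    ts = np.linspace(0.0, 1.0, n_samples)
    return all(in_sphere(xi + t * (xj - xi)) for t in ts)

def label_clusters(X, in_sphere):
    """Clusters = connected components of the adjacency graph (DFS)."""
    n = len(X)
    labels = np.full(n, -1)
    cur = 0
    for s in range(n):
        if labels[s] != -1:
            continue
        labels[s] = cur
        stack = [s]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and adjacent(X[i], X[j], in_sphere):
                    labels[j] = cur
                    stack.append(j)
        cur += 1
    return labels

# hypothetical stand-in for the sphere test, just to exercise the graph:
# "inside" means within distance 0.5 of some data point
X = np.array([[0.0, 0.0], [0.3, 0.0], [5.0, 5.0], [5.3, 5.0]])
in_sphere = lambda x: min(np.linalg.norm(x - p) for p in X) < 0.5
labels = label_clusters(X, in_sphere)
```

Checking all pairs with sampled segments is what gives the labeling stage its quadratic cost in the number of points.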