This paper presents a Classification EM (CEM) algorithm for clustering together with two stochastic versions: the Stochastic EM (SEM) algorithm and a simulated annealing version, the CAEM algorithm. The CEM algorithm generalizes the EM algorithm for mixture models and is designed to optimize the Classification Maximum Likelihood (CML) criterion. The CML criterion in turn generalizes the standard k-means criterion, which arises as the particular case of a Gaussian mixture with equal proportions and a common covariance matrix. CEM is a natural extension of EM, inserting a classification step, which assigns each observation to the component with the largest posterior probability, between the E-step and the M-step. The CEM algorithm is proven to increase the CML criterion at each iteration and to converge to a stationary value.
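To make the E-step/C-step/M-step cycle concrete, here is a minimal sketch of a CEM iteration in the k-means case of the CML criterion (Gaussian mixture, equal proportions, common spherical covariance), where the max-posterior classification reduces to a nearest-center assignment. The function name, initialization, and stopping rule are illustrative choices, not the paper's notation.

```python
import numpy as np

def cem(X, K, n_iter=100, seed=0):
    """Illustrative CEM iteration for the k-means case of the CML criterion."""
    rng = np.random.default_rng(seed)
    # Initialize centers from randomly chosen data points.
    centers = X[rng.choice(len(X), K, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # E-step + C-step: with equal proportions and a common spherical
        # covariance, the max-posterior assignment is the nearest center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # partition is stable; the criterion can no longer increase
        labels = new_labels
        # M-step: maximum likelihood estimate of each center
        # given the current partition.
        for k in range(K):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels, centers
```

Because the C-step makes hard assignments, each iteration can only increase the CML criterion, which is why the algorithm stops as soon as the partition no longer changes.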
The SEM algorithm is a stochastic version of the CEM algorithm that incorporates random perturbations to reduce the dependence of classical optimization clustering algorithms on their initial position. The CAEM algorithm is a simulated annealing version of CEM, designed to reach a stable fixed point of the CML criterion by exploiting the basic properties of the CEM algorithm. The CAEM algorithm proves more stable and gives better results, especially for small samples.
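The contrast between the three algorithms lies in the classification step. A hedged sketch: SEM replaces CEM's max-posterior assignment with a random draw from the posterior, and an annealed variant sharpens the posterior with a temperature parameter, interpolating between SEM-like draws (T = 1) and CEM's hard assignment (T → 0). The exact CAEM scheme in the paper may differ in detail; this is illustrative only.

```python
import numpy as np

def stochastic_c_step(posteriors, T=1.0, rng=None):
    """Draw one label per row from posteriors ** (1/T), renormalized.

    T = 1 gives an SEM-style random assignment; T -> 0 approaches
    CEM's deterministic max-posterior assignment.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = posteriors ** (1.0 / T)
    p /= p.sum(axis=1, keepdims=True)
    # Inverse-CDF sampling: one draw per data point.
    cum = p.cumsum(axis=1)
    u = rng.random((len(p), 1))
    # Clip guards against floating-point round-off in the last cumsum entry.
    return (u > cum).sum(axis=1).clip(max=p.shape[1] - 1)
```

Decreasing T across iterations mimics an annealing schedule: early iterations explore the partition space freely, while later ones settle into a hard classification.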
Numerical experiments on simulated and real data sets compare the performance of the CEM, SEM, and CAEM algorithms. The results show that CEM performs well when the clustering structure is clear and the sample is large, while SEM and CAEM outperform it in other situations. CAEM is the most stable of the three and is more reliable than SEM for very small data sets. The experiments also show that CAEM can improve a poor sub-optimal solution produced by CEM.
The paper concludes that the CEM algorithm is a general clustering algorithm applicable to a wide range of clustering criteria. The SEM and CAEM algorithms are stochastic versions of CEM designed to reduce the dependence of classical optimization clustering algorithms on their initial position. The CAEM algorithm is recommended for small sample sizes, while the SEM algorithm is recommended for large sample sizes, especially when there is no apparent clustering structure in the data.