On Convergence Properties of the EM Algorithm for Gaussian Mixtures

January 17, 1995 | Lei Xu and Michael I. Jordan
This paper explores the mathematical connection between the Expectation-Maximization (EM) algorithm and gradient-based approaches to maximum likelihood estimation for finite Gaussian mixtures. The authors show that the EM step in parameter space can be obtained from the gradient of the log likelihood through a projection matrix \( P \), and they provide explicit expressions for this matrix. They analyze the convergence of EM in terms of the properties of \( P \) and its effect on the likelihood surface. The paper also presents empirical results suggesting that EM regularizes the condition number of the effective Hessian, leading to faster convergence in certain cases. The authors argue that EM has several advantages, including its automatic satisfaction of probabilistic constraints, its monotone improvement of the likelihood without the need to set a learning rate, and its low computational overhead. They compare EM with other optimization methods, highlighting its strengths and weaknesses, and provide theoretical and empirical evidence that EM can approximate superlinear methods under appropriate conditions. The paper concludes by discussing the role of EM in the development of learning systems and its importance in predictive data modeling.
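As a concrete illustration of the projection-matrix view, the sketch below runs one EM E-step on synthetic one-dimensional data and checks that the standard M-step update of the mixing proportions equals a gradient step through the paper's projection matrix for that parameter block, \( P_{\pi} = \frac{1}{N}\left(\mathrm{diag}(\pi) - \pi\pi^{T}\right) \). This is a minimal NumPy sketch, not the authors' code; the synthetic data, initial parameter values, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D data from a two-component Gaussian mixture (illustrative only).
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.7, 200)])
N = len(x)

# Current parameter estimates: mixing proportions, means, variances.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
var = np.array([1.0, 1.0])

def gauss(x, mu, var):
    """Univariate Gaussian density."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

# E-step: posterior responsibilities h[n, j] = P(component j | x_n).
dens = gauss(x[:, None], mu[None, :], var[None, :])   # shape (N, 2)
joint = pi[None, :] * dens
h = joint / joint.sum(axis=1, keepdims=True)

# Standard M-step update of the mixing proportions.
pi_em = h.mean(axis=0)

# Gradient of the log likelihood w.r.t. pi (unconstrained coordinates):
# d l / d pi_j = sum_n dens[n, j] / p(x_n) = sum_n h[n, j] / pi_j.
grad = h.sum(axis=0) / pi

# Mixing-proportion block of the projection matrix, as given in the paper:
# P_pi = (1/N) * (diag(pi) - pi pi^T).
P = (np.diag(pi) - np.outer(pi, pi)) / N

# The EM step coincides with the projected gradient step.
pi_proj = pi + P @ grad
print(pi_em, pi_proj)                 # identical up to rounding
assert np.allclose(pi_em, pi_proj)
```

The full matrix \( P \) in the paper has analogous blocks for the means and covariances; the mixing-proportion block above also illustrates how the projection keeps the updated proportions nonnegative and summing to one, i.e., how EM handles the probabilistic constraints automatically.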