Learning general Gaussian mixtures with efficient score matching


April 29, 2024 | Sitan Chen, Vasilis Kontonis, Kulin Shah
We study the problem of learning mixtures of $k$ Gaussians in $d$ dimensions without assuming separation between the components. We require only that the covariance matrices have bounded condition number and that the means and covariances lie in a ball of bounded radius. Our algorithm draws $d^{\mathrm{poly}(k/\varepsilon)}$ samples from the target mixture, runs in time polynomial in its sample size, and constructs a sampler whose output distribution is $\varepsilon$-close to the unknown mixture in total variation distance. In particular, the runtime is $d^{\mathrm{poly}(k)}$ for any constant accuracy $\varepsilon > 0$, improving upon the doubly exponential (in $k$) runtime of moment-based methods. Prior works either required runtime exponential in $d$, placed strong assumptions on the instance, or had doubly exponential dependence on $k$.

Our approach reduces distribution learning to score matching and gives an efficient score-matching algorithm for Gaussian mixtures. It relies on a novel structural result showing that the score function of a Gaussian mixture can be well-approximated by a piecewise-polynomial function, allowing us to combine modern tools from diffusion models with classic ideas from theoretical computer science. To our knowledge, this is the first example of diffusion models achieving a state-of-the-art theoretical guarantee for an unsupervised learning task. We also show that our algorithm can be used to learn mixtures of degenerate Gaussians in Wasserstein distance.
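For intuition about the structural result, recall the standard closed form for the score of a Gaussian mixture (this formula is classical, not specific to this paper): for $p(x) = \sum_{i=1}^k w_i\, \mathcal{N}(x; \mu_i, \Sigma_i)$,

$$ \nabla_x \log p(x) \;=\; \sum_{i=1}^k \pi_i(x)\, \Sigma_i^{-1}(\mu_i - x), \qquad \pi_i(x) \;=\; \frac{w_i\, \mathcal{N}(x; \mu_i, \Sigma_i)}{\sum_{j=1}^k w_j\, \mathcal{N}(x; \mu_j, \Sigma_j)}. $$

Each posterior weight $\pi_i(x)$ is a softmax of quadratic functions of $x$, so on any region where a single component dominates, the score is close to the affine function $\Sigma_i^{-1}(\mu_i - x)$. This gives a rough sense of why a piecewise-polynomial approximation to the score is plausible.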
Our results are incomparable to previous theoretical work on diffusion models, as we consider Gaussian mixtures with arbitrary well-conditioned covariances, but our runtime scales as $d^{\mathrm{poly}(k/\varepsilon)}$. We leave as an open question whether the condition number assumption can be removed.
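To make the score-matching step concrete, here is a minimal, hypothetical sketch (not the paper's algorithm): it fits a polynomial score estimate by denoising score matching on a one-dimensional two-component mixture, using only NumPy. The mixture, noise level, and polynomial degree are arbitrary choices made for the demo.

```python
# Illustrative sketch (not the paper's algorithm): denoising score matching
# with a polynomial hypothesis class, on a 1-D two-component Gaussian mixture.
# All parameters below are hypothetical choices for the demo.
import numpy as np

rng = np.random.default_rng(0)

# Sample from the 1-D mixture 0.5*N(-2, 1) + 0.5*N(+2, 1).
n = 20000
comp = rng.integers(0, 2, size=n)
x = rng.normal(loc=np.where(comp == 0, -2.0, 2.0), scale=1.0)

# Denoising score matching at noise level sigma: for x_t = x + sigma*z,
# the minimizer of E || s(x_t) + z/sigma ||^2 over all functions s is the
# score of the sigma-smoothed mixture (Tweedie's formula / Vincent 2011).
sigma = 0.5
z = rng.normal(size=n)
x_t = x + sigma * z
target = -z / sigma

# Fit s(t) = sum_j c_j * t^j by least squares over degree-D polynomials.
D = 7
Phi = np.vander(x_t, D + 1, increasing=True)  # features [1, t, ..., t^D]
coef, *_ = np.linalg.lstsq(Phi, target, rcond=None)

def true_score(t, s2=1.0 + sigma**2):
    # Exact score of the smoothed mixture 0.5*N(-2, s2) + 0.5*N(+2, s2):
    # a posterior-weighted combination of the two affine component scores.
    w = 1.0 / (1.0 + np.exp(-4.0 * t / s2))  # posterior weight of +2 component
    return (w * (2.0 - t) + (1.0 - w) * (-2.0 - t)) / s2

# Compare the fitted polynomial to the true smoothed score on a grid.
grid = np.linspace(-5, 5, 201)
fit = np.vander(grid, D + 1, increasing=True) @ coef
print("max abs error on [-5, 5]:", np.abs(fit - true_score(grid)).max())
```

A single global low-degree polynomial tends to degrade away from the bulk of the data, which is one way to see why a piecewise-polynomial hypothesis class is the natural fit for mixture scores.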