April 29, 2024 | Sitan Chen, Vasilis Kontonis, Kulin Shah
This paper addresses the problem of learning mixtures of \( k \) Gaussians in \( d \) dimensions without assuming any separation between the components. The authors propose an algorithm that draws \( d^{\text{poly}(k/\epsilon)} \) samples, runs in time polynomial in the number of samples, and constructs a sampler whose output distribution is \( \epsilon \)-close to the unknown mixture in total variation distance. This improves upon prior work, which either required runtime exponential in \( d \), placed strong assumptions on the instance, or suffered a doubly exponential dependence on \( k \).
The key contribution is a reduction from distribution learning to a supervised learning task, score matching, together with an efficient algorithm for that task: the score function of the Gaussian mixture is approximated by a piecewise-polynomial function, and the polynomial pieces are recovered by an efficient procedure. This is the first example of diffusion models achieving state-of-the-art theoretical guarantees for an unsupervised learning task.
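To make the object being approximated concrete: the score \( \nabla_x \log q(x) \) of a mixture \( q = \sum_i w_i \, \mathcal{N}(\mu_i, \Sigma_i) \) is the posterior-weighted average of the per-component Gaussian scores \( -\Sigma_i^{-1}(x - \mu_i) \). Here is a minimal numpy sketch of this formula (illustrative code, not from the paper):

```python
import numpy as np

def mixture_score(x, weights, means, covs):
    """Score (gradient of the log-density) of a Gaussian mixture at x.

    The score is a softmax-weighted average of the per-component scores
    -Sigma_i^{-1} (x - mu_i), weighted by the posterior probability that
    x came from component i.
    """
    k, d = means.shape
    log_post = np.empty(k)          # unnormalized log-posterior of each component
    comp_scores = np.empty((k, d))  # per-component Gaussian scores
    for i in range(k):
        diff = x - means[i]
        prec = np.linalg.inv(covs[i])
        # log w_i + log N(x; mu_i, Sigma_i), dropping the shared constant
        log_post[i] = (np.log(weights[i])
                       - 0.5 * np.linalg.slogdet(covs[i])[1]
                       - 0.5 * diff @ prec @ diff)
        comp_scores[i] = -prec @ diff
    post = np.exp(log_post - log_post.max())  # numerically stable softmax
    post /= post.sum()
    return post @ comp_scores
```

Because the posterior weights involve a softmax over \( k \) quadratics, the score is not itself a polynomial; the piecewise-polynomial approximation described below is what makes score matching tractable here.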
The approach involves several steps:
1. **Learning via DDPM**: The algorithm uses a denoising diffusion probabilistic model (DDPM): given sufficiently accurate estimates of the scores of the noised mixture, running the reverse diffusion generates samples close to the target distribution (a sampler sketch appears after this list).
2. **Approximating the Score Function**: The score function of the mixture is approximated by a piecewise-polynomial function that is efficient to represent and evaluate.
3. **Clustering and Polynomial Approximation**: A crude clustering algorithm partitions the space into cells; within each cell the score function simplifies enough for polynomial approximation to apply.
4. **Simplifying the Score**: Within each cell, the score is simplified by dropping the contribution of components far from that cell, which reduces the effective radius of the approximation domain.
5. **Polynomial Approximation**: The simplified score is approximated by a polynomial whose degree is polynomial in \( k \) and \( 1/\epsilon \) (a regression sketch follows the sampler below).
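To illustrate step 1, here is a minimal sketch of a DDPM-style sampler that discretizes the reverse diffusion given a score oracle. We assume an Ornstein-Uhlenbeck forward process \( dX_t = -X_t\,dt + \sqrt{2}\,dB_t \); the function name, interface, and discretization choices are ours, not the paper's:

```python
import numpy as np

def ddpm_sample(score, d, T=5.0, n_steps=1000, rng=None):
    """Draw one sample by Euler-Maruyama discretization of the reverse SDE.

    `score(x, t)` is assumed to approximate the score of the target
    distribution convolved with the forward noise at time t (e.g., a
    piecewise-polynomial estimate as described above).
    """
    rng = rng or np.random.default_rng()
    dt = T / n_steps
    x = rng.standard_normal(d)  # initialize from the stationary N(0, I)
    for step in range(n_steps):
        t = T - step * dt       # run time backwards from T down to ~0
        # reverse SDE: dY = (Y + 2 * score(Y, t)) ds + sqrt(2) dB
        x = x + (x + 2.0 * score(x, t)) * dt + np.sqrt(2.0 * dt) * rng.standard_normal(d)
    return x
```

Plugging in the exact `mixture_score` from above (suitably adjusted for the noise level at each \( t \)) would yield approximate samples from the mixture; the point of the paper is that a learned piecewise-polynomial surrogate suffices.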
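Steps 2–5 can be caricatured as: noise the samples, form conditional score targets, route each noised point to a cell of the crude clustering, and solve one polynomial least-squares regression per cell. The sketch below makes loud simplifications (nearest-center cells, dense monomial features, a single noise scale \( t \)); all names and interfaces are illustrative assumptions, not the paper's procedure:

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(x, degree):
    """All monomials of the coordinates of x up to the given total degree."""
    feats = [1.0]
    for deg in range(1, degree + 1):
        for idx in combinations_with_replacement(range(len(x)), deg):
            feats.append(np.prod(x[list(idx)]))
    return np.array(feats)

def fit_piecewise_score(samples, centers, t, degree, rng=None):
    """Fit one vector-valued polynomial per cell by denoising regression.

    For the forward process x_t = e^{-t} x_0 + sqrt(1 - e^{-2t}) z with
    z ~ N(0, I), the conditional score target at x_t is -z / sqrt(1 - e^{-2t});
    regressing this target on polynomial features estimates the true score
    in squared error.
    """
    rng = rng or np.random.default_rng()
    sigma = np.sqrt(1.0 - np.exp(-2.0 * t))
    z = rng.standard_normal(samples.shape)
    xt = np.exp(-t) * samples + sigma * z   # noised samples
    targets = -z / sigma                    # conditional score targets
    # route each noised point to its nearest cluster center
    cells = np.argmin(np.linalg.norm(xt[:, None, :] - centers[None, :, :], axis=2), axis=1)
    coefs = []
    for j in range(len(centers)):
        mask = cells == j
        if not mask.any():                  # empty cell: nothing to fit
            coefs.append(None)
            continue
        Phi = np.stack([poly_features(p, degree) for p in xt[mask]])
        # least squares jointly over all d output coordinates of the score
        coefs.append(np.linalg.lstsq(Phi, targets[mask], rcond=None)[0])
    # poly_features(x, degree) @ coefs[j] estimates the score on cell j
    return coefs
```

In the actual algorithm the partition, the degree (which grows as \( \mathrm{poly}(k/\epsilon) \)), and the handling of multiple noise scales are considerably more delicate; the sketch only fixes ideas.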
The paper also discusses challenges and limitations, including the exponential dependence on \( 1/\epsilon \) and the need for well-conditioned covariance matrices. The authors provide a detailed analysis and theoretical guarantees for their approach, making significant progress toward bridging the gap between the best-known upper and lower bounds for density estimation of general Gaussian mixture models.