August 22, 2024 | Jerry Yao-Chieh Hu, Weimin Wu, Zhao Song, Han Liu
This paper investigates the statistical and computational limits of latent Diffusion Transformers (DiTs) under the assumption that the data lies on a low-dimensional linear subspace. Statistically, the study focuses on the universal approximation and sample complexity of the DiT score function, as well as the recovery of the initial data distribution. Under mild data assumptions, the paper derives an approximation error bound for the score network of latent DiTs that is sub-linear in the latent space dimension. It also shows that the data distribution generated from the estimated score function converges to a neighborhood of the original distribution. Computationally, the paper characterizes the hardness of both forward inference and backward computation of latent DiTs, assuming the Strong Exponential Time Hypothesis (SETH). For forward inference, it identifies efficiency criteria for all possible latent DiT inference algorithms and showcases the theory by pushing the efficiency toward almost-linear time inference. For backward computation, it leverages the low-rank structure within the gradient computation of DiT training for possible algorithmic speedup. Specifically, it shows that such a speedup achieves almost-linear time latent DiT training by casting the DiT gradient as a series of chained low-rank approximations with bounded error. Under the low-dimensional assumption, the convergence rate and the computational efficiency are both dominated by the dimension of the subspace, suggesting that latent DiTs have the potential to bypass the challenges associated with the high dimensionality of the initial data.
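As a rough, standalone illustration (not the authors' construction) of why chained low-rank structure enables near-linear computation, the Python sketch below applies a product of two low-rank n-by-n matrices to a vector using only their factors; the names n, r, U1, V1, U2, V2 are illustrative assumptions, not the paper's notation.

```python
import numpy as np

# Illustrative sketch only: the paper casts the DiT gradient as chained
# low-rank approximations with bounded error. Here we just show why chaining
# low-rank factors is cheap: if two n x n matrices have rank r, their product
# applied to a vector costs O(n * r) via the factors instead of O(n^2).
# All names (n, r, U1, V1, U2, V2) are ours, not the paper's notation.

rng = np.random.default_rng(0)
n, r = 1024, 8

U1, V1 = rng.standard_normal((n, r)), rng.standard_normal((n, r))
U2, V2 = rng.standard_normal((n, r)), rng.standard_normal((n, r))
x = rng.standard_normal(n)

# Dense path: materialize A = U1 V1^T and B = U2 V2^T -> O(n^2) time and memory.
A, B = U1 @ V1.T, U2 @ V2.T
dense = A @ (B @ x)

# Factored path: never form the n x n matrices -> O(n * r) per chained step.
factored = U1 @ (V1.T @ (U2 @ (V2.T @ x)))

print(np.allclose(dense, factored))  # -> True
```

The same bookkeeping extends to longer chains: each additional low-rank factor adds O(n * r) work, which is the intuition behind the almost-linear time training claim.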