SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers


16 Jan 2024 | Nanye Ma, Mark Goldstein, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden, Saining Xie
SiT is a family of generative models built on the backbone of Diffusion Transformers (DiT). The interpolant framework connects two distributions more flexibly than standard diffusion models, enabling a modular study of the design choices behind generative models based on dynamical transport: learning in discrete vs. continuous time, deciding which objective the model learns, choosing the interpolant connecting the distributions, and deploying a deterministic or stochastic sampler. By introducing these ingredients carefully, SiT surpasses DiT uniformly across model sizes on the conditional ImageNet 256x256 benchmark using the exact same backbone, number of parameters, and GFLOPs. By exploring various diffusion coefficients, which can be tuned separately from learning, SiT achieves an FID-50K score of 2.06.

The paper studies each design choice in isolation and reports the following findings (sketches of the key pieces follow this summary):

- Moving from discrete to continuous time, and changing the model prediction, the interpolant, and the sampler each give a consistent performance improvement over DiT.
- The reverse-time SDE of the interpolant can be instantiated using just a velocity model, since the score is recoverable from the velocity; this is used to push performance beyond previous results.
- Both the GVP and Linear interpolants obtain significantly improved performance over the standard diffusion path.
- Sampling with an SDE rather than the deterministic ODE leads to better performance, and the optimal diffusion coefficient depends on both the model prediction and the interpolant.
- Classifier-free guidance leads to improved performance for score-based models.

SiT benefits from all of these training and sampling choices together and surpasses DiT in every training setting, not only with respect to model size but also with respect to sampling choices. A review of related work on transformers and diffusion models positions the interpolant framework as a promising direction for future research. The paper concludes that SiT is a simple and powerful framework for image generation, and that careful design decisions lead to significant performance improvements.
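To make the interpolant choice concrete, here is a minimal sketch of the two interpolants the paper compares, written as x_t = alpha_t * x0 + sigma_t * eps with data at t = 0 and Gaussian noise at t = 1. The function name and signature are illustrative, not SiT's actual API.

```python
import math
import torch

def interpolant_coeffs(t, kind="linear"):
    """Coefficients (alpha_t, sigma_t) of x_t = alpha_t*x0 + sigma_t*eps,
    plus their time derivatives, for the two interpolants in the paper.
    `t` is a tensor of times in [0, 1]; names here are illustrative."""
    if kind == "linear":
        # Linear interpolant: alpha_t = 1 - t, sigma_t = t
        return 1.0 - t, t, -torch.ones_like(t), torch.ones_like(t)
    if kind == "gvp":
        # GVP (generalized variance-preserving): alpha_t = cos(pi*t/2),
        # sigma_t = sin(pi*t/2), so alpha_t^2 + sigma_t^2 = 1 for all t.
        a = 0.5 * math.pi
        return (torch.cos(a * t), torch.sin(a * t),
                -a * torch.sin(a * t), a * torch.cos(a * t))
    raise ValueError(f"unknown interpolant: {kind}")
```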
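The continuous-time velocity-matching objective can then be written against the time derivative of the interpolant, with t drawn uniformly rather than from a discrete grid. This sketch reuses `interpolant_coeffs` from above; the unweighted MSE and the `model(xt, t)` call signature are assumptions, not SiT's exact training code.

```python
def velocity_loss(model, x0, kind="linear"):
    """Continuous-time velocity matching: the regression target is
    d(x_t)/dt = dalpha_t*x0 + dsigma_t*eps. `x0` is a batch of images."""
    t = torch.rand(x0.shape[0], 1, 1, 1, device=x0.device)  # t ~ U(0, 1)
    eps = torch.randn_like(x0)
    alpha, sigma, dalpha, dsigma = interpolant_coeffs(t, kind)
    xt = alpha * x0 + sigma * eps            # noisy interpolant sample
    target = dalpha * x0 + dsigma * eps      # exact time derivative of x_t
    return ((model(xt, t) - target) ** 2).mean()
```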
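Because the score can be recovered from the velocity via the interpolant coefficients, a reverse-time SDE sampler needs only the velocity model. Below is a hedged Euler-Maruyama sketch (reusing the imports and `interpolant_coeffs` above); the constant diffusion coefficient `w`, the step count, and the drift convention are illustrative choices, not the paper's exact sampler.

```python
@torch.no_grad()
def sde_sample(model, shape, w=1.0, steps=250, kind="linear", device="cpu"):
    """Integrate the reverse-time SDE from t = 1 (noise) to t = 0 (data),
    driven only by a velocity model. All names/defaults are assumptions."""
    x = torch.randn(shape, device=device)
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0], 1, 1, 1), 1.0 - i * dt, device=device)
        alpha, sigma, dalpha, dsigma = interpolant_coeffs(t, kind)
        v = model(x, t)
        # Score from velocity: eliminate E[x0|x_t] from the pair
        #   v = dalpha*E[x0|x] + dsigma*E[eps|x],
        #   x = alpha*E[x0|x] + sigma*E[eps|x],
        # then use score = -E[eps|x] / sigma.
        eps_hat = (alpha * v - dalpha * x) / (alpha * dsigma - dalpha * sigma)
        score = -eps_hat / sigma
        drift = v - 0.5 * w * score          # reverse-time drift
        noise = torch.randn_like(x) if i < steps - 1 else 0.0
        x = x - dt * drift + math.sqrt(w * dt) * noise
    return x
```

Setting `w = 0` recovers the deterministic probability-flow ODE, which is what makes the stochastic-vs.-deterministic comparison, and the tuning of the diffusion coefficient, a pure sampling-time choice independent of training.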
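Classifier-free guidance combines conditional and unconditional predictions at sampling time; the sketch below applies the standard guidance formula to the model's prediction (shown here on the velocity field), with `y_null` standing for a learned null-class embedding. The conditioning interface and the default scale are assumptions, not SiT's exact implementation.

```python
def guided_velocity(model, x, t, y, y_null, scale=1.5):
    """Classifier-free guidance: push the conditional prediction away
    from the unconditional one by a tunable scale (> 1 strengthens it)."""
    v_cond = model(x, t, y)         # class-conditional prediction
    v_uncond = model(x, t, y_null)  # unconditional prediction
    return v_uncond + scale * (v_cond - v_uncond)
```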