16 Jan 2024 | Nanye Ma, Mark Goldstein, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden†, Saining Xie†
The paper introduces Scalable Interpolant Transformers (SiT), a family of generative models built on the Diffusion Transformers (DiT) backbone. SiT adopts an interpolant framework that allows more flexible connections between the data and noise distributions than standard diffusion models, enabling a modular study of several design choices: discrete vs. continuous-time learning, the model's prediction target, the choice of interpolant, and deterministic vs. stochastic sampling. By carefully exploring these components, SiT outperforms DiT across all model sizes on the ImageNet 256×256 benchmark, reaching an FID-50K score of 2.06 by tuning the diffusion coefficient independently of learning. The paper also discusses the benefits of continuous-time learning, velocity models, and different interpolants, providing insight into the performance gains and trade-offs in generative modeling.
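To make the interpolant framework concrete, below is a minimal PyTorch sketch of two of the design choices the paper studies: continuous-time velocity prediction under a linear interpolant x_t = (1 − t)·x⋆ + t·ε, and deterministic sampling by integrating the learned ODE. The function names (`sit_velocity_loss`, `sample_ode`), the Euler integrator, and the specific linear path are illustrative assumptions, not the paper's exact training or sampling code.

```python
import torch

def sit_velocity_loss(model, x_star):
    """Velocity-matching loss for the linear interpolant x_t = (1 - t) * x_star + t * eps.

    `model` is any network v_theta(x_t, t). This is a hypothetical sketch of the
    interpolant objective, not SiT's actual implementation.
    """
    b = x_star.shape[0]
    t = torch.rand(b, device=x_star.device)              # continuous time in [0, 1]
    eps = torch.randn_like(x_star)                       # Gaussian endpoint
    t_ = t.view(b, *([1] * (x_star.dim() - 1)))          # broadcast t over image dims
    x_t = (1.0 - t_) * x_star + t_ * eps                 # the interpolant
    target = eps - x_star                                # d/dt x_t for the linear path
    return ((model(x_t, t) - target) ** 2).mean()

@torch.no_grad()
def sample_ode(model, x_init, steps=250):
    """Deterministic sampling: Euler-integrate dx/dt = v_theta(x, t) from noise (t=1) to data (t=0)."""
    x, dt = x_init, 1.0 / steps
    for i in range(steps, 0, -1):
        t = torch.full((x.shape[0],), i / steps, device=x.device)
        x = x - dt * model(x, t)                         # step toward t = 0
    return x
```

Because the velocity field here is learned independently of any diffusion coefficient, stochastic sampling can reuse the same trained model with a separately tuned noise schedule, which is the decoupling the paper exploits to reach its best FID.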