Beyond ELBOs: A Large-Scale Evaluation of Variational Methods for Sampling

2024 | Denis Blessing, Xiaogang Jia, Johannes Esslinger, Francisco Vargas, Gerhard Neumann
This paper introduces a benchmark for evaluating variational methods for sampling, with a focus on quantifying mode collapse. The authors propose a standardized task suite and a range of performance criteria for assessing sampling methods. They study existing metrics for mode collapse and introduce new ones. The benchmark covers tractable density models, sequential importance sampling methods, and diffusion-based methods, and its findings on the strengths and weaknesses of current samplers are intended as a reference for future developments.

The evaluation is performed on synthetic and real target densities, with performance metrics including the 2-Wasserstein distance, maximum mean discrepancy, and evidence bounds. The results show that methods such as GMMVI and FAB perform well across different tasks, while diffusion-based methods often struggle with mode collapse. The study highlights the importance of metrics that are sensitive to mode collapse, such as forward criteria and integral probability metrics, rather than traditional metrics like the ELBO. The authors conclude that no single method is superior in all situations and that the choice of method depends on the specific task and requirements. The work provides a comprehensive evaluation of sampling methods and contributes to the development of more effective techniques.
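To illustrate why a sample-based metric such as maximum mean discrepancy can expose mode collapse that the ELBO largely hides, here is a minimal sketch in Python with NumPy. It is not taken from the paper or its benchmark code. The one-dimensional two-mode target, the "collapsed" sampler that covers only one mode, the RBF kernel bandwidth, and all function names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
MODES = np.array([-4.0, 4.0])  # assumed toy target: two well-separated unit-variance modes


def log_normal(x, mean, std=1.0):
    """Log density of a 1D Gaussian."""
    return -0.5 * ((x - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)


def log_target(x):
    """Log density of the equal-weight mixture (normalized, so log Z = 0)."""
    comps = np.stack([log_normal(x, m) for m in MODES], axis=0)
    return np.logaddexp.reduce(comps, axis=0) - np.log(len(MODES))


def sample_target(n):
    """Ground-truth samples from the mixture."""
    idx = rng.integers(0, len(MODES), size=n)
    return MODES[idx] + rng.standard_normal(n)


def sample_collapsed(n):
    """Hypothetical mode-collapsed sampler: fits the first mode only."""
    return MODES[0] + rng.standard_normal(n)


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD with an RBF kernel."""
    def kernel(a, b):
        sq_dist = (a[:, None] - b[None, :]) ** 2
        return np.exp(-sq_dist / (2 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()


n = 1000
x_collapsed = sample_collapsed(n)

# Reverse (ELBO-style) estimate: E_q[log p(x) - log q(x)] under the collapsed sampler.
elbo = np.mean(log_target(x_collapsed) - log_normal(x_collapsed, MODES[0]))

# MMD against ground-truth samples, for the collapsed sampler and for a perfect one.
mmd_bad = mmd2(x_collapsed, sample_target(n))
mmd_good = mmd2(sample_target(n), sample_target(n))

print(f"ELBO of collapsed sampler: {elbo:.3f} (true log Z = 0)")
print(f"MMD^2 collapsed vs target: {mmd_bad:.4f}")
print(f"MMD^2 target vs target:    {mmd_good:.4f}")
```

In this toy setting the collapsed sampler's ELBO sits only about log 2 below the true log normalizer despite missing half of the probability mass, and when the normalizer is unknown that gap cannot be read off at all. The MMD estimate, in contrast, is much larger for the collapsed sampler than for a sampler that matches the target. Note that MMD needs ground-truth samples, which is why it applies most directly to targets where such samples are available.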