A NOTE ON THE EVALUATION OF GENERATIVE MODELS
24 Apr 2016 | Lucas Theis*, Aäron van den Oord*,†, Matthias Bethge
This paper examines the evaluation of generative models, emphasizing that criteria such as average log-likelihood, Parzen window estimates, and the visual fidelity of samples are largely independent of one another, particularly for high-dimensional data. Good performance on one metric therefore does not guarantee good performance on another, and a model optimized for one criterion may perform poorly under a different one. For example, the paper shows that a simple model based on k-means can outperform the true distribution when evaluated with Parzen window estimates, and that a model with high log-likelihood can still produce poor samples (and vice versa).
The paper discusses the trade-offs between different evaluation criteria and stresses that models should be evaluated directly with respect to their intended application. It concludes that there is no one-size-fits-all loss function for generative models, that proper evaluation must account for the specific application, and that Parzen window estimates should generally be avoided unless the application specifically calls for such a loss function.
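To make the Parzen window criterion concrete, here is a minimal sketch of how such an estimate is typically computed: an isotropic Gaussian kernel is centered on each model sample, and test data are scored under the resulting mixture. The function name and bandwidth parameter are illustrative, not from the paper.

```python
import numpy as np

def parzen_log_likelihood(samples, test_points, sigma):
    """Average log-likelihood of test_points under an isotropic
    Gaussian Parzen window fit to model samples.

    samples: (n, d) array of samples drawn from the model
    test_points: (m, d) array of held-out data points
    sigma: kernel bandwidth (illustrative; usually tuned on a
           validation set)
    """
    d = samples.shape[1]
    # Squared distances between every test point and every sample: (m, n)
    diffs = test_points[:, None, :] - samples[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # Unnormalized Gaussian log-kernel values
    log_kernel = -sq_dists / (2.0 * sigma ** 2)
    # Normalizing constant of a d-dimensional isotropic Gaussian
    log_norm = -0.5 * d * np.log(2.0 * np.pi * sigma ** 2)
    # Numerically stable log-mean-exp over the mixture components
    mx = log_kernel.max(axis=1, keepdims=True)
    log_p = (mx.squeeze(1)
             + np.log(np.exp(log_kernel - mx).mean(axis=1))
             + log_norm)
    return log_p.mean()
```

Because the estimate depends entirely on kernel density around the samples, a model that places samples near cluster centers of the data (as k-means does) can score higher than the true distribution, which is the failure mode the paper highlights.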