The paper introduces a novel measure called Meta-Distribution Energy (MDE) to improve the efficiency and effectiveness of Automated Model Evaluation (AutoEval). AutoEval predicts a model's performance on out-of-distribution (OOD) datasets without ground-truth labels, addressing the limitations of traditional evaluation protocols that rely on labeled, i.i.d. test sets. MDE is designed to overcome known weaknesses of prior AutoEval approaches, such as overconfidence, high storage requirements, and computational cost. It establishes a meta-distribution statistic over the energy scores of individual samples, and this statistic is supported by theorems connecting MDE to the classification loss. Extensive experiments across modalities, datasets, and architectural backbones demonstrate MDE's strong performance and versatility: it outperforms existing AutoEval methods, achieving higher correlation with ground-truth accuracy, lower mean absolute error, and greater robustness to label bias and noise. The paper also reports hyperparameter-sensitivity analyses and stress tests, showing that MDE remains effective even under strongly noisy and class-imbalanced conditions. Overall, MDE offers a solid paradigm for AutoEval, broadening its applicability in real-world scenarios.
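To make the energy-based statistic concrete, below is a minimal sketch of how such an AutoEval pipeline could look. It assumes the standard free-energy score from energy-based models, E(x) = -T * logsumexp(f(x)/T) over the classifier logits f(x). Using the mean energy as the dataset-level statistic and a linear regressor fitted on a meta-set of shifted datasets are simplifying assumptions for illustration, not the paper's exact MDE formulation; all function names (`sample_energy`, `fit_autoeval_regressor`, etc.) are hypothetical.

```python
import numpy as np
import torch
from sklearn.linear_model import LinearRegression

def sample_energy(logits: torch.Tensor, T: float = 1.0) -> torch.Tensor:
    """Per-sample free-energy score: E(x) = -T * logsumexp(f(x) / T)."""
    return -T * torch.logsumexp(logits / T, dim=-1)

def dataset_energy_statistic(logits: torch.Tensor, T: float = 1.0) -> float:
    """Aggregate per-sample energies into one dataset-level statistic.

    The mean is a simplified stand-in for the paper's meta-distribution
    statistic over the energy distribution.
    """
    return sample_energy(logits, T).mean().item()

def fit_autoeval_regressor(meta_logits, meta_accuracies, T: float = 1.0):
    """Standard AutoEval recipe: on a meta-set of synthetically shifted
    datasets with *known* accuracy, learn statistic -> accuracy.

    meta_logits: list of (N_i, C) logit tensors, one per shifted dataset.
    meta_accuracies: list of ground-truth accuracies for those datasets.
    """
    stats = np.array([[dataset_energy_statistic(l, T)] for l in meta_logits])
    return LinearRegression().fit(stats, np.asarray(meta_accuracies))

def predict_accuracy(regressor, test_logits: torch.Tensor, T: float = 1.0) -> float:
    """Predict accuracy on an unlabeled OOD test set from its energy statistic."""
    stat = np.array([[dataset_energy_statistic(test_logits, T)]])
    return float(regressor.predict(stat)[0])
```

The regression step reflects the common AutoEval design: because the dataset-level statistic correlates with accuracy across distribution shifts, a simple regressor fitted on labeled meta-set shifts can extrapolate to unseen, unlabeled OOD data.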