ENERGY-BASED AUTOMATED MODEL EVALUATION

2024 | Ru Peng, Heming Zou, Haobo Wang, Yawen Zeng, Zenan Huang, Junbo Zhao
This paper introduces Meta-Distribution Energy (MDE), a novel measure that improves the efficiency and effectiveness of the Automated Model Evaluation (AutoEval) framework. Traditional evaluation relies on labeled, i.i.d. test sets, which are often unavailable in real-world applications; AutoEval offers an alternative by estimating model performance without ground-truth labels. Existing AutoEval methods, however, suffer from overconfidence and high computational cost.

To address these issues, the authors propose MDE, a meta-distribution statistic built from the energy of individual samples. Derived from energy-based learning, MDE provides a smoother representation of the dataset's distribution than the raw energy score. The authors also give theoretical support by connecting MDE to classification loss, showing that MDE consistently correlates with the negative log-likelihood loss and therefore reflects the model's generalization performance.

MDE is evaluated across multiple modalities, datasets, and architectural backbones, demonstrating its effectiveness and versatility. The results show that MDE outperforms prior methods both in correlation with accuracy and in mean absolute error. MDE also remains effective under noisy or class-imbalanced labels and integrates seamlessly with large-scale models; stress tests on strongly noisy and class-imbalanced datasets further validate its robustness. Overall, MDE yields a more efficient and effective AutoEval framework with the potential to be applied in a wide range of real-world scenarios.
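The summary describes MDE as a meta-distribution statistic over per-sample energy scores that smooths the raw energy into a dataset-level quantity. The sketch below illustrates one plausible construction; the function names (`energy_score`, `mde`), the temperature `T`, and the specific normalization (a Boltzmann distribution over per-sample energies) are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def energy_score(logits, T=1.0):
    """Free-energy score per sample: E(x) = -T * logsumexp(f(x) / T)."""
    z = logits / T
    m = z.max(axis=1, keepdims=True)  # stabilize logsumexp
    return -T * (np.log(np.exp(z - m).sum(axis=1)) + m.squeeze(1))

def mde(logits, T=1.0):
    """Assumed sketch of a meta-distribution energy statistic:
    normalize per-sample energies into a Boltzmann distribution over
    the dataset, then average the log-probabilities. This turns the
    individual energy scores into a single, smoother dataset-level
    statistic (not necessarily the paper's exact formula)."""
    e = energy_score(logits, T)                 # shape (N,)
    neg = -e / T
    m = neg.max()
    log_z = m + np.log(np.exp(neg - m).sum())   # logsumexp(-E/T)
    return float(np.mean(neg - log_z))          # mean log Boltzmann prob
```

Under this construction, if every sample has identical logits the meta-distribution is uniform and `mde` reduces to `-log(N)` for a dataset of N samples, which makes the statistic easy to sanity-check.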