The Evidence Framework Applied to Classification Networks

1992 | David J. C. MacKay
This paper presents three Bayesian ideas for supervised adaptive classifiers. First, it argues that the output of a classifier should be obtained by marginalizing over the posterior distribution of the parameters, and it proposes a simple approximation to this integral that "moderates" the most probable classifier's outputs, leading to improved performance. Second, it demonstrates that the Bayesian framework for model comparison, originally developed for regression models, can also be applied to classification problems, successfully choosing the magnitude of weight decay terms and ranking solutions by the number of hidden units. Third, an information-based data selection criterion is derived and demonstrated within this framework.

Throughout, the paper emphasizes the importance of accounting for uncertainty in the parameter estimates when making predictions, rather than relying on a single most probable network.
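To make the moderation step concrete: for a single logistic output y = sigma(a), the posterior over the weights is approximated by a Gaussian centred on the most probable weights w_MP with covariance A^{-1}, so the activation at a given input is approximately Gaussian with variance s^2 = g' A^{-1} g, where g is the gradient of the activation with respect to the weights. Marginalizing the logistic over this Gaussian has no closed form, but it is well approximated by sigma(kappa(s) * a_MP) with kappa(s) = 1 / sqrt(1 + pi s^2 / 8). The numpy sketch below is illustrative; it assumes g and A^{-1} have already been computed for the input in question, and the function names are ours.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def moderated_output(a_mp, g, A_inv):
    """Approximate P(t = 1 | x, D) by marginalizing a logistic output.

    a_mp  : activation of the most probable network at input x
    g     : gradient of the activation w.r.t. the weights, at w_MP
    A_inv : posterior covariance (inverse Hessian) under the Gaussian
            approximation to the weight posterior

    The activation is treated as Gaussian with variance s^2 = g' A_inv g,
    and the integral of the logistic over that Gaussian is approximated
    by sigmoid(kappa * a_mp) with kappa = 1 / sqrt(1 + pi * s^2 / 8).
    """
    s2 = float(g @ A_inv @ g)                  # activation variance
    kappa = 1.0 / np.sqrt(1.0 + np.pi * s2 / 8.0)
    return sigmoid(kappa * a_mp)
```

When s^2 = 0 this reduces to the most probable network's output; as the activation variance grows, the moderated output is pulled toward 0.5, which is the behavior that improves predictions in underdetermined networks.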
In more detail, the paper's "moderated outputs" take the uncertainty in the parameters into account and better approximate the predictive distribution obtained by marginalizing over the weight posterior. This approach is shown to improve prediction accuracy, especially in underdetermined networks.

The paper also addresses the evaluation of the evidence, which is used for model comparison and to assess the generalization ability of classifiers. The evidence is found to be well correlated with generalization ability, although the quadratic approximation used to evaluate it may not always be accurate.

The paper further explores active learning, proposing an objective function based on the "mean marginal information gain" to guide the selection of informative data points. This criterion is shown to be effective at identifying regions where the decision boundary is uncertain, leading to better predictions. The paper concludes that the Bayesian framework provides a robust approach to classification tasks, with applications in areas such as hidden Markov models for speech recognition. It also highlights the importance of the relationship between the model and the real world, suggesting that the mean marginal information gain is most useful for models well matched to the real world, and it calls for further research into the theoretical relationship between evidence and generalization ability, as well as into the scalability of these methods to larger problems.
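The evidence evaluation above is a Gaussian (Laplace) approximation. With a cross-entropy data error G(w) and a weight-decay prior alpha * E_W, the log evidence is approximately -M(w_MP) - (1/2) ln det A + (k/2) ln alpha, where M = G + alpha * E_W, A is the Hessian of M at w_MP, and k is the number of weights; unlike the regression case, there is no separate noise level to infer. A minimal sketch, assuming M(w_MP), A, and E_W are already available, together with the standard evidence-framework re-estimation of alpha:

```python
import numpy as np

def log_evidence(M_wmp, A, alpha, k):
    """Laplace approximation to ln P(D | alpha, H) for a classifier.

    M_wmp : total cost G(w_MP) + alpha * E_W(w_MP) at the optimum
    A     : k x k Hessian of the total cost at w_MP
    alpha : weight-decay coefficient
    k     : number of weights

    ln P(D | alpha, H) ~= -M(w_MP) - 1/2 ln det A + k/2 ln alpha
    (the (2*pi)^{k/2} factors from the prior and the posterior cancel).
    """
    _sign, logdet_A = np.linalg.slogdet(A)
    return -M_wmp - 0.5 * logdet_A + 0.5 * k * np.log(alpha)

def update_alpha(alpha, A, E_W, k):
    """Re-estimate alpha: gamma = k - alpha * trace(A^{-1}) counts the
    well-determined parameters, and alpha is set so 2 * alpha * E_W = gamma."""
    gamma = k - alpha * np.trace(np.linalg.inv(A))
    return gamma / (2.0 * E_W)
```

Candidate solutions (different weight decays, different numbers of hidden units) can then be ranked by their log evidence, which is the comparison the paper reports as correlating well with generalization ability.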
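The data-selection criterion can be sketched in the same spirit. The paper's mean marginal information gain averages the expected gain in information about the marginal outputs over a region of interest; a closely related and simpler quantity, shown below as a hypothetical stand-in rather than the paper's exact criterion, is the expected information a single label at a candidate input would carry about the parameters. Under the same Gaussian approximation to the activation, it is the entropy of the moderated output minus the mean entropy of the conditional outputs, and it is large exactly where the posterior disagrees about the label, not merely where an input lies close to the most probable boundary.

```python
import numpy as np

def binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of a Bernoulli(p) variable, clipped for stability."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def information_gain_score(a_mp, s2, n_samples=1000, seed=0):
    """Expected information a label at x would give about the weights.

    a_mp, s2 : mean and variance of the Gaussian activation posterior at x

    score = H[ mean_a sigmoid(a) ] - mean_a[ H(sigmoid(a)) ],
    estimated by Monte Carlo over activation samples. Confident but
    boundary-adjacent inputs score low; inputs where the weight
    posterior disagrees about the label score high.
    """
    rng = np.random.default_rng(seed)
    a = a_mp + np.sqrt(s2) * rng.standard_normal(n_samples)
    p = 1.0 / (1.0 + np.exp(-a))
    return binary_entropy(p.mean()) - binary_entropy(p).mean()
```

Scanning such a score over candidate inputs highlights the regions of boundary uncertainty that the paper's criterion is designed to target.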