2005 | Gavin Brown, Jeremy Wyatt, Rachel Harris, Xin Yao
The paper "Diversity Creation Methods: A Survey and Categorisation" by Gavin Brown, Jeremy Wyatt, Rachel Harris, and Xin Yao from the University of Birmingham examines error diversity in ensemble methods for classification and regression tasks. The authors review attempts to explain error diversity formally, alongside heuristic and qualitative explanations, and cover the literature on both regression and classification problems. They introduce the distinction between implicit and explicit diversity creation methods and propose a preliminary taxonomy of diversity creation techniques. The paper also highlights the importance of balancing diversity against individual accuracy to achieve the lowest overall ensemble error. In the regression context, the Ambiguity decomposition and the bias-variance-covariance decomposition are used to quantify the impact of diversity on ensemble performance. For classification, the authors note that there is no clear analogue to the regression decompositions, because class labels are non-ordinal and the ensemble output is typically formed by majority voting. They review several empirical measures of classification error diversity, such as Sharkey's heuristic metric and Kuncheva's Q-statistic, and discuss why a definitive link between diversity and ensemble accuracy is difficult to establish. The paper concludes with a discussion of the dimensions along which diversity creation methods can be applied: the starting point in hypothesis space, the set of accessible hypotheses, and the traversal of hypothesis space.
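The two kinds of diversity measure mentioned above can be illustrated with a short sketch. It is not taken from the paper. It assumes the standard form of the Ambiguity decomposition for a convex-weighted regression ensemble at a single data point (ensemble squared error = weighted average member error minus weighted spread of the members around the ensemble output), and the usual pairwise Q-statistic computed from counts of jointly correct and incorrect predictions. The function names `ambiguity_decomposition` and `q_statistic`, and the toy numbers, are illustrative assumptions.

```python
from typing import Sequence, Tuple


def ambiguity_decomposition(
    preds: Sequence[float], weights: Sequence[float], target: float
) -> Tuple[float, float, float]:
    """Return (ensemble_error, weighted_avg_error, ambiguity) at one data point.

    Assumes a convex combination: weights are non-negative and sum to 1.
    Under that assumption, ensemble_error == weighted_avg_error - ambiguity.
    """
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    # Ensemble output is the weighted average of the member predictions.
    f_ens = sum(w * f for w, f in zip(weights, preds))
    ensemble_error = (f_ens - target) ** 2
    # Weighted average of the individual squared errors.
    avg_error = sum(w * (f - target) ** 2 for w, f in zip(weights, preds))
    # Ambiguity: weighted spread of the members around the ensemble output.
    ambiguity = sum(w * (f - f_ens) ** 2 for w, f in zip(weights, preds))
    return ensemble_error, avg_error, ambiguity


def q_statistic(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> float:
    """Pairwise Q-statistic from per-example correctness of two classifiers."""
    n11 = n00 = n10 = n01 = 0
    for a, b in zip(correct_a, correct_b):
        if a and b:
            n11 += 1  # both correct
        elif not a and not b:
            n00 += 1  # both wrong
        elif a:
            n10 += 1  # only the first is correct
        else:
            n01 += 1  # only the second is correct
    denom = n11 * n00 + n01 * n10
    if denom == 0:
        raise ValueError("Q-statistic is undefined for these counts")
    return (n11 * n00 - n01 * n10) / denom


if __name__ == "__main__":
    # Toy regression ensemble: three members, one target value.
    print(ambiguity_decomposition([1.0, 2.0, 4.0], [0.25, 0.25, 0.5], 3.0))
    # → (0.0625, 1.75, 1.6875), and 1.75 - 1.6875 == 0.0625

    # Toy classifier pair: 1 marks a correct prediction on that example.
    print(q_statistic([1, 1, 1, 0, 0, 1], [1, 0, 1, 1, 0, 1]))
    # → 0.5
```

In the regression example, the ensemble error (0.0625) is far lower than the average member error (1.75) because the members disagree strongly (ambiguity 1.6875). This is the accuracy-diversity trade-off the summary describes. The Q-statistic ranges from -1 to 1, with values near 0 indicating statistically independent errors. Unlike the ambiguity term, it does not enter an exact decomposition of the majority-vote error.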