The paper evaluates the performance of state-of-the-art methods for combining heterogeneous classifiers using stacking, finding that these methods perform at best comparably to selecting the best classifier from the ensemble by cross-validation. Among the stacking methods, those using probability distributions and multi-response linear regression (MLR) show the best performance. The authors propose two extensions of this method: one using an extended set of meta-level features and another using multi-response model trees for meta-level learning. The experimental results show that the latter extension outperforms existing stacking approaches and selecting the best classifier by cross-validation. The study also highlights the importance of the number of base-level classifiers and the choice of meta-level algorithms in the performance of stacking methods.
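To make the MLR scheme concrete, the following is a minimal, self-contained sketch of stacking with probability-distribution meta-features and multi-response linear regression as the meta-level learner. It is an illustration under simplifying assumptions, not the paper's implementation: the two "base classifiers" are hypothetical fixed logistic scorers rather than trained heterogeneous models, the data is a synthetic 1-D two-class problem, and a real setup would build the meta-level training set via cross-validation of the base learners. The MLR step itself follows the description in the text: one linear regression per class on a 0/1 class-indicator target, with the predicted class being the one whose regression gives the highest response.

```python
import math
import random

def base_prob(x, slope, thresh):
    """P(class 1 | x) from a simple logistic scorer (hypothetical base classifier)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - thresh)))

def gauss_solve(A, b):
    """Solve A w = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (M[r][n] - sum(M[r][c] * w[c] for c in range(r + 1, n))) / M[r][r]
    return w

def least_squares(X, y):
    """Fit w minimising ||Xw - y||^2 via the normal equations (X^T X) w = X^T y."""
    n = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(n)] for i in range(n)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(n)]
    return gauss_solve(XtX, Xty)

random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
ys = [1 if x > 0 else 0 for x in xs]

# Meta-level attributes: intercept plus each base classifier's P(class 1 | x),
# i.e. the "probability distributions" variant of the meta-level feature set.
meta = [[1.0, base_prob(x, 10.0, 0.05), base_prob(x, 8.0, -0.05)] for x in xs]

# MLR: one linear regression per class, each fitted to a 0/1 indicator target.
weights = {c: least_squares(meta, [1.0 if y == c else 0.0 for y in ys])
           for c in (0, 1)}

def predict(row):
    """Predict the class whose linear model gives the highest response."""
    return max((0, 1), key=lambda c: sum(w * f for w, f in zip(weights[c], row)))

accuracy = sum(predict(m) == y for m, y in zip(meta, ys)) / len(ys)
print("stacked-ensemble training accuracy:", accuracy)
```

The multi-response model-tree extension discussed in the paper would replace the per-class linear regressions above with model trees fitted to the same indicator targets, keeping the rest of the pipeline unchanged.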