Random k-Labelsets: An Ensemble Method for Multilabel Classification


2007 | Grigorios Tsoumakas and Ioannis Vlahavas
This paper proposes RAKEL, an ensemble method for multilabel classification. RAKEL builds an ensemble of Label Powerset (LP) classifiers by randomly selecting small subsets of labels (k-labelsets) and training a single-label multiclass classifier on each subset, where every distinct combination of labels in the subset is treated as one class. Each member classifier votes on the labels in its subset, and the ensemble averages the per-label votes and applies a threshold to produce the final prediction. The method aims to capture label correlations while avoiding the high computational cost of full LP, which must model all label combinations observed in the data.

RAKEL is evaluated on three domains: protein function classification, scene classification, and document categorization. Experimental results show that RAKEL outperforms both the Binary Relevance (BR) and LP methods in terms of Hamming loss and F-measure across a wide range of subset sizes and thresholds, with the best performance reported for subset sizes of 3-5 and thresholds between 0.2 and 0.8.

The paper also presents a unified evaluation framework for multilabel classification, categorizing measures into example-based and label-based. Example-based measures evaluate performance over the predicted label sets of individual examples, while label-based measures average a binary-classification measure over the labels, using either micro or macro averaging.

RAKEL's computational complexity depends on the base classifier, the subset size, and the number of models; the method remains efficient for small subset sizes and a moderate number of models. The authors implemented RAKEL as a Java package based on Weka, which includes the BR, LP, and RAKEL methods, the evaluation framework, and multilabel dataset statistics.
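The training and voting procedure summarized above can be sketched as follows. This is a minimal illustration, not the paper's Java/Weka implementation: the function names, the `train_lp` plug-in learner, and the toy majority-class base learner are all hypothetical stand-ins.

```python
import random
from collections import Counter, defaultdict

def train_rakel(X, Y, labels, k, m, train_lp, seed=0):
    """Train m LP models, each on a random k-labelset (hypothetical sketch)."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(m):
        subset = tuple(sorted(rng.sample(labels, k)))
        # Project each example's label set onto the subset; each distinct
        # projection acts as one class of a single-label multiclass task.
        targets = [frozenset(y) & set(subset) for y in Y]
        ensemble.append((subset, train_lp(X, targets)))
    return ensemble

def predict_rakel(ensemble, x, labels, threshold=0.5):
    """Average per-label votes across members, then apply the threshold."""
    votes, seen = defaultdict(int), defaultdict(int)
    for subset, model in ensemble:
        predicted = model(x)  # a set of labels drawn from `subset`
        for lbl in subset:
            seen[lbl] += 1
            if lbl in predicted:
                votes[lbl] += 1
    return {l for l in labels if seen[l] and votes[l] / seen[l] > threshold}

def train_lp(X, targets):
    """Toy LP base learner: always predict the most common labelset."""
    most = Counter(targets).most_common(1)[0][0]
    return lambda x: most
```

Because each label appears in several random subsets, the averaged votes smooth out individual member errors, which is the source of RAKEL's advantage over a single LP model.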
The paper concludes that RAKEL is a promising approach for multilabel classification, and future work includes combining RAKEL with ensemble selection methods to improve performance.
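The evaluation measures in the paper's framework can be made concrete with a short sketch. The function names below are hypothetical; the definitions (example-based Hamming loss, and label-based F1 under micro and macro averaging) follow the standard formulations the paper uses.

```python
def hamming_loss(Y_true, Y_pred, labels):
    """Example-based: fraction of example/label pairs that disagree."""
    errors = sum(len(yt ^ yp) for yt, yp in zip(Y_true, Y_pred))
    return errors / (len(Y_true) * len(labels))

def micro_macro_f1(Y_true, Y_pred, labels):
    """Label-based F1: micro pools counts over labels, macro averages F1."""
    per_label = []
    TP = FP = FN = 0
    for l in labels:
        tp = sum(1 for yt, yp in zip(Y_true, Y_pred) if l in yt and l in yp)
        fp = sum(1 for yt, yp in zip(Y_true, Y_pred) if l not in yt and l in yp)
        fn = sum(1 for yt, yp in zip(Y_true, Y_pred) if l in yt and l not in yp)
        TP, FP, FN = TP + tp, FP + fp, FN + fn
        denom = 2 * tp + fp + fn
        per_label.append(2 * tp / denom if denom else 0.0)
    micro = 2 * TP / (2 * TP + FP + FN) if (TP + FP + FN) else 0.0
    macro = sum(per_label) / len(labels)
    return micro, macro
```

Micro averaging weights frequent labels more heavily, while macro averaging treats all labels equally, which is why the two can rank methods differently on skewed label distributions.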