Analyzing the Effectiveness and Applicability of Co-training

2000 | Kamal Nigam, Rayid Ghani
This paper investigates the effectiveness and applicability of co-training algorithms for text classification. Co-training is a semi-supervised learning method that combines labeled and unlabeled data and is particularly effective when the features split naturally into two disjoint, redundant sets ("views"). The paper demonstrates that co-training outperforms other methods, such as Expectation-Maximization (EM), when a natural feature split exists; even when no natural split is available, co-training variants that create an artificial split can still outperform non-split methods.

The authors compare co-training with EM on several datasets, including WebKB-Course and News 2x2. On WebKB-Course, co-training does not outperform EM, suggesting that the dataset's feature split may not satisfy co-training's assumptions well enough for it to excel. On News 2x2, which is constructed to have true class-conditional independence between the two feature sets, co-training significantly outperforms EM, indicating that co-training benefits from a valid feature split. The paper also explores hybrid algorithms that combine ideas from co-training and EM, such as co-EM and self-training; these show improved performance over the traditional EM and co-training baselines.

Overall, the results suggest that co-training can benefit from feature splits and that its performance is sensitive to the validity of those splits. The paper concludes that co-training can outperform other methods when an independent and redundant feature split exists, but that it is sensitive to violations of the feature-independence and compatibility assumptions.
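The co-training loop summarized above can be sketched as follows. This is a minimal illustrative sketch in the spirit of Blum and Mitchell's algorithm, not the paper's implementation: the two "views" are single numeric features, the per-view learner is a toy midpoint-of-class-means threshold classifier, and all function names and parameters are assumptions for illustration.

```python
# Hedged sketch of co-training: one classifier per view; each round, every
# classifier labels the unlabeled examples it is most confident about and
# adds them to the shared labeled pool. Toy 1-D threshold learners stand in
# for the paper's naive Bayes text classifiers.

def train_threshold(examples):
    """Fit a threshold classifier on one numeric feature: predict 1 if
    x >= midpoint of the two class means. Assumes both classes present."""
    m0 = [x for x, y in examples if y == 0]
    m1 = [x for x, y in examples if y == 1]
    return (sum(m0) / len(m0) + sum(m1) / len(m1)) / 2

def predict(t, x):
    return 1 if x >= t else 0

def confidence(t, x):
    # Distance from the decision boundary as a crude confidence score.
    return abs(x - t)

def co_train(labeled, unlabeled, rounds=5, per_round=2):
    """labeled: list of ((view1, view2), y); unlabeled: list of (view1, view2).
    Returns the final threshold for each view."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Train one classifier per view on the current labeled pool.
        t1 = train_threshold([(v1, y) for (v1, v2), y in labeled])
        t2 = train_threshold([(v2, y) for (v1, v2), y in labeled])
        # Each classifier nominates its most confident unlabeled examples.
        for t, view in ((t1, 0), (t2, 1)):
            unlabeled.sort(key=lambda ex: -confidence(t, ex[view]))
            for ex in unlabeled[:per_round]:
                labeled.append((ex, predict(t, ex[view])))
            unlabeled = unlabeled[per_round:]
    return (train_threshold([(v1, y) for (v1, v2), y in labeled]),
            train_threshold([(v2, y) for (v1, v2), y in labeled]))
```

The key design choice, per the paper's analysis, is that each view's classifier teaches the other: labels assigned using view 1 improve the training pool available to view 2's classifier, which is only beneficial when the two views are (approximately) conditionally independent given the class.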
The paper also discusses the potential for applying co-training to datasets without a natural feature split by creating artificial splits, and suggests that future research should focus on improving the robustness of co-training algorithms to violations of their assumptions.
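For contrast, self-training, the single-view hybrid mentioned above, dispenses with feature splits entirely: one classifier repeatedly labels its own most confident unlabeled examples and retrains. The sketch below uses the same toy midpoint-threshold learner; all names are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of self-training: a single classifier bootstraps from its
# own most confident predictions, with no feature split required.

def fit_threshold(examples):
    """Midpoint-of-class-means threshold on one numeric feature.
    Assumes both classes are present in the pool."""
    m0 = [x for x, y in examples if y == 0]
    m1 = [x for x, y in examples if y == 1]
    return (sum(m0) / len(m0) + sum(m1) / len(m1)) / 2

def self_train(labeled, unlabeled, rounds=5, per_round=1):
    """labeled: list of (x, y); unlabeled: list of x. Returns final threshold."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        t = fit_threshold(labeled)
        # Most confident = farthest from the decision boundary.
        unlabeled.sort(key=lambda x: -abs(x - t))
        for x in unlabeled[:per_round]:
            labeled.append((x, 1 if x >= t else 0))
        unlabeled = unlabeled[per_round:]
    return fit_threshold(labeled)
```

Because self-training has no second view to correct its mistakes, an early mislabeling can reinforce itself, which is one way to read the paper's finding that split-based methods win when a valid split exists.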