A survey on semi-supervised learning

This survey provides an overview of semi-supervised learning, a branch of machine learning that combines labeled and unlabeled data to improve learning performance. It begins by distinguishing supervised from unsupervised learning and positions semi-supervised learning as a way to leverage both kinds of data. It then explains why semi-supervised learning matters when labeled data is scarce or expensive to obtain, as in computer-aided diagnosis and drug discovery. The survey outlines the key assumptions underlying semi-supervised learning: the smoothness assumption, the low-density assumption, and the manifold assumption. These assumptions are what make it possible to infer label information from unlabeled data. The survey also explores the connection between semi-supervised learning and clustering, arguing that the cluster assumption generalizes the other assumptions. A taxonomy of semi-supervised learning methods is proposed, distinguishing between inductive and transductive methods. Inductive methods aim to construct a general classification model, while transductive methods only make predictions for the specific unlabeled data points they are given. The taxonomy further divides methods into wrapper methods, unsupervised preprocessing methods, and intrinsically semi-supervised methods. The survey details several wrapper methods: self-training, co-training, and pseudo-labeled boosting methods. Self-training iteratively trains a supervised classifier on labeled and pseudo-labeled data, while co-training extends this approach to multiple classifiers. Pseudo-labeled boosting methods build a classifier ensemble by sequentially training individual classifiers on labeled and pseudo-labeled data. The survey concludes with a discussion of the empirical evaluation of semi-supervised learning methods, emphasizing the importance of diverse datasets and strong supervised baselines.
It highlights the need for realistic evaluation to assess the effectiveness of semi-supervised learning techniques.
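
The self-training loop described above can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the survey: the base learner is a simple nearest-centroid classifier, confidence is measured as the gap between the distances to the two nearest centroids, and the names `fit_centroids`, `predict_with_margin`, `self_train`, and `min_margin` are hypothetical.

```python
import math


def fit_centroids(points, labels):
    """Supervised base learner (assumed): one mean vector per class."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_with_margin(centroids, x):
    """Return (label of nearest centroid, distance gap to the runner-up)."""
    dists = sorted((math.dist(x, c), y) for y, c in centroids.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    """Repeatedly fit on labeled + pseudo-labeled data, then pseudo-label
    the unlabeled points the current model is confident about."""
    train_x, train_y = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(train_x, train_y)
        remaining, added = [], False
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:
                # Confident prediction: keep it as a pseudo-label.
                train_x.append(x)
                train_y.append(label)
                added = True
            else:
                remaining.append(x)
        pool = remaining
        if not added or not pool:
            break  # nothing new was pseudo-labeled, or nothing is left
    # Final model uses all labeled and pseudo-labeled points.
    return fit_centroids(train_x, train_y), pool


# Toy data: two labeled points and five unlabeled ones.
model, leftover = self_train(
    [(0.0, 0.0), (4.0, 4.0)],
    ["a", "b"],
    [(0.5, 0.2), (3.8, 4.1), (1.5, 1.4), (2.6, 2.7), (2.0, 2.0)],
)
print(predict_with_margin(model, (1.0, 1.0))[0], leftover)
```

Points that never reach the confidence threshold stay unlabeled and are returned alongside the model, so an ambiguous point (here, one equidistant from both classes) does not pollute the training set. A co-training variant would keep two such learners and let each pseudo-label data for the other.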

2019 | Jesper E. van Engelen, Holger H. Hoos