December 9, 2003 | Jennifer G. Dy, Carla E. Brodley
This paper addresses the challenges of feature selection for unsupervised learning, particularly in the context of clustering. The authors identify two key issues: the need to determine the number of clusters and the need to normalize the bias of feature selection criteria with respect to dimensionality. They propose a solution called FSSEM (Feature Subset Selection using Expectation-Maximization clustering) and evaluate it using two performance criteria: scatter separability and maximum likelihood. The paper provides theoretical explanations for the biases in these criteria and introduces a cross-projection normalization scheme to mitigate these biases. Experimental results on synthetic and real-world datasets demonstrate the effectiveness of the proposed approach in improving clustering performance and addressing the issues of feature selection in unsupervised learning.
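The wrapper framework the abstract describes (search over feature subsets, cluster the data projected onto each candidate subset, score the result with a criterion such as scatter separability) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it substitutes a plain k-means routine for EM clustering, uses the trace(Sw^-1 Sb) separability criterion, omits the paper's cross-projection normalization, and all function names are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    # Stand-in for EM clustering (illustration only): hard-assignment k-means.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)].astype(float)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def scatter_separability(X, labels):
    # trace(Sw^{-1} Sb): larger values mean better-separated clusters.
    overall = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for j in np.unique(labels):
        Xj = X[labels == j]
        mj = Xj.mean(axis=0)
        Xc = Xj - mj
        Sw += Xc.T @ Xc                      # within-cluster scatter
        diff = (mj - overall)[:, None]
        Sb += len(Xj) * (diff @ diff.T)      # between-cluster scatter
    return np.trace(np.linalg.pinv(Sw) @ Sb)

def forward_select(X, k, n_feats):
    # Greedy wrapper search: repeatedly add the single feature that most
    # improves the clustering criterion on the projected data.
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_feats:
        scores = {}
        for f in remaining:
            cols = selected + [f]
            labels = kmeans(X[:, cols], k)
            scores[f] = scatter_separability(X[:, cols], labels)
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected
```

On synthetic data where one feature separates two clusters and the rest are noise, the wrapper should pick the informative feature first; note that without a normalization step like the paper's cross-projection scheme, raw separability scores are not directly comparable across subsets of different dimensionality.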