This study investigates how different feature normalization methods affect predictive performance and feature selection in radiomics. The researchers used 15 publicly available radiomics datasets and compared seven normalization methods: z-Score, Min–Max, power, quantile, tanh, robust z-Score (5,95), and no normalization. They employed four feature selection and classifier methods (LASSO, extra trees, ANOVA, and Bhattacharyya) and used cross-validation to measure the area under the curve (AUC), the agreement of selected features, and model calibration.
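The seven normalization schemes compared in the study map closely onto standard scikit-learn transformers. The sketch below is illustrative only: the feature matrix is synthetic, and the tanh transform is written out in its common "tanh-estimator" form, since the study's exact variant is not specified here.

```python
import numpy as np
from sklearn.preprocessing import (
    StandardScaler, MinMaxScaler, PowerTransformer,
    QuantileTransformer, RobustScaler, FunctionTransformer,
)

# Synthetic stand-in for a radiomics feature matrix:
# 100 samples x 5 skewed features on very different scales.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=3.0, sigma=1.5, size=(100, 5))

def tanh_normalize(Z):
    # Common tanh-estimator form; the study's exact variant is assumed.
    return 0.5 * (np.tanh(0.01 * (Z - Z.mean(axis=0)) / Z.std(axis=0)) + 1.0)

# The seven schemes from the study, mapped to scikit-learn transformers.
normalizers = {
    "none": FunctionTransformer(),  # identity, i.e. no normalization
    "z-Score": StandardScaler(),
    "Min-Max": MinMaxScaler(),
    "power": PowerTransformer(method="yeo-johnson"),
    "quantile": QuantileTransformer(output_distribution="normal", n_quantiles=100),
    "tanh": FunctionTransformer(tanh_normalize),
    "robust z-Score (5,95)": RobustScaler(quantile_range=(5.0, 95.0)),
}

for name, tf in normalizers.items():
    Xn = tf.fit_transform(X)
    print(f"{name:>22}: mean={Xn.mean():7.3f}, std={Xn.std():7.3f}")
```

Running this on real radiomics features makes the practical differences visible: the affine methods (z-Score, Min–Max, robust z-Score) preserve feature distributions up to shift and scale, while power, quantile, and tanh reshape them nonlinearly.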
The results showed that differences in AUC between normalization methods were relatively small, with a maximum gain of +0.012 when comparing z-Score to no normalization; on some datasets, however, the difference reached +0.051. z-Score performed best, while the tanh transformation performed worst and even decreased overall predictive performance. The quantile transformation performed slightly worse than z-Score but outperformed all other methods on one out of three datasets. Agreement between the features selected under different normalization methods was only mild, reaching at most 62%.
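The feature-agreement finding can be illustrated with a small experiment: select the top-ranked features under two different normalizations and measure how much the selections overlap. This is a sketch on synthetic data, using overlap fraction as an illustrative agreement metric (not necessarily the study's exact measure) and ANOVA as the selection method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.preprocessing import QuantileTransformer, StandardScaler

# Synthetic stand-in for a radiomics dataset (not from the study).
X, y = make_classification(n_samples=150, n_features=40,
                           n_informative=8, random_state=1)

def top_k_features(Xn, y, k=10):
    """Indices of the k features ranked highest by ANOVA F-score."""
    sel = SelectKBest(f_classif, k=k).fit(Xn, y)
    return set(np.flatnonzero(sel.get_support()))

# Same selector, two different normalizations of the same data.
a = top_k_features(StandardScaler().fit_transform(X), y)
b = top_k_features(
    QuantileTransformer(output_distribution="normal",
                        n_quantiles=100).fit_transform(X), y)

# Agreement as the fraction of selected features shared by both runs.
agreement = len(a & b) / len(a)
print(f"Feature agreement (z-Score vs quantile): {agreement:.0%}")
```

Note that a per-feature affine transform like z-Score leaves ANOVA F-scores unchanged, so disagreement here comes entirely from the nonlinear quantile transform reshaping the feature distributions.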
Applying normalization before cross-validation did not introduce significant bias, except for the tanh transformation, which showed a larger bias of ±0.022. The study concluded that the choice of feature normalization method influences predictive performance, but that the effect depends strongly on the dataset and on the set of selected features. The authors recommend testing multiple normalization methods to achieve the highest predictive performance, with z-Score, Min–Max, and quantile transformation being the most promising options.
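The bias question concerns where normalization sits relative to the cross-validation split: fitting the scaler on all data before splitting leaks test-fold statistics into training. A minimal sketch of both setups, assuming scikit-learn's Pipeline and synthetic data (the AUC values are illustrative, not the study's):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a radiomics dataset (not from the study).
X, y = make_classification(n_samples=200, n_features=50,
                           n_informative=5, random_state=0)

# Potentially biased setup: scaler fitted on ALL data before
# cross-validation, so test folds leak into the scaling statistics.
X_pre = StandardScaler().fit_transform(X)
auc_biased = cross_val_score(LogisticRegression(max_iter=1000),
                             X_pre, y, cv=5, scoring="roc_auc").mean()

# Leakage-free setup: the scaler is refitted on each training fold
# because it lives inside the Pipeline.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
auc_clean = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc").mean()

print(f"AUC (normalize, then CV):  {auc_biased:.3f}")
print(f"AUC (normalize inside CV): {auc_clean:.3f}")
```

For an affine scaler like z-Score the two setups typically differ very little, consistent with the study's finding that only the tanh transformation produced a notable bias; the Pipeline form remains the safe default.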