2024 | Marco Teixeira, Francisco Silva, Rui M. Ferreira, Tania Pereira, Ceu Figueiredo, Hélder P. Oliveira
This review discusses the application of Machine Learning (ML) methods for characterizing cancer using microbiome data. It highlights the complex nature of cancer-related signatures, which often involve multiple microbial taxa, making ML approaches essential for uncovering these relationships. The review covers each stage of the process: sample collection, processing, and decontamination; feature selection and transformation; and model validation. It also examines the challenges posed by the high dimensionality and sparsity of microbiome data, and the importance of dimensionality reduction techniques.
The review evaluates popular ML models such as Support Vector Machines (SVMs), Random Forests, Boosting, Logistic Regression, and Artificial Neural Networks (ANNs), discussing their strengths and limitations in cancer characterization. It emphasizes the need for further improvements in model accuracy and generalizability, particularly in addressing technical artifacts and leveraging advanced deep learning techniques. The review concludes by outlining future research directions and strategies to enhance the performance and clinical applicability of ML models in cancer characterization from microbiome data.