mixOmics: An R package for ‘omics feature selection and multiple data integration

November 3, 2017 | Florian Rohart, Benoit Gautier, Amrit Singh, Kim-Anh Lê Cao
The article introduces mixOmics, an R package for multivariate analysis of biological data sets, with a focus on 'omics data such as transcriptomics, proteomics, and metabolomics. The package is built to explore and integrate large-scale biological data, providing tools for dimension reduction, feature selection, and visualization. Key features include:

1. **Multivariate Analysis**: mixOmics offers a range of multivariate projection-based methods, including Principal Component Analysis (PCA), Partial Least Squares Discriminant Analysis (PLS-DA), and sparse variants such as sparse PLS-DA (sPLS-DA). These methods are computationally efficient and can handle large data sets with thousands of features.
2. **Data Integration**: The package includes novel frameworks for integrating multiple 'omics data sets (N-integration) and independent studies (P-integration). DIABLO integrates data from the same biological samples measured on different 'omics platforms, while MINT integrates multiple independent studies or data sets.
3. **Feature Selection**: mixOmics provides feature selection tools that let users identify the key predictors forming a molecular signature. Selection is achieved through $\ell_1$ regularization, and the resulting signature helps refine biological hypotheses and suggest downstream analyses.
4. **Visualization**: The package includes visualization tools for interpreting statistical and biological results, such as sample plots, variable plots, correlation circle plots, and relevance networks.
5. **Supervised Analysis**: mixOmics supports supervised analyses, where the goal is to classify or predict outcomes based on biological features. The package includes functions for parameter tuning, performance evaluation, and visualization of prediction areas.
6. **Applications**: The article illustrates the use of mixOmics through three case studies: a single 'omics analysis with PLS-DA and sPLS-DA, N-integration with DIABLO, and P-integration with MINT. These examples demonstrate the package's capabilities in identifying discriminatory features and molecular signatures across different types of 'omics data.

Overall, mixOmics is a comprehensive tool for exploring and integrating large biological data sets, providing a systematic approach to uncovering biological insights and identifying robust molecular signatures.
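The feature selection item above attributes variable selection to $\ell_1$ regularization. As a rough illustration of why an $\ell_1$ penalty selects features, the sketch below applies the soft-thresholding operator commonly associated with such penalties to a vector of component loadings. It is written in Python for illustration only and is not the mixOmics implementation; the function names, feature names, loading values, and penalty are all hypothetical.

```python
def soft_threshold(values, penalty):
    """Shrink each value toward zero by `penalty`.

    Values whose magnitude does not exceed the penalty become exactly 0,
    which is what makes an l1-type penalty act as a feature selector.
    """
    shrunk = []
    for v in values:
        magnitude = abs(v) - penalty
        if magnitude <= 0:
            shrunk.append(0.0)
        else:
            shrunk.append(magnitude if v > 0 else -magnitude)
    return shrunk


def select_features(names, loadings, penalty):
    """Return the names of features whose loading survives the shrinkage."""
    shrunk = soft_threshold(loadings, penalty)
    return [name for name, weight in zip(names, shrunk) if weight != 0.0]


# Hypothetical loadings for four features on one component (illustrative values).
names = ["gene_A", "gene_B", "gene_C", "gene_D"]
loadings = [0.90, -0.05, 0.40, 0.10]

selected = select_features(names, loadings, penalty=0.25)
print(selected)  # ['gene_A', 'gene_C']
```

Raising the penalty zeroes out more loadings and so yields a smaller signature. In this sketch the penalty is the only tuning knob; how the real package exposes the amount of sparsity is not covered by the summary above.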