Deep learning-based approaches for multi-omics data integration and analysis

2024 | Jenna L. Ballard, Zexuan Wang, Wenrui Li, Li Shen, Qi Long
This review article by Ballard et al. explores the integration and analysis of multi-omics data using deep learning approaches. Multi-omics data, which includes molecular (genomics, transcriptomics, proteomics, epigenomics, metabolomics) and imaging (radiomics, pathomics) modalities, offers valuable insights into biological processes and diseases. The authors categorize recent deep learning-based methods into non-generative and generative categories, discussing their unique capabilities and emerging themes in multi-omics integration.

**Non-generative Methods:**

- **Feedforward Neural Networks (FNNs):** These methods learn a mapping from input to outcome without modeling the underlying data distribution. They can handle tabular molecular data and imaging features but are limited in handling incomplete data and do not exploit inter-modality relationships effectively.
- **Graph Convolutional Neural Networks (GCNs):** GCNs leverage similarity networks to exploit both omics features and sample correlations, enhancing interpretability and incorporating prior biological knowledge. They are suitable for tabular data and PPI networks but may be computationally intensive for large datasets.
- **Autoencoders (AEs):** AEs are used for dimensionality reduction and learning nonlinear mappings to a low-dimensional latent space. They can handle complementary and consensus principles, making them useful for clustering and supervised tasks. However, they do not handle missing data and are more complex models.

**Generative Methods:**

- **Variational Methods:** These methods model the joint probability distribution of data and labels, allowing for the integration of multiple omics layers into a single representation. Variational autoencoders (VAEs) are particularly useful for handling incomplete data and learning biologically meaningful relationships. They can be used for unsupervised and supervised learning, with some methods focusing on dimensionality reduction and others on generating task-relevant embeddings.
- **Generative Adversarial Networks (GANs):** GANs use an adversarial procedure to capture the data distribution, improving the realism of generated data. Subtype-GAN is an example that handles multiple modalities through a multi-input-multi-output network and an adversarial generation network, ensuring the shared embedding space matches a prior distribution.

The review highlights the strengths and limitations of each category of methods, emphasizing the potential of generative methods in handling incomplete data and integrating diverse data types. The authors expect further advancements in methods that can better handle missing data and integrate more data types, improving performance on downstream tasks by capturing a comprehensive view of each sample.
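
To make the GCN idea summarized above more concrete, the following is a minimal sketch of one graph-convolution step over a sample similarity network. It is not taken from the review or from any specific method it covers. The cosine-similarity measure, the `threshold` value, the toy feature matrix, and the fixed (untrained) `weights` are all illustrative assumptions. A real method would learn the weights, stack several layers, and typically use a deep learning framework rather than plain Python lists.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_adjacency(features, threshold=0.5):
    """Binary sample-by-sample adjacency matrix with self-loops.

    Two samples are connected when their cosine similarity reaches
    `threshold` (a hypothetical cutoff chosen for illustration).
    """
    n = len(features)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or cosine_similarity(features[i], features[j]) >= threshold:
                adj[i][j] = 1.0
    return adj


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}, D being the degree matrix."""
    degrees = [sum(row) for row in adj]
    n = len(adj)
    return [
        [adj[i][j] / math.sqrt(degrees[i] * degrees[j]) for j in range(n)]
        for i in range(n)
    ]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(norm_adj, features, weights):
    """One graph-convolution step: ReLU(A_norm @ X @ W)."""
    propagated = matmul(norm_adj, features)   # average over graph neighbors
    projected = matmul(propagated, weights)   # linear feature transform
    return [[max(0.0, value) for value in row] for row in projected]


# Toy omics matrix: 4 samples x 3 features (made-up values).
samples = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
# Fixed 3 x 2 projection standing in for learned weights.
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

adj = similarity_adjacency(samples, threshold=0.5)
norm_adj = normalize_adjacency(adj)
embeddings = gcn_layer(norm_adj, samples, weights)
```

In this toy setup, samples 0 and 1 are linked to each other, as are samples 2 and 3, so each sample's embedding becomes an average over its neighborhood. This illustrates how the layer combines per-sample omics features with sample correlations.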