2024 | George I. Gavrilidis, Vasileios Vasileiou, Aspasia Orfanou, Naveed Ishaque, Fotis Psomopoulos
This mini-review summarizes recent advances in perturbation modelling across single-cell omic modalities. Perturbation modelling aims to understand how external biological or chemical interventions affect cellular physiology, including transcription factors, signal transducers, biological pathways, and dynamic cell states. Machine learning (ML) and deep learning (DL) tools are used to predict the effects of perturbations on various single-cell datasets. However, the rapid growth of tools and datasets makes it challenging for researchers to keep up with the latest developments in this field. This review outlines the main objectives of perturbation modelling, summarizes novel single-cell perturbation technologies, and reviews computational methods ranging from classical statistical inference to various ML and DL architectures, including gene regulatory network (GRN)-based approaches and ensemble learning. It also discusses the rising trend of large foundational models in single-cell perturbation modelling, inspired by large language models. The review critically assesses the challenges in single-cell perturbation modelling and highlights future directions such as perturbation atlases, multiomics and spatial datasets, causal machine learning for interpretability, multi-task learning for performance and explainability, and solving interoperability and benchmarking issues. The review also introduces various single-cell technologies and datasets used for perturbation modelling, including Perturb-seq, CRISPR-seq, CROP-seq, MIX-seq, sci-Plex, TAP-seq, PoKI-seq, ECCITE-seq, Perturb-CITE-seq, Mosaic-seq, sc-Tiling, Perturb-ATAC, Spear-ATAC, and CRISPR-sciATAC. It also discusses various perturbation models, including shallow models, mixed ML models, perturbation similarity models, GRN-prioritising models, complex generative models, and foundational models.
These models predict the effects of perturbations on cellular physiology and support applications such as drug repurposing, gene regulatory network inference, and causal inference. The review also highlights the importance of integrating biological knowledge into computational models to improve their interpretability and accuracy. Finally, it discusses the potential of perturbation modelling in understanding disease mechanisms, developing personalized therapies, and improving drug discovery.