2011 | Svitlana Tyekucheva, Luigi Marchionni, Rachel Karchin and Giovanni Parmigiani
This study introduces and evaluates methods for interpreting simultaneous measurements of multiple genomic features in the same biological samples. The methods use gene sets to provide an interpretable common scale for diverse genomic information. The approach detects genetic effects that may act through different mechanisms in different samples and identifies and validates important disease-related gene sets that would not be discovered by analyzing each data type individually.
The study uses gene sets to integrate diverse genomic data types, including RNA transcriptional levels, genotype variation, DNA copy number variation, and epigenetic marks. Annotated collections of gene sets, such as those in the Molecular Signatures Database (MSigDB), provide the common units on which these data types are combined. The study compares two approaches: an integrative approach, which first computes gene-to-phenotype association scores using all data types jointly and then performs gene set analysis on those scores, and a meta-analytical approach, which performs a separate gene set analysis for each data type and then derives a consensus significance score.
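The two strategies differ mainly in where the data types are combined: before the gene set test (integrative) or after it (meta-analytical). The Python sketch below illustrates that difference under stated assumptions and is not the study's implementation. It assumes per-gene, per-data-type z-scores for association with the phenotype are already available, combines them in the integrative arm by summing squared z-scores, tests a gene set with a one-sided Wilcoxon rank-sum test (scipy's `mannwhitneyu`), and combines per-data-type p-values in the meta-analytical arm with Fisher's method. These combination rules, the function names, and the toy data are illustrative assumptions.

```python
import numpy as np
from scipy import stats


def gene_set_pvalue(gene_scores, in_set):
    """One-sided rank-sum test: do genes in the set score higher than the rest?"""
    gene_scores = np.asarray(gene_scores, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    return stats.mannwhitneyu(gene_scores[in_set], gene_scores[~in_set],
                              alternative="greater").pvalue


def integrative_pvalue(z_by_type, in_set):
    """Integrate first: one score per gene from all data types, then one set test.

    z_by_type has shape (n_data_types, n_genes). Summing squared z-scores is an
    assumed, illustrative combination rule, not the study's association model.
    """
    z = np.asarray(z_by_type, dtype=float)
    return gene_set_pvalue(np.sum(z ** 2, axis=0), in_set)


def meta_analytic_pvalue(z_by_type, in_set):
    """Test the set in each data type separately, then combine with Fisher's method."""
    z = np.asarray(z_by_type, dtype=float)
    p = np.array([gene_set_pvalue(zt ** 2, in_set) for zt in z])
    fisher_stat = -2.0 * np.sum(np.log(p))          # chi-square with 2k df under H0
    return stats.chi2.sf(fisher_stat, df=2 * len(p))


# Toy example: 3 hypothetical data types x 200 genes of null z-scores.
rng = np.random.default_rng(0)
z = rng.standard_normal((3, 200))
in_set = np.zeros(200, dtype=bool)
in_set[:20] = True          # a hypothetical 20-gene set
z[0, :10] += 3.0            # half the set is altered in data type 0 ...
z[1, 10:20] += 3.0          # ... and the other half in data type 1
print("integrative p:", integrative_pvalue(z, in_set))
print("meta-analytic p:", meta_analytic_pvalue(z, in_set))
```

In this toy setup each data type carries only part of the set's signal, which is the situation in which combining evidence at the gene level, before testing the set, is expected to help.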
The study applies these methods to glioblastoma multiforme (GBM) data from The Cancer Genome Atlas (TCGA) and validates the findings using data from the Rembrandt database. The integrative approach detects gene sets related to metabolic processes, such as glycolysis and sugar metabolism, that are associated with survival differences between short- and long-term GBM survivors. The study also shows that the integrative approach outperforms both the single-data-type and the meta-analytical approaches in detecting gene sets associated with survival.
Simulations demonstrate that the integrative approach can detect disease-related gene sets that would be missed when each data type is analyzed individually. In particular, the integrative approach is more sensitive in detecting gene sets whose genes are altered by different biological mechanisms. The results suggest that integrating multiple data types improves the ability to detect gene sets associated with phenotypes, particularly in complex diseases such as GBM.
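A small simulation in the same spirit can show why integration helps when different genes in a set are altered by different mechanisms. In the hypothetical setup below, half of a 20-gene set is shifted in expression and the other half in copy number between two phenotype groups. The sample sizes, effect size, and two-data-type design are assumptions for illustration, not the study's simulation settings. Each data type alone sees only half of the set's signal, while the integrated gene score sees all of it.

```python
import numpy as np
from scipy import stats


def simulate_type(rng, n_genes, n_per_group, altered, delta):
    """Simulate one data type (genes x samples); `altered` genes shift by `delta` in group 1."""
    x = rng.standard_normal((n_genes, 2 * n_per_group))
    x[altered, n_per_group:] += delta
    return x


def gene_z(x, n_per_group):
    """Per-gene two-sample t statistics (group 1 vs group 0), used as approximate z-scores."""
    return stats.ttest_ind(x[:, n_per_group:], x[:, :n_per_group], axis=1).statistic


def set_pvalue(scores, in_set):
    """One-sided rank-sum test of in-set gene scores against all other genes."""
    return stats.mannwhitneyu(scores[in_set], scores[~in_set],
                              alternative="greater").pvalue


rng = np.random.default_rng(1)
n_genes, n_per_group, delta = 200, 50, 1.5      # hypothetical simulation settings
in_set = np.zeros(n_genes, dtype=bool)
in_set[:20] = True                              # hypothetical 20-gene set

# Different mechanisms for different genes in the same set.
expr = simulate_type(rng, n_genes, n_per_group, np.arange(0, 10), delta)
cnv = simulate_type(rng, n_genes, n_per_group, np.arange(10, 20), delta)

z = np.vstack([gene_z(expr, n_per_group), gene_z(cnv, n_per_group)])
p_single = [set_pvalue(zt ** 2, in_set) for zt in z]       # one data type at a time
p_integrative = set_pvalue(np.sum(z ** 2, axis=0), in_set)  # all data types per gene

print("single-type p-values (expression, copy number):", p_single)
print("integrative p-value:", p_integrative)
```

Per-gene t statistics stand in for z-scores here, which is a reasonable approximation with 50 samples per group. The exact p-values depend on the random seed, but in this design the integrative p-value is expected to be far smaller than either single-type p-value.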
The study highlights the importance of gene set analysis in interpreting genomic data and provides a framework for integrating diverse genomic data types. The methods can be extended to other phenotypes and can accommodate covariates, making them broadly applicable in genomic research. The study also discusses the limitations of the methods and notes that further research is needed to improve the integration of data from different sources.