The Cancer Cell Line Encyclopedia (CCLE) has been expanded with comprehensive molecular data for 1,072 cancer cell lines, including genetic, RNA splicing, DNA methylation, histone H3 modification, microRNA expression, and reverse-phase protein array (RPPA) data. Combined with functional data such as drug sensitivity, short hairpin RNA (shRNA) knockdown, and CRISPR–Cas9 knockout data, this dataset provides a resource to accelerate cancer research using model cancer cell lines. The CCLE includes 329 whole-genome sequencing (WGS), 326 whole-exome sequencing (WES), 1,019 RNA-seq, 899 RPPA, 843 reduced representation bisulfite sequencing (RRBS), 954 microRNA expression, and 897 global histone modification profiles. Additionally, the abundance of 225 metabolites was reported for 928 cell lines. Integrating these data with functional characterizations reveals potential targets for cancer drugs and associated biomarkers.
Genetic characterization of the CCLE previously included sequencing of 1,650 genes and single nucleotide polymorphism (SNP) array copy number profiles in 947 cell lines. A harmonized variant calling pipeline was used to integrate WES, WGS, deep RNA sequencing, RainDance-based targeted sequencing, and Sanger Genomics of Drug Sensitivity in Cancer (GDSC) WES data. Comparison of germline variant calls between CCLE and GDSC data revealed a high concordance. Mutation correlation was high for cancer hotspot somatic variants but lower for non-hotspot variants, suggesting that genetic drift in passaged cell lines mainly affects passenger mutations.
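The comparison between two call sets can be illustrated with a minimal sketch. It assumes concordance is measured as the Jaccard index (shared calls divided by all calls). The text does not state the metric actually used, so this choice, the `Variant` type, the function names, and the toy coordinates are all hypothetical.

```python
from typing import Dict, NamedTuple, Set


class Variant(NamedTuple):
    """A variant call identified by position and allele change (hypothetical type)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def concordance(calls_a: Set[Variant], calls_b: Set[Variant]) -> float:
    """Jaccard index of two call sets: shared calls / all calls.

    Assumption: this is only one possible concordance metric.
    Two empty sets are treated as fully concordant.
    """
    union = calls_a | calls_b
    if not union:
        return 1.0
    return len(calls_a & calls_b) / len(union)


def stratified_concordance(
    calls_a: Set[Variant], calls_b: Set[Variant], hotspots: Set[Variant]
) -> Dict[str, float]:
    """Concordance computed separately for hotspot and non-hotspot variants."""
    return {
        "hotspot": concordance(calls_a & hotspots, calls_b & hotspots),
        "non_hotspot": concordance(calls_a - hotspots, calls_b - hotspots),
    }


# Toy data with made-up coordinates, for illustration only.
v1 = Variant("chr1", 100, "A", "T")
v2 = Variant("chr2", 200, "C", "T")
v3 = Variant("chr3", 300, "G", "A")
v4 = Variant("chr4", 400, "T", "C")

calls_source_a = {v1, v2, v3}  # e.g. calls for one cell line from the first dataset
calls_source_b = {v1, v2, v4}  # calls for the same cell line from the second dataset
hotspots = {v1, v2}            # hypothetical hotspot list

overall = concordance(calls_source_a, calls_source_b)
by_class = stratified_concordance(calls_source_a, calls_source_b, hotspots)
print(overall, by_class)
```

In this toy case the two hotspot calls are shared while the non-hotspot calls differ, which mirrors the pattern described above: drift that mostly affects passenger mutations lowers non-hotspot agreement without touching hotspot agreement.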
The CCLE also includes structural variant (SV) and gene-fusion event annotations. The Project Achilles and DRIVE shRNA and single guide RNA (sgRNA) gene dependency datasets allow genetic events to be compared with cancer dependencies. Fusion calls were compared with RNA interference (RNAi) loss-of-function data, identifying driver events such as ESR1-CCDC170 and AFF1-KMT2A. TERT promoter mutations were found in 16.7% of 503 cell lines, making them the most common non-coding somatic mutations in cancer cell lines.
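One simple way to compare a genetic event with a dependency profile is to contrast dependency scores between cell lines that carry the event and those that do not. The sketch below is an illustration under stated assumptions, not the analysis used for the CCLE: the function name, the cell line labels, and the scores are hypothetical, and it assumes that more negative scores mean stronger dependency.

```python
from statistics import mean
from typing import Dict, Set


def dependency_shift(scores: Dict[str, float], event_positive: Set[str]) -> float:
    """Mean dependency score in event-positive lines minus the mean in all other lines.

    Assumption: more negative scores indicate stronger dependency, so a
    negative shift suggests the gene matters more in event-positive lines.
    """
    positive = [s for line, s in scores.items() if line in event_positive]
    negative = [s for line, s in scores.items() if line not in event_positive]
    if not positive or not negative:
        raise ValueError("need at least one cell line in each group")
    return mean(positive) - mean(negative)


# Made-up knockdown scores for one gene across four hypothetical cell lines.
scores = {"LINE_A": -1.2, "LINE_B": -1.0, "LINE_C": -0.1, "LINE_D": 0.1}
fusion_positive = {"LINE_A", "LINE_B"}  # lines carrying a hypothetical fusion

shift = dependency_shift(scores, fusion_positive)
print(f"dependency shift: {shift:.2f}")
```

A real analysis would also need a significance test and correction for multiple comparisons across genes and events; the difference in means is only the starting point.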
Patterns of somatic mutations indicative of underlying mutational processes were analyzed using 30 COSMIC mutational signatures. Signature activities correlated considerably between CCLE and The Cancer Genome Atlas (TCGA) cancer types. Notably, higher genetic drift was observed in cell lines with COSMIC signatures 6, 15, 21, and 26, which are related to microsatellite instability (MSI), and COSMIC signatures 1 and 5, which are related to clock-like mutational processes.