18 July 2024 | Kathie Y. Sun, Xiaodong Bai, Siying Chen, Suying Bao, Chuanyi Zhang, Manav Kapoor, Joshua Backman, Tyler Joseph, Evan Maxwell, George Mitra, Alexander Gorovits, Adam Mansfield, Boris Boutkov, Sujit Gokhale, Lukas Haberge, Anthony Marketta, Adam E. Locke, Liron Ganel, Alicia Hawes, Michael D. Kessler, Deepika Sharma, Jeffrey Staples, Jonas Bovijn, Sahar Gelfman, Alessandro Di Gioia, Veera M. Rajagopal, Alexander Lopez, Jennifer Rico Varela, Jesus Alegre-Diaz, Jaime Berumen, Roberto Tapia-Conyer, Pablo Kuri-Morales, Jason Torres, Jonathan Emberson, Rory Collins, Regeneron Genetics Center, RGC-ME Cohort Partners, Michael Cantor, Timothy Thornton, Hyun Min Kang, John D. Overton, Alan R. Shuldiner, M. Laura Cremona, Mona Nafde, Aris Baras, Goncalo Abecasis, Jonathan Marchini, Jeffrey G. Reid
A deep catalogue of protein-coding variation in 983,578 individuals has been presented, derived from exome sequencing of a diverse population. The study identified over 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. It identified 4,848 genes with rare biallelic pLOF variants, 1,751 of which were previously unreported. The study also identified 3,988 LOF-intolerant genes, including 86 previously considered tolerant and 1,153 with no established disease annotations. Regions of missense depletion were defined at high resolution, with 1,482 genes showing depletion despite being tolerant of pLOF variants. The study estimated that 3% of individuals have clinically actionable genetic variants, and 11,773 ClinVar variants with unknown significance are likely to be deleterious cryptic splice sites. The data is publicly accessible through a variant allele frequency browser.
Exome sequencing has enabled the discovery of rare coding variants, providing insights into gene function and accelerating the discovery of disease-associated genes. It has also identified protective alleles that highlight drug targets for pharmacological intervention. The study highlights the importance of large, representative genetic datasets for precision medicine. The RGC-ME dataset, which includes 983,578 individuals from diverse populations, provides a comprehensive resource for variant interpretation and genetics-informed precision medicine.
The study identified 1,115,116 pLOF variants, comprising variants that introduce a premature stop codon, disrupt essential splice donor or acceptor sites, or cause frameshifts. Of these, 53.3% were singletons. The study also identified 4,645,092 synonymous and 10,444,562 missense variants. A total of 48% of coding variants in canonical transcripts were unique to RGC-ME. The study estimated that 3% of individuals have a clinically actionable genetic variant, and 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites.
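As a rough illustration of the arithmetic behind these figures, the sketch below computes each class's share among the three variant classes quoted above and the approximate number of singleton pLOF variants. The function and constant names are hypothetical. The singleton count is only approximate because 53.3% is a rounded figure, and the shares cover only these three classes, not every coding variant in the dataset.

```python
# Counts quoted in the summary above (pLOF, synonymous, missense only).
VARIANT_COUNTS = {
    "pLOF": 1_115_116,
    "synonymous": 4_645_092,
    "missense": 10_444_562,
}

# Rounded fraction of pLOF variants that are singletons, as stated in the text.
PLOF_SINGLETON_FRACTION = 0.533


def class_shares(counts):
    """Return each class's share of the combined count of the listed classes."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def approx_plof_singletons(counts=VARIANT_COUNTS, fraction=PLOF_SINGLETON_FRACTION):
    """Approximate singleton pLOF count implied by the rounded percentage."""
    return round(counts["pLOF"] * fraction)


if __name__ == "__main__":
    for name, share in class_shares(VARIANT_COUNTS).items():
        print(f"{name}: {share:.1%} of the three listed classes")
    print(f"approximate singleton pLOF variants: {approx_plof_singletons():,}")
```

On these numbers, missense variants make up roughly two thirds of the three listed classes, and a little over half of the pLOF variants were observed only once.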
The study identified 3,988 highly constrained genes, defined by s_het values (the estimated selection coefficient against heterozygous loss of function) greater than 0.073 and a lower bound greater than 0.021. These genes are likely to have high functional importance. The study also identified 41,114 missense-constrained regions in 12,349 genes, which can inform both the understanding of gene function and variant prioritization.
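The two-part threshold for highly constrained genes can be sketched as a simple filter. This is a minimal illustration, not the study's pipeline: the `GeneConstraint` record, its field names, and the example genes and values are all hypothetical, and it assumes that "lower bound" refers to a lower bound on the s_het estimate for each gene.

```python
from dataclasses import dataclass

# Thresholds quoted in the summary above.
S_HET_THRESHOLD = 0.073
S_HET_LOWER_BOUND_THRESHOLD = 0.021


@dataclass(frozen=True)
class GeneConstraint:
    """Hypothetical per-gene record; field names are assumptions."""
    gene: str
    s_het: float              # point estimate of s_het
    s_het_lower_bound: float  # assumed: lower bound on that estimate


def is_highly_constrained(record):
    """Both conditions must hold: estimate and lower bound exceed their thresholds."""
    return (
        record.s_het > S_HET_THRESHOLD
        and record.s_het_lower_bound > S_HET_LOWER_BOUND_THRESHOLD
    )


def highly_constrained(records):
    """Return the names of genes passing both thresholds, in input order."""
    return [r.gene for r in records if is_highly_constrained(r)]


if __name__ == "__main__":
    # Made-up values for illustration only; not taken from the study.
    examples = [
        GeneConstraint("GENE_A", s_het=0.150, s_het_lower_bound=0.060),
        GeneConstraint("GENE_B", s_het=0.090, s_het_lower_bound=0.010),
        GeneConstraint("GENE_C", s_het=0.020, s_het_lower_bound=0.005),
    ]
    print(highly_constrained(examples))
```

Requiring the lower bound to clear its own threshold guards against calling a gene constrained on the strength of a high but poorly supported point estimate, which is why `GENE_B` in the made-up data is excluded.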
The study identified 4,064 genes with regions depleted in missense variation. These genes may have significant functional importance despite being tolerant of pLOF variants. The study also identified 4,848 genes with carriers of biallelic pLOF variants.