Site-saturation mutagenesis of 500 human protein domains

23 January 2025 | Antoni Beltran, Xiang'er Jiang, Yue Shen & Ben Lehner
This study presents a large-scale experimental analysis of human missense variants across more than 500 protein domains. Using DNA synthesis and cellular selection experiments, the researchers quantified the effects of over 500,000 variants on the stability of these domains. The resulting dataset, called 'Human Domainome 1', reveals that 60% of pathogenic missense variants reduce protein stability. Stability contributes significantly to protein fitness, particularly in recessive disorders. The study combines stability measurements with protein language models to annotate functional sites across proteins, and it uses energy models to predict stability accurately across entire protein families.

The researchers used microchip-based massively parallel synthesis (mMPS) to create a library of 1,230,584 amino acid variants in 1,248 structurally diverse protein domains. They then used an abundance protein fragment complementation assay (aPCA) to quantify the effect of these variants on domain stability. The final dataset includes 563,534 variants in 522 protein domains, 503 of which come from human proteins. Abundance measurements were highly reproducible and correlated well with independent in vitro measurements of protein fold stability.

The study evaluated computational variant effect predictors (VEPs) and found that the graph neural network ThermoMPNN performed best at predicting stability changes. The contribution of stability to protein fitness varies across domain families: stability contributes more to the fitness of all-beta domains than to that of all-alpha or mixed domains. The study also identified functional sites in proteins by combining abundance measurements with evolutionary fitness as quantified by ESM1v. In total, 61% of pathogenic variants cause detectable domain destabilization, and 48% are strongly destabilizing.
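The idea of combining abundance measurements with ESM1v scores to find functional sites can be illustrated with a small residual analysis. The sketch below is not the authors' pipeline: it assumes a plain linear fit between the two scores, and the function name `functional_site_scores` and the toy inputs are hypothetical. Positions whose variants look far worse to the language model than their measured abundance would explain are candidates for roles beyond stability.

```python
import numpy as np

def functional_site_scores(abundance, esm, positions):
    """Mean per-position residual of ESM1v score after regressing on abundance.

    Assumption: a simple linear fit stands in for whatever model relates
    the two scores in practice. Strongly negative values mark positions
    where variants are predicted deleterious but remain abundant.
    """
    abundance = np.asarray(abundance, dtype=float)
    esm = np.asarray(esm, dtype=float)
    positions = np.asarray(positions)

    # Least-squares line: esm ~ slope * abundance + intercept
    slope, intercept = np.polyfit(abundance, esm, 1)
    residuals = esm - (slope * abundance + intercept)

    # Average the residuals of all variants at each position
    return {
        int(p): float(residuals[positions == p].mean())
        for p in np.unique(positions)
    }

# Toy data (hypothetical): position 3 keeps near wild-type abundance
# but has very low language-model scores.
positions = [1, 1, 2, 2, 3, 3]
abundance = [0.0, -1.0, -0.5, -2.0, 0.0, -0.1]
esm = [0.0, -2.0, -1.0, -4.0, -6.0, -5.0]

scores = functional_site_scores(abundance, esm, positions)
candidate = min(scores, key=scores.get)  # position with the most negative residual
```

In this toy input the position with the most negative mean residual is the one whose variants stay abundant while scoring poorly, which is the pattern expected at binding or catalytic sites.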
Stability changes were a poorer predictor of pathogenicity in some domains, such as the MBD domain of MECP2, where many mutations do not destabilize the domain but interfere with its function. The study also found that the mode of inheritance of mutations in CRX correlates with their stability effects: recessive mutations are strongly destabilizing, whereas dominant mutations do not destabilize the domain. The study demonstrated that mutational effects on stability are largely conserved in homologous domains, with a small contribution from epistasis that increases with sequence divergence. This energetic additivity enables proteome-wide prediction of stability changes for entire protein families. Boltzmann energy models also outperformed stability predictors in pathogenicity prediction and performed well on stability deep mutagenesis scans generated using in vitro proteolysis selections.

The study provides a large, standardized reference dataset for the interpretation of clinical variants and for benchmarking and training computational methods. It also suggests a strategy for expanding Human Domainome 1 proteome-wide: experimentally mutagenizing representative examples for all families. The study highlights the importance of stability in protein function and disease.
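The notion of energetic additivity combined with a Boltzmann energy model can be made concrete with a two-state folding sketch. This is a minimal illustration under stated assumptions, not the model fitted in the study: folding energies of individual substitutions are assumed to add, the lookup table keyed by (aligned position, amino acid) is hypothetical, and all energy values are invented for the example.

```python
import math

RT = 0.593  # kcal/mol at roughly 298 K

def fraction_folded(delta_g):
    """Two-state Boltzmann model.

    delta_g is the folding free energy (folded minus unfolded) in kcal/mol,
    so negative values mean a stable fold.
    """
    return 1.0 / (1.0 + math.exp(delta_g / RT))

def predict_fraction_folded(wt_delta_g, ddg_table, mutations):
    """Add per-substitution energy changes to the wild-type energy.

    Assumption: effects are additive (no epistasis) and transferable
    between homologs at aligned positions.
    """
    total = wt_delta_g + sum(ddg_table[m] for m in mutations)
    return fraction_folded(total)

# Hypothetical family-level energy changes in kcal/mol (positive = destabilizing)
ddg_table = {(12, "P"): 2.5, (30, "A"): 0.4}

wt = predict_fraction_folded(-2.0, ddg_table, [])
single = predict_fraction_folded(-2.0, ddg_table, [(12, "P")])
double = predict_fraction_folded(-2.0, ddg_table, [(12, "P"), (30, "A")])
```

Because the mapping from energy to folded fraction is sigmoidal, the same energy change matters little in a very stable domain and a great deal in a marginally stable one, which is why predictions are made in energy space before conversion to abundance.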