[slides and audio] Site-saturation mutagenesis of 500 human protein domains

This study presents a large-scale experimental analysis of missense variants in human proteins, focusing on the effects of over 500,000 variants on more than 500 human protein domains. The research uses a microchip-based massive parallel synthesis (mMPS) technology to generate a library of amino acid variants and an abundance protein fragment complementation assay (aPCA) to quantify the impact of these variants on protein stability and abundance. The dataset, named Human Domainome I, provides a comprehensive reference for interpreting clinical genetic variants and benchmarking computational methods for variant effect prediction.

Key findings include:

- 60% of pathogenic missense variants reduce protein stability.
- The contribution of stability to protein fitness varies across different protein families and diseases, with stability being particularly important in recessive disorders.
- Mutational effects on stability are largely conserved in homologous domains, enabling accurate stability prediction across entire protein families using energy models.
- The study demonstrates the feasibility of large-scale experimental analysis of human protein variants and provides a valuable resource for understanding the functional consequences of genetic variations.

The research also evaluates the performance of various computational methods in predicting variant effects and identifies functional sites in proteins by combining abundance measurements with evolutionary fitness data. The findings highlight the importance of stability in protein function and evolution, and the potential of energy models for proteome-wide stability predictions.
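
To make the idea of an energy model for stability more concrete, here is a minimal sketch. It assumes a simple two-state folding model in which a variant's stability change (ddG) adds to the wild-type folding energy and the folded fraction follows a Boltzmann relation. The summary does not specify the form of the study's energy models, so the function names, the 298.15 K temperature, and the example energies are all hypothetical and chosen only for illustration.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL_PER_MOL_K = 0.0019872


def fraction_folded(dg_fold, temperature_k=298.15):
    """Fraction of molecules folded under a two-state model.

    dg_fold is the folding free energy in kcal/mol (negative = stable).
    The default temperature is an assumption for illustration.
    """
    return 1.0 / (1.0 + math.exp(dg_fold / (R_KCAL_PER_MOL_K * temperature_k)))


def predict_fraction_folded(dg_wild_type, ddg_variant, temperature_k=298.15):
    """Add a variant's stability change (ddG) to the wild-type folding energy."""
    return fraction_folded(dg_wild_type + ddg_variant, temperature_k)


# Hypothetical numbers: the same destabilizing ddG applied to two homologous
# domains that differ only in their wild-type stability.
ddg = 1.5
stable_homolog = predict_fraction_folded(-4.0, ddg)
marginal_homolog = predict_fraction_folded(-1.0, ddg)
print(stable_homolog, marginal_homolog)
```

Under these assumptions, the same energy change leaves the more stable homolog mostly folded while substantially depleting the marginally stable one. This is one reason a model might transfer energies rather than abundance scores between homologous domains.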

Site-saturation mutagenesis of 500 human protein domains

23 January 2025 | Antoni Beltran, Xiang'er Jiang, Yue Shen & Ben Lehner