Aligning protein generative models with experimental fitness via Direct Preference Optimization


May 21, 2024 | Talal Widatalla, Rafael Rafailov, and Brian Hie
The paper "Aligning protein generative models with experimental fitness via Direct Preference Optimization" by Talal Widatalla, Rafael Rafailov, and Brian Hie addresses the challenge of aligning unsupervised protein generative models with task-specific, experimental fitness information. The authors propose a method called Direct Preference Optimization (DPO) to align a structure-conditioned language model with experimental data, specifically for predicting protein stability. The DPO algorithm is designed to encourage the model to generate stable protein sequences by preferentially selecting stabilizing variants over destabilizing ones, given a protein backbone structure. The study uses the Megascale dataset, which contains stability measurements of approximately 1.84 million sequence variants from 983 protein domains. The authors train the model, named ProteinDPO, using DPO and evaluate its performance on various tasks, including stability prediction, binding affinity prediction, and single-step stabilization of diverse protein backbones. ProteinDPO outperforms both unsupervised and supervised fine-tuned models in stability prediction and consistently achieves competitive results with state-of-the-art supervised models like ThermoMPNN. Key findings include: - ProteinDPO significantly improves stability prediction compared to vanilla ESM-IF1 and supervised fine-tuned models. - It generalizes well to larger proteins and unseen folds, demonstrating the ability to predict absolute stability, thermal melting points of antibodies, and binding affinity of large complexes. - ProteinDPO-generated sequences are predicted to fold into the native structure, indicating that the model retains the structural recapitulation rules learned during unsupervised pretraining. The authors conclude that DPO effectively aligns generative models with experimental fitness information, enabling improved performance in stability prediction and other related tasks. The method is applicable to a wide range of protein-related data and could be extended to other biological data modalities.The paper "Aligning protein generative models with experimental fitness via Direct Preference Optimization" by Talal Widatalla, Rafael Rafailov, and Brian Hie addresses the challenge of aligning unsupervised protein generative models with task-specific, experimental fitness information. The authors propose a method called Direct Preference Optimization (DPO) to align a structure-conditioned language model with experimental data, specifically for predicting protein stability. The DPO algorithm is designed to encourage the model to generate stable protein sequences by preferentially selecting stabilizing variants over destabilizing ones, given a protein backbone structure. The study uses the Megascale dataset, which contains stability measurements of approximately 1.84 million sequence variants from 983 protein domains. The authors train the model, named ProteinDPO, using DPO and evaluate its performance on various tasks, including stability prediction, binding affinity prediction, and single-step stabilization of diverse protein backbones. ProteinDPO outperforms both unsupervised and supervised fine-tuned models in stability prediction and consistently achieves competitive results with state-of-the-art supervised models like ThermoMPNN. Key findings include: - ProteinDPO significantly improves stability prediction compared to vanilla ESM-IF1 and supervised fine-tuned models. 
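On the data side, the following hypothetical sketch shows one way to turn per-variant stability measurements into preference pairs for such training. The dataframe layout (columns `domain`, `sequence`, `dG`) is an assumption about the Megascale data, not the paper's exact pipeline: for each protein domain, a more stable variant is treated as "chosen" and a less stable one as "rejected".

```python
import itertools
import pandas as pd

def build_preference_pairs(df: pd.DataFrame, max_pairs_per_domain: int = 100) -> list[dict]:
    """Pair more-stable variants against less-stable ones within each domain."""
    pairs = []
    for domain, group in df.groupby("domain"):
        # Higher dG is taken here to mean more stable (an assumption about sign convention).
        variants = group.sort_values("dG", ascending=False)
        combos = itertools.combinations(variants.itertuples(index=False), 2)
        for i, (stable, unstable) in enumerate(combos):
            if i >= max_pairs_per_domain:
                break
            if stable.dG > unstable.dG:  # skip ties
                pairs.append(
                    {"domain": domain, "chosen": stable.sequence, "rejected": unstable.sequence}
                )
    return pairs
```

Each pair would then be scored under the policy and reference models to produce the log-probabilities consumed by the loss sketched above.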