Aligning protein generative models with experimental fitness via Direct Preference Optimization


May 21, 2024 | Talal Widatalla, Rafael Rafailov, and Brian Hie
This paper introduces ProteinDPO, a structure-conditioned protein language model aligned with experimental stability data via Direct Preference Optimization (DPO). The model is trained on the Megascale dataset, which contains stability measurements for 1.84 million sequence variants across 983 protein domains. ProteinDPO aligns a pretrained structure-conditioned language model with experimental fitness by encouraging it to prefer stabilizing over destabilizing variants given a protein backbone structure, while retaining the general knowledge acquired during pretraining.

The resulting model outperforms both the unsupervised and the fine-tuned versions of the base model at stability prediction and generalizes beyond its training data, enabling absolute stability prediction for large proteins and binding affinity prediction for multi-chain complexes. ProteinDPO also enables single-step stabilization of diverse backbones, indicating that it has learned generalizable biophysical information from its alignment data.

ProteinDPO is competitive with supervised and physics-based models at stability prediction and transfers to related tasks, including binding affinity and thermal melting point prediction. Because it remains a generative model, it can also be used to design stable protein sequences. These results suggest that DPO is an effective way to align a generative model with an experimental fitness landscape. The model is open-source and publicly available.
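For readers unfamiliar with DPO, the sketch below shows the standard DPO objective applied to a stability preference pair. This is a minimal PyTorch illustration under stated assumptions, not the authors' released code; the tensor names (`policy_logp_stab`, `ref_logp_destab`, etc.) are hypothetical placeholders for per-sequence log-likelihoods of a variant conditioned on its backbone, computed by the policy model and by a frozen copy of the pretrained reference model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_logp_stab: torch.Tensor,
             policy_logp_destab: torch.Tensor,
             ref_logp_stab: torch.Tensor,
             ref_logp_destab: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """DPO loss for a batch of (stabilizing, destabilizing) variant pairs.

    Each input is a 1-D tensor of per-example sequence log-likelihoods
    (the sum of per-residue log-probabilities of the variant sequence
    given the backbone structure). 'stab' is the preferred variant,
    'destab' the rejected one.
    """
    # Log-likelihood ratios of the trainable policy vs. the frozen
    # pretrained reference model.
    stab_logratio = policy_logp_stab - ref_logp_stab
    destab_logratio = policy_logp_destab - ref_logp_destab

    # DPO objective: reward the policy for preferring the stabilizing
    # variant more strongly than the reference does, scaled by beta.
    logits = beta * (stab_logratio - destab_logratio)
    return -F.logsigmoid(logits).mean()
```

In ProteinDPO's setting, the preferred and rejected sequences are variants of the same backbone ranked by measured stability, and the temperature-like parameter beta controls how far the aligned model may drift from the pretrained reference, which is how the method keeps the general knowledge learned during pretraining.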