June 20, 2024 | Samuel Sledzieski, Meghna Kshirsagar, Minkyung Baek, Rahul Dodhia, Juan Lavista Ferres, Bonnie Berger
This study applies parameter-efficient fine-tuning (PEFT) to large protein language models (PLMs) for two important proteomic tasks: predicting protein-protein interactions (PPIs) and predicting the symmetry of homooligomer quaternary structures. The approach uses the LoRA method, which adds trainable low-rank adapter matrices to the model, enabling efficient adaptation with significantly less memory and far fewer trainable parameters than traditional full fine-tuning (FT). The results show that PEFT models are competitive with FT models in performance while requiring fewer resources, making them accessible to research groups with limited computational capacity. For PPI prediction, training only the model's classification head also achieves strong performance with five orders of magnitude fewer parameters. The study further demonstrates that PEFT models outperform state-of-the-art PPI prediction methods in computational efficiency. A comprehensive hyperparameter evaluation shows that PEFT is robust to variation in these settings, and that best practices for PEFT in proteomics differ from those in natural language processing. The code for model adaptation and evaluation is available as open source at https://github.com/microsoft/peft_proteomics. The work provides a blueprint for democratizing PLM adaptation for groups with limited computational resources.
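The low-rank adapter idea behind LoRA can be illustrated with a minimal sketch (this toy code is not from the paper's repository; the layer size, rank, and scaling factor are illustrative): the frozen base weight `W` is augmented by a trainable low-rank update `B @ A`, so adapting a `d × d` matrix trains only `2·r·d` parameters instead of `d²`.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha=16, r=8):
    """Linear layer with a LoRA adapter: the frozen base weight W is
    augmented by the low-rank update B @ A, scaled by alpha / r."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

rng = np.random.default_rng(0)
d = 320  # hidden size of one PLM layer (illustrative)
r = 8    # adapter rank

W = rng.standard_normal((d, d))          # frozen pretrained weight
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection (zero init)

x = rng.standard_normal((1, d))
# With B initialized to zero, the adapted layer reproduces the base model.
assert np.allclose(lora_forward(x, W, A, B, r=r), x @ W.T)

# Trainable parameters for this layer: 2*r*d for LoRA vs d*d for full FT.
print(f"full FT: {d*d:,} params, LoRA: {2*r*d:,} params")
```

The zero initialization of `B` is the standard LoRA choice: adaptation starts from the pretrained model exactly, and the low-rank update is learned during fine-tuning while `W` stays frozen.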