ProLLaMA: A Protein Language Model for Multi-Task Protein Language Processing

16 Jul 2024 | Liuzhenghao Lv1, Zongying Lin1, Hao Li1,2, Yuyang Liu1, Jiaxi Cui1, Calvin Yu-Chian Chen1, Li Yuan1,2*, Yonghong Tian1,2*
ProLLaMA is a Protein Language Model (PLM) designed to handle multiple Protein Language Processing (PLP) tasks, covering both protein generation and protein understanding. Existing PLMs typically specialize in one or the other; ProLLaMA aims to bridge this gap by adapting a general Large Language Model (LLM) through a two-stage training framework. In the first stage, the LLM is pre-trained on protein language data; in the second, it is tuned on a multi-task instruction dataset. To improve training efficiency, Protein Vocabulary Pruning (PVP) is introduced to reduce the vocabulary size. The model is evaluated on various tasks and is reported to achieve state-of-the-art performance in unconditional protein generation, controllable protein generation, and protein superfamily prediction. Its ability to handle multiple PLP tasks and to generate structurally plausible proteins makes it a notable advance for protein engineering and computational biology.
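The summary states only that Protein Vocabulary Pruning reduces the vocabulary size. It does not say how tokens are selected. As a rough illustration of the general idea, the sketch below assumes that a token is kept if it is a special token or consists solely of the 20 standard amino-acid letters, and that the kept tokens are then renumbered contiguously. The selection rule, the function name `prune_vocab`, and the toy vocabulary are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of vocabulary pruning; NOT the authors' implementation.
# Assumption: keep special tokens plus tokens made only of amino-acid letters.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # 20 standard residues


def prune_vocab(vocab, special_tokens):
    """Return (pruned_vocab, old_to_new).

    vocab          : dict mapping token string -> original token id
    special_tokens : set of tokens that are always kept (e.g. BOS/EOS)
    pruned_vocab   : dict mapping kept token -> new contiguous id
    old_to_new     : dict mapping original id -> new id, for kept tokens only
    """
    # Walk the tokens in original id order so the relative order is preserved.
    kept = [
        tok
        for tok, _ in sorted(vocab.items(), key=lambda kv: kv[1])
        if tok in special_tokens or (tok and set(tok) <= AMINO_ACIDS)
    ]
    pruned_vocab = {tok: new_id for new_id, tok in enumerate(kept)}
    old_to_new = {vocab[tok]: new_id for tok, new_id in pruned_vocab.items()}
    return pruned_vocab, old_to_new


# Toy vocabulary mixing natural-language tokens and protein-like tokens.
toy_vocab = {
    "<s>": 0, "</s>": 1, "the": 2, "M": 3,
    "KV": 4, "hello": 5, "ALA": 6, "##ing": 7,
}
pruned, old_to_new = prune_vocab(toy_vocab, special_tokens={"<s>", "</s>"})

# Embedding rows can then be selected with the same mapping (toy 1-d rows here).
toy_embeddings = [[float(i)] for i in range(len(toy_vocab))]
pruned_embeddings = [
    toy_embeddings[old] for old in sorted(old_to_new, key=old_to_new.get)
]
```

In this toy case the vocabulary shrinks from eight tokens to five, and the embedding table shrinks with it. A smaller vocabulary means smaller embedding and output layers, which is one plausible way a pruning step could improve training efficiency.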