Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models

February 14, 2024 | Francesca-Zhoufan Li, Ava P. Amini, Yisong Yue, Kevin K. Yang, Alex X. Lu
This paper investigates the effectiveness of transfer learning with protein language models (PLMs) for downstream tasks in protein biology. The authors conduct 370 experiments across downstream tasks, model sizes, depths, and pretraining durations to understand how the features learned during pretraining relate to, and are useful for, downstream tasks. They find that while most downstream tasks benefit from pretrained models compared to naive sequence representations, performance does not consistently scale with pretraining; instead, it often relies on low-level features learned early in pretraining. This points to a mismatch between current PLM pretraining paradigms and many applications of these models, and hence a need for better pretraining methods.

The study examines several hypotheses for why transfer learning works, including feature reuse, inductive biases, weight statistics, and the reuse of low-level features. For some tasks, transfer learning improves performance, but the improvement is not necessarily due to feature reuse or to scaling with pretraining. Secondary structure prediction, for example, benefits from transfer learning because it is well aligned with masked language modeling (MLM) pretraining. Many other tasks do not benefit, suggesting that pretraining is not effective for all protein biology applications. For some tasks, performance does not improve as PLMs improve, indicating that these tasks may rely on low-level features learned early in pretraining.

The authors conclude that scaling PLMs under current pretraining paradigms may not improve performance on many protein function prediction tasks, and that new, better-aligned pretraining tasks are needed. The study highlights the importance of understanding the factors that influence transfer learning in protein biology and the need for improved evaluation standards for PLMs. The results have implications for future work in protein engineering and bioinformatics, emphasizing the need for diversified pretraining strategies to better serve aspects of protein biology not well served by current PLMs.
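To make the kind of comparison described above concrete, here is a minimal sketch (not the authors' code): train a simple probe on a naive one-hot sequence representation versus frozen embeddings from an early layer and the final layer of a pretrained PLM. The ESM-2 checkpoint name, mean pooling, ridge probe, and toy data are illustrative assumptions, not the paper's exact models, tasks, or datasets.

```python
# Sketch: probe a downstream property from (a) a one-hot baseline,
# (b) an early-layer PLM embedding, (c) a final-layer PLM embedding.
import numpy as np
import torch
from transformers import AutoTokenizer, EsmModel
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MAX_LEN = 128  # pad/truncate length for the one-hot baseline


def one_hot(seq: str) -> np.ndarray:
    """Naive baseline: flattened one-hot encoding, padded/truncated to MAX_LEN."""
    mat = np.zeros((MAX_LEN, len(AMINO_ACIDS)), dtype=np.float32)
    for i, aa in enumerate(seq[:MAX_LEN]):
        if aa in AMINO_ACIDS:
            mat[i, AMINO_ACIDS.index(aa)] = 1.0
    return mat.ravel()


@torch.no_grad()
def plm_embedding(seq: str, tokenizer, model, layer: int) -> np.ndarray:
    """Mean-pooled hidden state from a chosen transformer layer (frozen model)."""
    inputs = tokenizer(seq, return_tensors="pt")
    out = model(**inputs, output_hidden_states=True)
    hidden = out.hidden_states[layer]  # shape: (1, num_tokens, dim)
    return hidden.mean(dim=1).squeeze(0).numpy()


def probe_score(X: np.ndarray, y: np.ndarray) -> float:
    """Cross-validated R^2 of a ridge-regression probe on fixed features."""
    return cross_val_score(Ridge(alpha=1.0), X, y, cv=5).mean()


if __name__ == "__main__":
    # Toy stand-in data: random targets, so scores are meaningless here.
    # Replace with a real downstream dataset (sequence -> measured property).
    seqs = ["MKTAYIAKQR", "GSHMLEDPVA", "MVLSPADKTN", "MNIFEMLRID"] * 5
    y = np.random.default_rng(0).normal(size=len(seqs))

    name = "facebook/esm2_t6_8M_UR50D"  # assumed small ESM-2 checkpoint
    tok = AutoTokenizer.from_pretrained(name)
    plm = EsmModel.from_pretrained(name).eval()

    X_naive = np.stack([one_hot(s) for s in seqs])
    X_early = np.stack([plm_embedding(s, tok, plm, layer=1) for s in seqs])
    X_final = np.stack([plm_embedding(s, tok, plm, layer=-1) for s in seqs])

    for label, X in [("one-hot", X_naive), ("early layer", X_early), ("final layer", X_final)]:
        print(f"{label:12s} probe R^2: {probe_score(X, y):.3f}")
```

Comparing the early-layer and final-layer probes against the one-hot baseline is one simple way to see whether a task draws on low-level features learned early in pretraining or on representations that continue to improve with scale.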