J. Dauparas, I. Anishchenko, N. Bennett, H. Bai, R. J. Ragotte, L. F. Milles, B. I. M. Wicky, A. Courbet, R. J. de Haas, N. Bethel, P. J. Y. Leung, T. F. Huddy, S. Pellock, D. Tischer, F. Chan, B. Koepnick, H. Nguyen, A. Kang, B. Sankaran, A. K. Bera, N. P. King, D. Baker
This supplementary material provides detailed methods and supplementary figures for "Robust deep learning-based protein sequence design using ProteinMPNN." The methods section covers the training of single-chain and multi-chain models, including data preparation, architecture modifications, loss functions, and optimization. The multi-chain models were trained on protein assemblies from the PDB, with sequences clustered at a 30% identity cutoff. The loss function was a negative log likelihood with label smoothing, optimized with Adam. Input features to ProteinMPNN included embedded edges and relative positional encodings, and the model architecture consists of encoder-decoder message-passing neural networks. Supplementary figures S1 to S9 provide additional validation results, compositional-bias comparisons, and benchmarks against AlphaFold.
The material also includes experimental methods for protein expression, crystallization, and structure validation.
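The training loss described above, negative log likelihood with label smoothing, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the uniform-smoothing scheme, and the default smoothing value of 0.1 are assumptions; the supplementary methods state only that label-smoothed NLL was used.

```python
import numpy as np

def label_smoothed_nll(logits, targets, num_classes=20, smoothing=0.1):
    """Negative log likelihood with label smoothing over amino acid classes.

    Hypothetical sketch: smoothing mass eps is spread uniformly over all
    classes, with the remaining (1 - eps) placed on the true class.
    """
    # Log-softmax over the class axis, shifted by the max for stability
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    # Smoothed target distribution: eps/K everywhere, plus (1 - eps) on
    # the observed amino acid class
    true_dist = np.full(logits.shape, smoothing / num_classes)
    true_dist[np.arange(len(targets)), targets] += 1.0 - smoothing
    # Cross-entropy between the smoothed targets and model log-probs
    return -(true_dist * log_probs).sum(axis=-1).mean()
```

With `smoothing=0.0` this reduces to the standard per-residue NLL; a nonzero smoothing penalizes overconfident predictions on the training sequences.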
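The relative positional encodings mentioned among the input features can be illustrated with a small sketch. This is an assumption-laden example: the function name, the clipping threshold (`max_offset=32`), and the one-hot binning are illustrative choices, not taken from the supplementary methods.

```python
import numpy as np

def relative_position_features(n_residues, max_offset=32):
    """One-hot features for clipped relative sequence separation i - j.

    Hypothetical sketch: offsets beyond +/- max_offset are clipped to the
    boundary bins, giving 2 * max_offset + 1 bins per residue pair.
    """
    idx = np.arange(n_residues)
    # Pairwise sequence offsets j - i for every residue pair (i, j)
    offsets = idx[None, :] - idx[:, None]
    # Clip to [-max_offset, max_offset] and shift into bin indices [0, 2m]
    clipped = np.clip(offsets, -max_offset, max_offset) + max_offset
    num_bins = 2 * max_offset + 1
    # (n_residues, n_residues, num_bins) one-hot tensor
    return np.eye(num_bins)[clipped]
```

In an edge-featurized message-passing network, features like these would typically be concatenated with geometric edge features before being embedded.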