Prediction of glycopeptide fragment mass spectra by deep learning

19 March 2024 | Yi Yang & Qun Fang
DeepGlyco is a deep learning model that predicts the fragment mass spectra of intact glycopeptides. It uses tree-structured long short-term memory (LSTM) networks to process the glycan moiety and graph neural networks to incorporate the potential fragmentation pathways of specific glycan structures. This design improves model explainability and the ability to differentiate glycan structural isomers. Predicted spectral libraries can be used in data-independent acquisition (DIA) glycoproteomics to supplement library completeness.

Liquid chromatography-tandem mass spectrometry (LC-MS/MS) is widely used in proteomics and glycoproteomics. Peptide identification relies on matching fragment spectra to theoretical or experimental spectra. Spectral library searching, which takes fragment ion intensity patterns into account, has been shown to yield more discriminative match scores than database searching. Spectral libraries are also used in DIA experiments, where they provide deep proteome coverage with quantitative consistency.

Deep learning has become increasingly prevalent in proteomics, with methods such as pDeep, DeepMass:Prism, Prosit, and AlphaPeptDeep representing the state of the art for various tasks. Fragment spectrum prediction has improved peptide identification in data-dependent acquisition (DDA) by integrating intensity information into peptide-spectrum match (PSM) scoring. DIA data analysis has also benefited from peptide fragment spectrum prediction: predicted spectral libraries can be generated directly from protein sequence databases, enabling deep proteome coverage without experimental spectral libraries. Deep learning models have also been specialized for specific post-translational modifications, such as DeepPhospho for DIA phosphoproteomics and DeepFLR for phosphorylation site localization.

However, current methods cannot predict the fragment spectra of intact glycopeptides. Intact glycopeptides retain the peptide-glycan link and thus provide information on the peptide sequence, the linked glycan structures, and the glycosite.
The glycan moiety is a complex structure composed of different monosaccharides joined by variable linkages. Existing tools for peptide property prediction use LSTM, gated recurrent unit, or transformer-based models, which can only process the linear input of peptide sequences and therefore cannot handle the branched glycan structure. The fragmentation behavior of intact glycopeptides in MS/MS also differs from that of non-glycosylated peptides, producing merged spectra that contain both peptide and glycan fragments.

DeepGlyco is a deep learning framework for predicting MS/MS spectra of intact glycopeptides. Input peptide sequences are processed by conventional LSTM networks, while glycan structures are handled by tree LSTM networks. Graph neural networks with attention mechanisms model the potential fragmentation pathways of structure-specific glycans, which makes it possible to explain the likely origins of predicted fragment ions and helps differentiate glycan structural isomers. Predicted spectral libraries are suitable for analyzing glycopeptide DIA data as a supplement for library completeness. The model was trained and validated on datasets from diverse organisms acquired on Orbitrap mass spectrometers with distinct instrument settings. Spectral angle (SA) loss was used as the objective function for spectrum prediction.
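
To illustrate why a branched glycan does not fit a linear sequence model, the sketch below stores a glycan as a tree and walks it bottom-up, so that every residue is summarized only after its child branches. A tree LSTM follows the same traversal order but carries a learned hidden state; here a simple residue counter stands in for that state. The names `GlycanNode` and `encode_bottom_up` are hypothetical and are not taken from DeepGlyco.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class GlycanNode:
    """One monosaccharide residue plus the branches attached to it."""
    residue: str
    children: List["GlycanNode"] = field(default_factory=list)


def encode_bottom_up(node: GlycanNode,
                     visit_order: Optional[List[str]] = None) -> Counter:
    """Post-order traversal: children are summarized before their parent.

    The Counter is only a stand-in (an assumption for illustration) for the
    learned state that a tree LSTM would compute at each node.
    """
    state = Counter({node.residue: 1})
    for child in node.children:
        state += encode_bottom_up(child, visit_order)
    if visit_order is not None:
        visit_order.append(node.residue)
    return state


# N-glycan core (HexNAc2Hex3): the reducing-end HexNAc is the root, and the
# first hexose (mannose) branches into two further hexoses.
core = GlycanNode("HexNAc", [
    GlycanNode("HexNAc", [
        GlycanNode("Hex", [GlycanNode("Hex"), GlycanNode("Hex")]),
    ]),
])

order: List[str] = []
composition = encode_bottom_up(core, order)
print(dict(composition))  # residue counts for the whole glycan
print(order)              # leaves first, root last
```

The branch point (the first hexose) receives two child states at once, which is the case a sequence model reading residues one after another has no natural way to represent.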
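
The summary does not spell out how the spectral angle is computed. A minimal sketch follows, assuming the normalized spectral angle commonly used in the spectrum prediction literature (for example with Prosit): SA = 1 - 2·arccos(cosine similarity)/π, with the loss taken as 1 - SA. The function names and the choice of 1 - SA as the loss are assumptions for illustration, not DeepGlyco's confirmed implementation; an actual training objective would be written in a differentiable framework.

```python
import math
from typing import Sequence


def spectral_angle(predicted: Sequence[float],
                   observed: Sequence[float]) -> float:
    """Normalized spectral angle between two intensity vectors.

    Returns 1.0 for identical intensity patterns (up to scaling) and 0.0
    for orthogonal ones, assuming non-negative intensities.
    """
    if len(predicted) != len(observed):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(p * o for p, o in zip(predicted, observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    if norm_p == 0.0 or norm_o == 0.0:
        return 0.0  # an empty spectrum matches nothing
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cos_sim = max(-1.0, min(1.0, dot / (norm_p * norm_o)))
    return 1.0 - 2.0 * math.acos(cos_sim) / math.pi


def spectral_angle_loss(predicted: Sequence[float],
                        observed: Sequence[float]) -> float:
    """Assumed loss form: lower is better, 0.0 for a perfect match."""
    return 1.0 - spectral_angle(predicted, observed)


# Fragment intensities in the same ion order for both spectra.
print(spectral_angle([0.2, 0.5, 1.0], [0.4, 1.0, 2.0]))  # same pattern, scaled
print(spectral_angle_loss([1.0, 0.0], [0.0, 1.0]))       # no shared fragments
```

Because both vectors are scaled to unit length first, the measure depends only on the relative intensity pattern, not on absolute signal.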