A Knowledge Base for Predicting Protein Localization Sites in Eukaryotic Cells

1992 | KENTA NAKAI AND MINORU KANEHISA
The article presents an expert system that predicts protein localization sites in eukaryotic cells using if-then rules derived from experimental and computational data. The system was evaluated on 401 proteins with known localization sites and correctly predicted 66% of the training data and 59% of the testing data.
Predictions rest on sequence features and sorting signals for mitochondrial, nuclear, lysosomal, and other sites. To identify sorting signals, the system applies discriminant analysis, hydrophobic moments, and other sequence features, and it distinguishes between signal types such as M-transferons, P-transferons, and S-transferons. It also incorporates knowledge of membrane topology and of lipid anchors, including GPI anchors, palmitoylation, and isoprenylation, and it identifies signals for rapid internalization into endosomes and for Golgi localization.
The system predicts the localization of mitochondrial, nuclear, lysosomal, and vacuolar proteins, among others, but its accuracy varies with the protein type and the complexity of the sorting signals. It has difficulty with proteins whose sorting signals are poorly characterized, such as those destined for the nucleus and lysosomes, and some proteins are misclassified because of a lack of specific sorting signals. Because the rules are organized into a knowledge base, the system can integrate diverse sorting signals and handle ambiguous observations. It is flexible and can be applied to genome analysis.
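
The summary lists hydrophobic moments among the sequence features used to detect sorting signals. As a rough illustration only, the sketch below computes an Eisenberg-style hydrophobic moment for a peptide segment. The Kyte-Doolittle scale, the 100-degree helical angle, the per-residue normalization, the 18-residue window, and the function names are all assumptions made for this example; the article's own scale and parameters are not given here.

```python
import math

# Kyte-Doolittle hydropathy values. Using this scale is an assumption;
# the summarized article may rely on a different one.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg=100.0, scale=KYTE_DOOLITTLE):
    """Mean hydrophobic moment per residue of a peptide segment.

    Each residue contributes a vector whose length is its hydropathy and
    whose direction advances by angle_deg per position (100 degrees
    approximates an alpha helix, 180 degrees a beta strand).
    """
    if not segment:
        raise ValueError("segment must contain at least one residue")
    delta = math.radians(angle_deg)
    sin_sum = 0.0
    cos_sum = 0.0
    for i, residue in enumerate(segment.upper()):
        h = scale[residue]  # raises KeyError for non-standard residues
        sin_sum += h * math.sin(i * delta)
        cos_sum += h * math.cos(i * delta)
    return math.hypot(sin_sum, cos_sum) / len(segment)


def max_windowed_moment(sequence, window=18, angle_deg=100.0):
    """Largest moment over all windows of the given length (hypothetical helper)."""
    if len(sequence) <= window:
        return hydrophobic_moment(sequence, angle_deg)
    return max(
        hydrophobic_moment(sequence[start:start + window], angle_deg)
        for start in range(len(sequence) - window + 1)
    )
```

A segment of identical residues spanning whole helical turns has a moment near zero, because the contributions cancel around the helix axis. A segment with hydrophobic residues clustered on one face scores higher, which is what makes the quantity useful as an amphiphilicity feature.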
The system's accuracy is compared against other standards: a random guess would be correct less than 10% of the time. Its performance is considered better than that of other methods, particularly for extracellular proteins and GPI-anchored proteins. The ability to handle complex sorting signals, together with its flexibility, makes the system a valuable tool for protein localization prediction.
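
The if-then rule organization described in the summary can be sketched as a small rule base. Everything in the block below is hypothetical: the feature names, thresholds, certainty values, the default site, and the way certainties are combined were invented for illustration and are not taken from the article.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[Features], bool]  # the "if" part
    site: str                              # the "then" part
    certainty: float                       # hypothetical weight in (0, 1]


# Hypothetical rules; none of these thresholds come from the article.
RULES: List[Rule] = [
    Rule(
        "N-terminal signal sequence without membrane-spanning segment",
        lambda f: f.get("signal_peptide_score", 0.0) > 0.5
        and f.get("tm_segments", 0.0) == 0,
        "extracellular",
        0.7,
    ),
    Rule(
        "N-terminal signal sequence plus C-terminal GPI-anchor signal",
        lambda f: f.get("signal_peptide_score", 0.0) > 0.5
        and f.get("gpi_anchor_score", 0.0) > 0.5,
        "plasma membrane (GPI-anchored)",
        0.8,
    ),
    Rule(
        "mitochondrial targeting signal",
        lambda f: f.get("mito_signal_score", 0.0) > 0.5,
        "mitochondrion",
        0.6,
    ),
    Rule(
        "nuclear localization signal",
        lambda f: f.get("nls_score", 0.0) > 0.5,
        "nucleus",
        0.6,
    ),
]

# Assumed fallback when no rule fires.
DEFAULT_SITE = "cytoplasm"


def predict(features: Features, rules: List[Rule] = RULES) -> List[Tuple[str, float]]:
    """Return candidate sites ranked by combined certainty.

    Evidence for the same site is merged as c_old + c_new * (1 - c_old),
    one common way to combine certainty values, assumed here.
    """
    scores: Dict[str, float] = {}
    for rule in rules:
        if rule.condition(features):
            previous = scores.get(rule.site, 0.0)
            scores[rule.site] = previous + rule.certainty * (1.0 - previous)
    if not scores:
        return [(DEFAULT_SITE, 0.0)]
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Returning a ranked list rather than a single label is one way to represent an ambiguous observation, since a protein that matches several rules keeps all of its candidate sites. Adding knowledge about a new signal then only requires appending another `Rule`.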