October 2012 | Volume 7 | Issue 10 | e46688 | Yongwook Choi, Gregory E. Sims*a, Sean Murphy*a, Jason R. Miller, Agnes P. Chan*
This article introduces PROVEAN, a new algorithm for predicting the functional effects of protein sequence variations, including single amino acid substitutions, in-frame insertions, and deletions. The method uses an alignment-based score that measures the change in sequence similarity between a query sequence and a homologous protein sequence before and after an amino acid variation is introduced into the query. The score is the difference between the semi-global alignment scores of the variant sequence and of the original query sequence, each aligned against the homologous sequence.
The algorithm was tested on human and non-human protein variations from the UniProtKB/Swiss-Prot database and showed high accuracy in distinguishing disease-associated variants from common polymorphisms. The area under the receiver operating characteristic curve (AUC) was approximately 0.85 for both the human and the non-human protein variation datasets. The PROVEAN score correlates with the deleteriousness of a sequence variation and can be used as an indicator of the functional impact of a protein variation. The algorithm was also validated using experimental datasets from mutagenesis experiments on the human tumor suppressor protein TP53 and the ATP-binding cassette transporter 1 protein ABCA1. The results showed that PROVEAN performs well in predicting the functional effects of protein sequence variations, including multiple amino acid substitutions and in-frame insertions and deletions. The PROVEAN tool is available online at http://provean.jcvi.org.
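
The alignment-based score described above can be illustrated with a minimal Python sketch. It is not the PROVEAN implementation: the match, mismatch, and gap values are toy assumptions standing in for a real substitution matrix and gap penalties, "semi-global" is assumed here to mean that end gaps in either sequence are free, and the function names (`semiglobal_score`, `delta_score`, `substitute`, `delete`) are hypothetical. The sketch scores a variation against a single homolog only.

```python
def semiglobal_score(a: str, b: str, match: int = 2,
                     mismatch: int = -1, gap: int = -2) -> int:
    """Best alignment score of a vs. b with free end gaps (toy scoring, an assumption)."""
    n, m = len(a), len(b)
    prev = [0] * (m + 1)       # row 0: leading overhang of b costs nothing
    best_last_col = prev[m]    # best score seen so far in the last column
    for i in range(1, n + 1):
        curr = [0] * (m + 1)   # column 0: leading overhang of a costs nothing
        for j in range(1, m + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        best_last_col = max(best_last_col, curr[m])
        prev = curr
    # Trailing overhang is free: take the best cell in the last row or last column.
    return max(max(prev), best_last_col)


def substitute(seq: str, pos: int, residue: str) -> str:
    """Return seq with the residue at 1-based position pos replaced."""
    return seq[:pos - 1] + residue + seq[pos:]


def delete(seq: str, start: int, end: int) -> str:
    """Return seq with 1-based positions start..end (inclusive) removed."""
    return seq[:start - 1] + seq[end:]


def delta_score(query: str, variant: str, homolog: str) -> int:
    """Change in alignment score against one homolog caused by the variation."""
    return semiglobal_score(variant, homolog) - semiglobal_score(query, homolog)


# Hypothetical toy sequences, chosen only for illustration.
query = "MKTAYIAKQR"
homolog = "MKTAYIAKQR"

sub_delta = delta_score(query, substitute(query, 5, "D"), homolog)  # Y5D
del_delta = delta_score(query, delete(query, 5, 6), homolog)        # delete Y5-I6
print(sub_delta, del_delta)
```

Because the score is defined on whole-sequence alignments rather than on a single alignment column, the same `delta_score` call handles substitutions, deletions, and insertions alike; a more negative value means the variation reduced similarity to the homolog more strongly.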