Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites

2010 | Doron Betel, Anjali Koppal, Phaedra Agius, Chris Sander, Christina Leslie
A new machine learning method, mirSVR, is introduced for ranking microRNA target sites by a downregulation score. The algorithm trains a regression model on sequence and contextual features of miRanda-predicted target sites. In large-scale evaluations, miRanda-mirSVR is competitive with other target prediction methods in identifying target genes and predicting downregulation at the mRNA or protein level. Importantly, it identifies many experimentally validated non-canonical and non-conserved sites.
MicroRNAs regulate gene expression by binding to complementary sequences in the 3' UTR of target mRNAs. While perfect seed complementarity is a primary determinant of target specificity, non-canonical sites with mismatches or G:U wobbles can also be regulatory. Most computational methods require canonical sites, but mirSVR incorporates both canonical and non-canonical sites into a single model without defining seed subclasses. It uses support vector regression (SVR) to train on features such as secondary structure accessibility and conservation.
mirSVR was tested against existing methods using microRNA transfection and inhibition experiments, and it predicted downregulation as well as or better than those methods. The model correctly identified functional but poorly conserved sites and showed that conservation filters reduce true target detection. mirSVR scores are calibrated to correlate linearly with downregulation, enabling accurate scoring of genes with multiple target sites. Scores can be interpreted as empirical probabilities of downregulation, which helps in selecting cutoffs. mirSVR outperformed context scores in most test sets. It also detected genes regulated by multiple endogenous microRNAs, not just transfected ones. The model showed that seed classes have broad efficiency ranges and that non-canonical sites can be included without increasing false positives.
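To make the regression step concrete, the following is a minimal sketch of a linear support vector regression trained by subgradient descent on the epsilon-insensitive loss. It is not the authors' implementation: the linear form, the two feature names (accessibility, conservation), the toy values, and the hyperparameters `epsilon`, `C`, `lr`, and `epochs` are all assumptions chosen for illustration.

```python
def predict(weights, bias, features):
    """Linear model: weighted sum of site features plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_linear_svr(X, y, epsilon=0.1, C=10.0, lr=0.001, epochs=5000):
    """Minimize 0.5*||w||^2 + C * mean(max(0, |f(x) - y| - epsilon))
    with batch subgradient descent. All hyperparameters are illustrative."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = list(weights)  # gradient of the 0.5*||w||^2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            residual = predict(weights, bias, xi) - yi
            if abs(residual) <= epsilon:
                continue  # inside the epsilon tube: no loss, no gradient
            sign = 1.0 if residual > 0 else -1.0
            for j in range(d):
                grad_w[j] += C * sign * xi[j] / n
            grad_b += C * sign / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias


# Hypothetical toy data: [accessibility, conservation] per site, each in [0, 1],
# with made-up log fold changes (more negative = stronger downregulation).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
y = [0.0, -1.0, -0.5, -1.5, -0.75]

weights, bias = train_linear_svr(X, y)
strong_site = predict(weights, bias, [1.0, 1.0])
weak_site = predict(weights, bias, [0.0, 0.0])
print(weights, bias, strong_site, weak_site)
```

Because the target is a continuous downregulation value rather than a target/non-target label, the fitted score orders sites by expected effect size; in this toy setup the accessible, conserved site receives a more negative score than the site with neither property.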
mirSVR scores were validated on non-canonical sites identified by PAR-CLIP experiments, showing significant discrimination between true and false sites. mirSVR's performance was evaluated on various data sets, including AGO IP and CLIP experiments. It demonstrated that non-canonical sites are important for microRNA regulation and that conservation should be used as a feature, not a filter. The model's improved performance is attributed to its ability to handle variability in seed region binding, incorporate diverse features, and avoid overfitting. Future work should consider additional experimental data to improve target prediction.
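The gene-level scoring and cutoff selection described above can be sketched as follows. This assumes that per-site scores are combined by simple summation (plausible when scores scale linearly with downregulation, but an assumption here) and reads "empirical probability" as the fraction of genes passing a score cutoff that are observed to be downregulated. The gene names, site scores, fold changes, and the `logfc_cutoff` default are all hypothetical.

```python
def gene_score(site_scores):
    """Combine per-site scores into one gene-level score by summation
    (an assumed aggregation rule)."""
    return sum(site_scores)


def empirical_probability(scored_genes, score_cutoff, logfc_cutoff=-0.1):
    """Among genes with score <= score_cutoff, return the fraction whose
    observed log fold change is <= logfc_cutoff (None if no gene passes)."""
    selected = [logfc for score, logfc in scored_genes if score <= score_cutoff]
    if not selected:
        return None
    hits = sum(1 for logfc in selected if logfc <= logfc_cutoff)
    return hits / len(selected)


# Hypothetical data: gene -> (per-site scores, observed log fold change).
toy_genes = {
    "GENE_A": ([-0.6, -0.5], -0.9),
    "GENE_B": ([-0.3], -0.4),
    "GENE_C": ([-0.2, -0.1], 0.05),
    "GENE_D": ([-1.2], -1.1),
    "GENE_E": ([-0.05], 0.0),
}

scored = [(gene_score(sites), logfc) for sites, logfc in toy_genes.values()]
p_strict = empirical_probability(scored, score_cutoff=-1.0)
p_loose = empirical_probability(scored, score_cutoff=-0.25)
print(p_strict, p_loose)  # prints: 1.0 0.75
```

Tightening the cutoff from -0.25 to -1.0 keeps fewer genes but raises the fraction that are actually downregulated in this toy set, which is the trade-off a calibrated score is meant to expose.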