miRBase: annotating high confidence microRNAs using deep sequencing data

2014 | Ana Kozomara and Sam Griffiths-Jones
The miRBase database has been updated to improve the quality of its microRNA (miRNA) sequence data. The latest release (v20, June 2013) contains 24,521 miRNA loci from 206 species, which are processed to produce 30,424 mature miRNA products. As small RNA deep sequencing uncovers ever more novel miRNAs, maintaining data quality has become a challenge. To address this, miRBase now defines a high confidence subset of miRNA entries based on the patterns of deep sequencing reads mapped to each locus, and provides this subset alongside the complete miRNA collection. In addition, miRBase now embeds Wikipedia pages for miRNAs to encourage the community to contribute textual and functional information.
High confidence entries must meet specific criteria, such as sufficient reads mapping to both mature miRNA strands and read patterns consistent with miRNA processing. These criteria help distinguish genuine miRNAs from other RNA fragments. For example, mmu-mir-3072 is annotated as high confidence because its reads show a consistent processing pattern, whereas mmu-mir-1940 is not, because the reads from its two arms do not pair to form a duplex with the expected 2-nt 3' overhang.
The high confidence set includes 1,761 loci from 38 species, representing 22% of the miRNAs in those species. Many miRNAs lack sufficient read evidence and therefore remain lower confidence annotations. Some well-established miRNAs, such as hsa-mir-126, may fail the criteria because of variable 5' ends but are still considered valid. The database allows users to manually promote specific miRNAs into the high confidence set.
Community contributions are encouraged through the embedded Wikipedia pages, which users can edit to improve the information. Over 4,800 miRBase entries currently link to Wikipedia pages, representing 20% of the database. This system helps gather functional information that is often missing from miRBase entries.
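The read-based confidence criteria described above can be illustrated with a short sketch. This is a minimal Python illustration under stated assumptions, not miRBase's actual pipeline: the thresholds (`MIN_READS_PER_ARM`, `MIN_5P_HOMOGENEITY`, `ALLOWED_OVERHANGS`) and all function names are hypothetical, reads are assumed to be already mapped to the hairpin as 0-based inclusive `(start, end)` coordinates and assigned to the 5p or 3p arm, and the hairpin's secondary structure is assumed to be supplied in dot-bracket notation. The sketch checks read depth on both arms, requires most reads on each arm to share one 5' end (the property a miRNA with variable 5' ends, such as hsa-mir-126, would fail), and computes the 3' overhangs of the duplex formed by the most abundant read from each arm.

```python
from collections import Counter

# Hypothetical thresholds for illustration only; not miRBase's published values.
MIN_READS_PER_ARM = 10
MIN_5P_HOMOGENEITY = 0.5
ALLOWED_OVERHANGS = range(1, 4)  # accepted 3' overhang lengths in nt (2 is canonical)


def pair_table(dot_bracket):
    """Map each paired position (0-based) to its partner using dot-bracket notation."""
    stack, pairs = [], {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    return pairs


def duplex_overhangs(pairs, arm5, arm3):
    """Return (5p overhang, 3p overhang) in nt, or None if a strand's 5' end is unpaired.

    arm5 and arm3 are (start, end) coordinates on the hairpin, 0-based and inclusive.
    """
    (s5, e5), (s3, e3) = arm5, arm3
    if s5 not in pairs or s3 not in pairs:
        return None  # this sketch does not handle unpaired duplex ends
    over3 = e3 - pairs[s5]  # 3p nucleotides beyond the partner of the 5p strand's 5' end
    over5 = e5 - pairs[s3]  # 5p nucleotides beyond the partner of the 3p strand's 5' end
    return over5, over3


def dominant(reads):
    """Most frequent (start, end) read and the fraction of reads sharing its 5' start."""
    best, _count = Counter(reads).most_common(1)[0]
    same_start = sum(1 for start, _end in reads if start == best[0])
    return best, same_start / len(reads)


def is_high_confidence(dot_bracket, reads_5p, reads_3p):
    """Apply the hypothetical read-based criteria to a single hairpin."""
    if min(len(reads_5p), len(reads_3p)) < MIN_READS_PER_ARM:
        return False  # too few reads on at least one arm
    arm5, homog5 = dominant(reads_5p)
    arm3, homog3 = dominant(reads_3p)
    if min(homog5, homog3) < MIN_5P_HOMOGENEITY:
        return False  # 5' ends too heterogeneous
    overhangs = duplex_overhangs(pair_table(dot_bracket), arm5, arm3)
    return overhangs is not None and all(o in ALLOWED_OVERHANGS for o in overhangs)


# Toy 18-nt hairpin with a 2-nt 3' overhang at its base (illustration only).
hairpin = "((((((....)))))).."
print(is_high_confidence(hairpin, [(0, 5)] * 12, [(12, 17)] * 12))  # True
print(is_high_confidence(hairpin, [(0, 5)] * 12, [(10, 15)] * 12))  # False: blunt duplex
```

The second call models a case like the one described for mmu-mir-1940: the dominant reads form a blunt duplex rather than one with 2-nt 3' overhangs, so the locus is rejected. A fuller implementation would also need to handle bulges or unpaired bases at the duplex ends, which this sketch simply rejects.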
Future developments include using existing prediction tools to score annotations and allowing searches based on user-defined confidence thresholds. The high confidence set is expected to become the default view as more data accumulate; lower confidence annotations will remain available but will be appropriately tagged. Non-canonical miRNAs, such as those processed through alternative pathways, may not meet the criteria and are therefore under-represented in the high confidence set. miRBase is freely available under the Creative Commons Zero license, with all data accessible via the website and FTP. The database welcomes feedback and requests for name assignments. The authors acknowledge the contributions of colleagues and funders and declare no conflicts of interest.
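Because the high confidence set sits alongside the full collection and lower confidence entries are kept but tagged, a downstream user might combine the two roughly as follows. This is a hedged sketch, not a miRBase tool: it assumes entries arrive as FASTA text whose header's first whitespace-separated token is the miRNA identifier, that the high confidence identifiers are available as a separate set, and that identifiers carry the species prefix seen in names such as mmu-mir-3072 and hsa-mir-126. The function names `parse_fasta` and `tag_entries` and the placeholder sequences are hypothetical.

```python
def parse_fasta(text):
    """Yield (identifier, sequence) pairs; the identifier is the first header token (assumed)."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)  # sequences may span several lines
    if name is not None:
        yield name, "".join(chunks)


def tag_entries(fasta_text, high_conf_ids, species=None):
    """Return (identifier, sequence, tag) tuples, optionally limited to one species prefix.

    Lower confidence entries are kept and tagged "low" rather than dropped.
    """
    tagged = []
    for name, seq in parse_fasta(fasta_text):
        if species and not name.startswith(species + "-"):
            continue
        tag = "high" if name in high_conf_ids else "low"
        tagged.append((name, seq, tag))
    return tagged


# Placeholder sequences for illustration only; these are not real miRBase entries.
toy_fasta = """>mmu-mir-3072 placeholder
ACGUACGU
>mmu-mir-1940 placeholder
UUUUUUUU
>hsa-mir-126 placeholder
GGGGCCCC
"""
for entry in tag_entries(toy_fasta, {"mmu-mir-3072"}, species="mmu"):
    print(entry)
# ('mmu-mir-3072', 'ACGUACGU', 'high')
# ('mmu-mir-1940', 'UUUUUUUU', 'low')
```

Tagging instead of deleting keeps the complete collection searchable, so a user can still choose to include lower confidence entries.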