Bracken: Estimating species abundance in metagenomics data

Jennifer Lu¹,²,†, Florian P. Breitwieser², Peter Thielen³, and Steven L. Salzberg¹,²,⁴

Bracken is a Bayesian method for estimating species abundance in metagenomics data. It uses the taxonomy labels that Kraken assigns to reads to estimate how many reads in a sample come from each species. Kraken classifies each read to its best-matching location in the taxonomic tree, but it does not estimate abundances. Bracken uses the Kraken database to derive probabilities of sequence sharing between genomes, that is, how likely a read from a given genome is to be classified by Kraken at each node of the taxonomy. It combines these probabilities with the sample's actual read assignments to estimate abundance at the species, genus, or higher levels.

Metagenomics has advanced rapidly thanks to fast, inexpensive sequencing, but two features of microbial data complicate abundance estimation: many bacterial species are nearly identical, and bacterial taxonomy is constantly being revised. When a read matches several species, Kraken assigns it to their lowest common ancestor (LCA), which is often a genus or higher node. Counting only the reads classified directly to each species therefore undercounts species whose genomes share sequence with close relatives, producing inaccurate abundance estimates.

Bracken addresses this by using Bayesian inference to re-distribute reads in the taxonomic tree. It calculates probabilities from Kraken's classifications and the similarity between genomes, then probabilistically re-assigns reads from higher taxonomic nodes to the species level. For each genome, it estimates the proportion of reads that belong to it and uses these proportions to calculate species abundance. This improves accuracy even when species are nearly identical or have been reclassified.

Experiments on simulated and real metagenomics data show that Bracken provides accurate abundance estimates. For the i100 dataset, Bracken estimates species abundance with 98% accuracy; for the skin microbiome experiment, it estimates genus-level abundance with high accuracy. Bracken is available as free, open-source software.
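
The re-distribution step described above can be illustrated with a short Python sketch. This is a minimal, hypothetical example, not Bracken's own implementation: the function name `redistribute`, its dictionary inputs, and the toy species, node names, and probabilities are all invented for illustration. It also assumes, as a simplification, that the prior P(S_i) is each species' share of the reads Kraken classified directly to species level. For the N_A reads Kraken placed at an internal node A, it applies Bayes' rule, P(S_i | A) = P(A | S_i) P(S_i) / Σ_j P(A | S_j) P(S_j), summing over the species S_j below A, and adds N_A · P(S_i | A) reads to each species S_i.

```python
from collections import defaultdict


def redistribute(direct_counts, node_counts, descendants, p_node_given_species):
    """Re-assign reads from internal taxonomy nodes to species (toy sketch).

    direct_counts: species -> reads Kraken classified directly to that species
    node_counts: internal node -> reads Kraken classified to that node (LCA hits)
    descendants: internal node -> list of species below that node
    p_node_given_species: species -> {node: P(read from species is classified at node)}
    """
    total_direct = sum(direct_counts.values())
    if total_direct == 0:
        return dict(direct_counts)  # no species-level evidence to build a prior from

    # Simplifying assumption: prior P(S_i) = share of directly classified reads.
    prior = {s: n / total_direct for s, n in direct_counts.items()}

    estimates = defaultdict(float, direct_counts)
    for node, n_reads in node_counts.items():
        # Unnormalized posterior P(A | S_i) * P(S_i) for every species under node A.
        weights = {
            s: p_node_given_species.get(s, {}).get(node, 0.0) * prior.get(s, 0.0)
            for s in descendants[node]
        }
        denom = sum(weights.values())
        if denom == 0:
            continue  # no evidence for any descendant: leave these reads unassigned
        for s, w in weights.items():
            # Bayes' rule: P(S_i | A) = P(A | S_i) P(S_i) / sum_j P(A | S_j) P(S_j)
            estimates[s] += n_reads * w / denom
    return dict(estimates)


# Hypothetical toy sample: species A and B share sequence and sit under genus_X.
direct_counts = {"A": 300, "B": 100, "C": 600}
node_counts = {"genus_X": 200}
descendants = {"genus_X": ["A", "B"]}
p_node_given_species = {
    "A": {"A": 0.6, "genus_X": 0.4},
    "B": {"B": 0.5, "genus_X": 0.5},
    "C": {"C": 1.0},
}

estimates = redistribute(direct_counts, node_counts, descendants, p_node_given_species)
total = sum(estimates.values())
for species, n in estimates.items():
    print(f"{species}: {n:.1f} reads ({n / total:.1%} relative abundance)")
```

Because each node's reads are split in proportion to the posterior probabilities, the total read count is preserved, so dividing each species' estimate by the total gives its relative abundance.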