2016 July | Benjamin J Callahan, Paul J McMurdie, Michael J Rosen, Andrew W Han, Amy Jo A Johnson, and Susan P Holmes
DADA2 is a software package that models and corrects errors in Illumina-sequenced amplicon data, allowing exact inference of sample sequences without coarse-graining them into OTUs. It resolves differences as small as one nucleotide and, compared with other methods, identifies more real variants while outputting fewer spurious sequences. DADA2 was tested on three mock communities and then applied to vaginal samples from pregnant women, where it revealed previously undetected Lactobacillus crispatus variants.
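As a rough illustration of what a quality-aware error model means, the sketch below converts Phred quality scores to nominal per-base error probabilities (Q = -10·log10 p) and scores how plausible it is that a read arose from a candidate true sequence. It rests on simplifying assumptions: substitutions only, equal-length sequences, and errors spread evenly over the three alternative bases. DADA2 itself learns transition-specific error rates from the data rather than trusting nominal Phred values, and the function names here are hypothetical.

```python
import math


def phred_to_error_prob(q: int) -> float:
    """Nominal error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def read_log_likelihood(true_seq: str, read: str, quals: list[int]) -> float:
    """Log-probability that `read` was produced from `true_seq` by substitutions.

    Hypothetical simplified model: each base is read correctly with
    probability 1 - p, or miscalled as one of the 3 other bases with p / 3.
    """
    if not (len(true_seq) == len(read) == len(quals)):
        raise ValueError("sequences and qualities must have equal length")
    log_lik = 0.0
    for t, r, q in zip(true_seq, read, quals):
        p = phred_to_error_prob(q)
        log_lik += math.log(1 - p) if t == r else math.log(p / 3)
    return log_lik


true_seq = "ACGTACGTAC"
read = "ACGTACGTAA"            # one mismatch, at the last position
high_q = [38] * 10              # mismatching base has high quality
low_tail = [38] * 9 + [12]      # mismatching base has low quality

print(read_log_likelihood(true_seq, read, high_q))
print(read_log_likelihood(true_seq, read, low_tail))
```

The second value is larger: the same mismatch is far more plausible as a sequencing error when it falls on a low-quality base, which is the intuition behind making the error model quality-aware.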
Microbial communities are crucial to human and environmental health, and amplicon sequencing of marker genes such as 16S rRNA provides a census of these communities. However, distinguishing true biological variation from sequencing errors is challenging. DADA2 improves error correction by using a quality-aware model of Illumina amplicon errors and infers sample composition by dividing reads into partitions consistent with that error model. It is reference-free, applies to any genetic locus, and implements the full amplicon workflow: filtering, dereplication, chimera identification, and merging of paired-end reads.
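The partitioning idea can be sketched under heavy simplifying assumptions. Reads are first dereplicated into unique sequences with abundances. For a unique sequence i with abundance a_i and a more abundant sequence j with abundance n_j, the expected number of reads of i produced by errors from j is n_j·λ_ji, where λ_ji is the probability (from the error model) that a read of j comes out as i. A Poisson test, in the spirit of DADA2's abundance p-value and conditioned on i being observed at all, asks whether a_i is too large to be explained by errors alone; a very small p-value suggests i is a real variant that should seed its own partition. The single shared λ value, the 1e-40 threshold, and the helper names below are illustrative assumptions, not DADA2's implementation, which iterates this process over all partitions.

```python
import math
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into unique sequences with their abundances."""
    return Counter(reads)


def poisson_pmf(k: int, mu: float) -> float:
    """Poisson probability mass, computed in log space to avoid overflow."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))


def poisson_upper_tail(a: int, mu: float, max_terms: int = 10_000) -> float:
    """P(X >= a) for X ~ Poisson(mu), summed directly so tiny tails keep precision."""
    total = 0.0
    for k in range(a, a + max_terms):
        term = poisson_pmf(k, mu)
        total += term
        # Past the mode the terms shrink monotonically; stop once negligible.
        if k > mu and term <= total * 1e-17:
            break
    return total


def abundance_pvalue(a_i: int, n_j: int, lambda_ji: float) -> float:
    """P(X >= a_i | X >= 1) with X ~ Poisson(n_j * lambda_ji).

    A small value means sequence i is more abundant than errors from j explain.
    """
    mu = n_j * lambda_ji
    return poisson_upper_tail(a_i, mu) / -math.expm1(-mu)


def flag_new_partitions(uniques: Counter, lambda_ji: float, omega_a: float = 1e-40):
    """Hypothetical one-step check: which uniques are too abundant to be errors
    of the most abundant sequence? One shared lambda_ji is a simplification."""
    center, n_center = uniques.most_common(1)[0]
    return [
        seq
        for seq, a in uniques.items()
        if seq != center and abundance_pvalue(a, n_center, lambda_ji) < omega_a
    ]


reads = ["ACGT"] * 1000 + ["ACGA"] * 3 + ["ACGC"] * 50   # invented toy reads
uniques = dereplicate(reads)
print(flag_new_partitions(uniques, lambda_ji=1e-3))      # expected: ['ACGC']
```

With 1,000 reads of the center and λ = 1e-3, about one erroneous copy of any given neighbor is expected, so 3 copies of ACGA are consistent with error, while 50 copies of ACGC are not.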
DADA2 was compared with four other methods: UPARSE, MED, mothur, and QIIME. It identified more reference sequences and strains than the other methods, with fewer false positives. It also resolved fine-scale variation better than the current best method for that task, while outputting fewer incorrect sequences than the most robust OTU method. DADA2's precision improves downstream measures of diversity and dissimilarity, allowing amplicon methods to probe strain-level variation.
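Why resolution matters for dissimilarity can be shown with a toy calculation. The sketch computes Bray–Curtis dissimilarity (chosen here as one common measure; the text does not name a specific one) for two invented samples that carry different single-nucleotide variants of the same taxon, first on exact variants and then after lumping those two variants into one OTU. The variant names, OTU mapping, and counts are all hypothetical.

```python
def bray_curtis(x: dict[str, int], y: dict[str, int]) -> float:
    """Bray-Curtis dissimilarity between two count profiles (0 = identical)."""
    keys = set(x) | set(y)
    shared = sum(min(x.get(k, 0), y.get(k, 0)) for k in keys)
    total = sum(x.values()) + sum(y.values())
    return 1.0 - 2.0 * shared / total


def lump(profile: dict[str, int], mapping: dict[str, str]) -> dict[str, int]:
    """Sum variant counts into the OTU each variant is assigned to."""
    out: dict[str, int] = {}
    for seq, count in profile.items():
        otu = mapping[seq]
        out[otu] = out.get(otu, 0) + count
    return out


sample_a = {"variant_1": 900, "variant_3": 100}
sample_b = {"variant_2": 900, "variant_3": 100}
# variant_1 and variant_2 are assumed to differ by one nucleotide.
otu_of = {"variant_1": "OTU_1", "variant_2": "OTU_1", "variant_3": "OTU_2"}

print(bray_curtis(sample_a, sample_b))                                # 0.9
print(bray_curtis(lump(sample_a, otu_of), lump(sample_b, otu_of)))    # 0.0
```

At the variant level the two samples are almost entirely different; after lumping they appear identical, so a real strain-level difference disappears from the dissimilarity measure.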
DADA2's core denoising algorithm is somewhat slower than UPARSE but runs in comparable time, and it processes Illumina samples efficiently on a laptop. It was evaluated on two longitudinal datasets: 142 vaginal samples from 42 pregnant women and 360 mouse fecal samples. DADA2 revealed that Lactobacillus crispatus communities are more complex than generally recognized, with six distinct sequence variants present in multiple samples. These variants are invisible to OTU methods because they differ by only 1–2 nucleotides.
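The arithmetic behind this is simple. Assuming, purely for illustration, an amplicon of 250 nucleotides (the length is not stated here), variants that differ at 1 or 2 positions share 99.6% or 99.2% ungapped identity, well above the 97% threshold conventionally used to define OTUs, so a 97% OTU method places them in the same cluster. The sequence below is synthetic and the helper names are hypothetical.

```python
def substitute(seq: str, pos: int) -> str:
    """Return a copy of seq with a different base at position pos."""
    new_base = "C" if seq[pos] != "C" else "G"
    return seq[:pos] + new_base + seq[pos + 1:]


def percent_identity(a: str, b: str) -> float:
    """Ungapped percent identity of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


amplicon = ("ACGT" * 63)[:250]       # synthetic 250-nt amplicon
one_nt = substitute(amplicon, 100)   # differs at 1 position
two_nt = substitute(one_nt, 200)     # differs at 2 positions

for variant in (one_nt, two_nt):
    pid = percent_identity(amplicon, variant)
    print(f"{pid:.1f}% identity, merged at 97%: {pid >= 97.0}")
```

Both variants print as merged (99.6% and 99.2%), which is why 1–2 nucleotide differences cannot be seen once reads are clustered at 97%.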
DADA2 identified more biological variants than the other methods, especially variants lying within UPARSE's OTU radius (97% identity), while outputting fewer spurious sequences. It also identified more variants overall, with a high correlation in sample richness. DADA2's output can still be clustered into OTUs, but doing so often discards real biological information. By allowing amplicon-sequenced communities to be reconstructed accurately at the highest resolution, DADA2 enhances microbial community studies.
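Clustering exact variants into OTUs remains possible but throws information away. The sketch below uses a simple greedy, abundance-ordered clustering at 97% identity (a stand-in chosen for illustration; it is not UPARSE's algorithm or any specific tool's) on equal-length, ungapped sequences, and shows variants that differ by one or two nucleotides collapsing into a single OTU. The sequences and abundances are invented.

```python
def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def substitute(seq: str, pos: int) -> str:
    """Return a copy of seq with a different base at position pos."""
    new_base = "C" if seq[pos] != "C" else "G"
    return seq[:pos] + new_base + seq[pos + 1:]


def greedy_otus(variants: dict[str, int], threshold: float = 0.97) -> dict[str, str]:
    """Greedy abundance-ordered clustering: each variant joins the first
    existing centroid within `threshold` identity, else becomes a centroid."""
    centroids: list[str] = []
    assignment: dict[str, str] = {}
    for seq in sorted(variants, key=variants.get, reverse=True):
        for c in centroids:
            if identity(seq, c) >= threshold:
                assignment[seq] = c
                break
        else:
            centroids.append(seq)
            assignment[seq] = seq
    return assignment


base = "ACGT" * 25                                # synthetic 100-nt variant
variants = {                                      # invented abundances
    base: 500,
    substitute(base, 10): 300,                    # 1 nt away from base
    substitute(substitute(base, 10), 60): 120,    # 2 nt away from base
    "TGCA" * 25: 80,                              # unrelated sequence
}
otus = greedy_otus(variants)
print(len(variants), "exact variants ->", len(set(otus.values())), "OTUs")
```

Four exact variants become two OTUs: the three closely related variants are indistinguishable after clustering, even though their abundances differ and they could carry distinct biological signal.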