Trimmomatic: a flexible trimmer for Illumina sequence data

April 1, 2014 | Anthony M. Bolger, Marc Lohse and Bjoern Usadel
Trimmomatic is a flexible and efficient preprocessing tool for Illumina sequence data, designed to handle paired-end data correctly. The authors developed it to address the limitations of existing tools in flexibility, paired-end handling, and performance. It provides a range of processing steps for read trimming and filtering; its main algorithmic innovations are in adapter sequence identification and quality filtering.

Trimmomatic removes technical sequences in one of two modes. 'Simple mode' detects technical sequences by finding approximate matches between the read and a user-supplied technical sequence. 'Palindrome mode' is optimized for 'adapter read-through', where the sequenced fragment is shorter than the read length, so the read runs into the adapter at its 3' end. Palindrome mode can only be used with paired-end data and offers higher sensitivity and specificity than simple mode.

For quality filtering, Trimmomatic offers two approaches: 'sliding window' and 'maximum information'. The sliding window approach removes the 3' end of the read once the average quality of a group of consecutive bases drops below a specified threshold. The maximum information approach combines three factors (a length threshold, coverage, and error rate) to determine the optimal trimming point.

Trimmomatic is implemented as a pipeline: individual steps are applied to each read or read pair in the order specified by the user. It supports both the standard (phred+33) and the Illumina 'legacy' (phred+64) quality formats and can convert between them. It also supports multiple threads for improved performance.

In the authors' evaluation, Trimmomatic was compared with AdapterRemoval, Scythe/Sickle, and EA-Utils in reference-based alignment and de novo assembly scenarios, where it outperformed the other tools in alignment rates and assembly quality. For reference-based alignment, it improved results under both tolerant and strict alignment settings. In de novo assembly, preprocessing with Trimmomatic significantly improved contig N50 and maximum contig size, demonstrating the importance of read preprocessing for accurate genome assembly. The tool's flexibility, efficiency, and paired-end handling make it valuable for preprocessing Illumina sequence data.
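
To make the idea behind 'simple mode' concrete, here is a minimal Python sketch of approximate adapter matching. It is not Trimmomatic's algorithm: it swaps in plain mismatch counting for whatever scoring the tool actually uses, and the function name, the parameters `max_mismatches` and `min_overlap`, and their default values are all hypothetical.

```python
def simple_adapter_trim(read, adapter, max_mismatches=1, min_overlap=8):
    """Cut the read at the first position where the adapter approximately matches.

    Illustrative only: counts mismatches position by position. A partial
    adapter at the 3' end is accepted if at least min_overlap bases overlap.
    """
    for start in range(len(read)):
        # Number of adapter bases that fit between `start` and the read end.
        overlap = min(len(adapter), len(read) - start)
        if overlap < min_overlap:
            break  # remaining overlap is too short to call a match
        window = read[start:start + overlap]
        mismatches = sum(1 for r, a in zip(window, adapter) if r != a)
        if mismatches <= max_mismatches:
            return read[:start]  # keep only the bases before the adapter
    return read  # no adapter found


if __name__ == "__main__":
    adapter = "AGATCGGAAGAGC"  # example adapter sequence for the demo
    print(simple_adapter_trim("ACGTACGTAC" + adapter, adapter))
```

Requiring a minimum overlap trades sensitivity for specificity: very short adapter fragments at the end of a read are left in place, because a match over a handful of bases is likely to occur by chance.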
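
The sliding window approach can be sketched in a few lines of Python. This is a simplified illustration of the description above, not the tool's exact behavior; the function name, the parameter names `window` and `min_avg`, their default values, and the handling of reads shorter than one window are assumptions made for the example.

```python
def sliding_window_trim(quals, window=4, min_avg=15):
    """Return how many bases to keep from the 5' end of a read.

    quals is a list of per-base Phred quality scores. The read is cut at
    the start of the first window whose mean quality is below min_avg.
    """
    n = len(quals)
    if n < window:
        # Assumption of this sketch: treat a short read as a single window.
        return n if n and sum(quals) / n >= min_avg else 0
    for start in range(n - window + 1):
        if sum(quals[start:start + window]) / window < min_avg:
            return start  # everything from this window onward is dropped
    return n  # no window fell below the threshold


if __name__ == "__main__":
    qualities = [30] * 5 + [10] * 4
    keep = sliding_window_trim(qualities)
    print(f"keep {keep} of {len(qualities)} bases")
```

Averaging over a window rather than testing single bases means one isolated low-quality base does not truncate an otherwise good read.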
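
The maximum information approach can be illustrated the same way. The summary names the three factors but not their functional forms, so the forms below are assumptions of this sketch: a logistic length threshold, a coverage term that grows with length, and an error term built from the probability that all retained bases are correct. The parameters `target_len` and `strictness` and their defaults are likewise hypothetical.

```python
import math


def max_info_trim(quals, target_len=40, strictness=0.5):
    """Return the read length (from the 5' end) with the highest score.

    The score is the product of three assumed factors:
      length threshold: 1 / (1 + exp(target_len - length))
      coverage:         length ** (1 - strictness)
      error rate:       P(all kept bases correct) ** strictness
    """
    best_len, best_score = 0, 0.0
    log_p_correct = 0.0  # running log-probability that no kept base is wrong
    for length, q in enumerate(quals, start=1):
        p_err = 10 ** (-q / 10)  # Phred score to error probability
        log_p_correct += math.log(max(1.0 - p_err, 1e-12))
        # Clamp the exponent so very large target lengths cannot overflow.
        length_factor = 1.0 / (1.0 + math.exp(min(target_len - length, 700)))
        coverage_factor = length ** (1.0 - strictness)
        error_factor = math.exp(strictness * log_p_correct)
        score = length_factor * coverage_factor * error_factor
        if score > best_score:
            best_len, best_score = length, score
    return best_len


if __name__ == "__main__":
    qualities = [40] * 50 + [2] * 50
    print(max_info_trim(qualities))
```

In this formulation, a higher `strictness` weights the error term more heavily and trims more aggressively, while a lower value favors keeping longer reads.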
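
Converting between quality formats amounts to shifting each quality character by a fixed offset. The sketch below assumes the standard format uses an ASCII offset of 33 and the Illumina legacy format an offset of 64; the function name and its parameters are hypothetical and do not correspond to any Trimmomatic option.

```python
def convert_quality(qual_string, from_offset=64, to_offset=33):
    """Re-encode a FASTQ quality string from one ASCII offset to another."""
    scores = [ord(ch) - from_offset for ch in qual_string]
    if any(s < 0 for s in scores):
        # A negative score means the input was not encoded with from_offset.
        raise ValueError("quality string is not valid for the given offset")
    return "".join(chr(s + to_offset) for s in scores)


if __name__ == "__main__":
    legacy = "hhhB"  # Phred 40, 40, 40, 2 in the offset-64 encoding
    print(convert_quality(legacy))
```

The check for negative scores catches the common mistake of treating an offset-33 file as legacy data, which would otherwise silently produce invalid characters.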