October 18, 2013 | Jiajie Zhang, Kassian Kobert, Tomáš Flouri, Alexandros Stamatakis
PEAR is a fast and accurate tool for merging Illumina paired-end reads from target fragments of varying lengths. It evaluates all possible overlaps between paired-end reads and does not require the target fragment size as input. PEAR also implements a statistical test to minimize false positives. It can merge millions of paired-end reads within minutes on a standard desktop computer and shows linear speedups on multi-core systems. PEAR is implemented in C and uses POSIX threads, and is freely available at http://www.exelixis-lab.org/web/software/pear.
The Illumina sequencing platform generates short reads, but when the two reads of a pair overlap they can be merged into a single, longer read. Existing tools, however, often fail when fragment lengths vary. PEAR addresses this by maximizing the assembly score (AS) of the read overlap, using a scoring matrix that rewards matches and penalizes mismatches. The score takes both the base calls and their quality scores into account, and PEAR requires neither preprocessing of the reads nor a user-specified fragment size. PEAR can reliably identify which read pairs can be merged and which should be discarded.
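The overlap search described above can be illustrated with a minimal Python sketch. It is not PEAR's implementation (PEAR is written in C), and several details are assumptions made for illustration: the constants `ALPHA` and `BETA`, the quality weighting in `overlap_score`, the helper names, and the rule of keeping the higher-quality base in the overlap are all hypothetical. The sketch also only considers the case where the end of the forward read overlaps the start of the reverse-complemented reverse read; it ignores fragments shorter than a single read.

```python
# Illustrative sketch only: names, constants and weighting are assumptions,
# not PEAR's actual scoring scheme.
ALPHA = 1.0  # hypothetical reward for a matching base pair
BETA = 1.0   # hypothetical penalty for a mismatching base pair

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def error_prob(phred):
    """Convert a Phred quality score into an error probability."""
    return 10 ** (-phred / 10)


def overlap_score(fwd, fwd_q, rev, rev_q, k):
    """Score an overlap of length k: last k bases of fwd vs first k of rev."""
    start = len(fwd) - k
    score = 0.0
    for i in range(k):
        # Weight each position by the probability that both calls are correct.
        weight = (1 - error_prob(fwd_q[start + i])) * (1 - error_prob(rev_q[i]))
        if fwd[start + i] == rev[i]:
            score += ALPHA * weight
        else:
            score -= BETA * weight
    return score


def best_overlap(fwd, fwd_q, rev_raw, rev_q_raw, min_overlap=1):
    """Try every overlap length and return (length, score) of the best one."""
    rev = reverse_complement(rev_raw)
    rev_q = rev_q_raw[::-1]
    best_k, best_score = 0, float("-inf")
    for k in range(min_overlap, min(len(fwd), len(rev)) + 1):
        score = overlap_score(fwd, fwd_q, rev, rev_q, k)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


def merge_pair(fwd, fwd_q, rev_raw, rev_q_raw):
    """Merge a read pair at its best overlap, or return None to discard it."""
    k, score = best_overlap(fwd, fwd_q, rev_raw, rev_q_raw)
    if k == 0 or score <= 0:
        return None
    rev = reverse_complement(rev_raw)
    rev_q = rev_q_raw[::-1]
    start = len(fwd) - k
    merged = list(fwd[:start])
    for i in range(k):
        # In the overlap, keep the base call with the higher quality score.
        use_fwd = fwd_q[start + i] >= rev_q[i]
        merged.append(fwd[start + i] if use_fwd else rev[i])
    merged.extend(rev[k:])
    return "".join(merged)


if __name__ == "__main__":
    fragment = "ACGTTGCAAGGCTTAACCGGATCG"
    forward = fragment[:16]
    reverse = reverse_complement(fragment[8:])
    quals = [30] * 16
    print(best_overlap(forward, quals, reverse, quals))
    print(merge_pair(forward, quals, reverse, quals))
```

Because every overlap length is scored, no fragment size has to be supplied; the cost is a number of base comparisons that grows quadratically with read length for each pair.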
PEAR uses a statistical test based on the observed expected alignment score (OES) to identify false-positive merged reads. On simulated data with a mean overlap of 20 bp, PEAR correctly merges 90.44% of the fragments with a false-positive rate (FPR) of 2.78% when the statistical test is disabled. When the significance level is set to 1%, it correctly merges 70.06% of the fragments with an FPR of 0.48%. PEAR outperforms other tools, such as PANDAseq and FLASH, in terms of accuracy and FPR.
PEAR is efficient and scalable, with a parallel version that scales linearly with the number of cores. It is suitable for merging paired-end reads from datasets with varying DNA fragment sizes. PEAR is robust to short overlaps and does not require prior knowledge of read length or fragment size. It is also efficient in terms of memory usage and can be deployed on standard desktop and laptop computers as well as high-end multi-core servers.
PEAR has been tested on simulated and empirical datasets, including the Staphylococcus aureus genome and a dataset of paired-end reads from a known single sequence. It consistently produces accurate merged reads with low FPRs. PEAR outperforms other tools in terms of accuracy and FPR, and is suitable for a wide range of applications in genomics and metagenomics.