GapFiller is an automated method for closing gaps in draft genome assemblies using paired reads. On both bacterial and eukaryotic datasets it closes gaps while introducing few errors. The method extends contigs into a gap from both edges by k-mer overlap and iterates until no further gaps can be closed. Compared with existing methods such as IMAGE and SOAPdenovo, GapFiller is reported to be more accurate and reliable, chiefly because it introduces fewer errors. It needs little computational resource and is easy to use, which makes it suitable for a wide audience. The software is available at http://www.baseclear.com/bioinformatics-tools/.
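The idea of extending a contig edge by k-mer overlap can be illustrated with a minimal sketch. This is not GapFiller's implementation: the function name `extend_right`, its parameters (`k`, `min_support`, `max_ext`) and the majority-vote stopping rule are assumptions chosen for illustration, and the reads are treated as plain strings rather than aligned read pairs.

```python
from collections import Counter


def extend_right(contig, reads, k=4, min_support=2, max_ext=50):
    """Extend `contig` to the right, one base at a time, by k-mer overlap.

    Hypothetical helper for illustration only: a base is appended when
    at least `min_support` reads contain the contig's last k bases
    followed by that base, and no other base has equal support.
    """
    # Index every k-mer in the reads by the bases observed right after it.
    followers = {}
    for read in reads:
        for i in range(len(read) - k):
            followers.setdefault(read[i:i + k], Counter())[read[i + k]] += 1

    added = 0
    while added < max_ext:
        counts = followers.get(contig[-k:])
        if not counts:
            break  # no read overlaps the current contig edge
        (base, n), *rest = counts.most_common(2)
        # Stop when support is too low or the next base is ambiguous.
        if n < min_support or (rest and rest[0][1] == n):
            break
        contig += base
        added += 1
    return contig


# Toy data: two distinct reads, each seen twice, overlapping the contig edge.
reads = ["GTACGGATTC", "GTACGGATTC", "ACGGATTCAGC", "ACGGATTCAGC"]
extended = extend_right("ACGTAC", reads, k=4, min_support=2)
print(extended)
```

The same routine applied to the reverse complement of the right-hand flank would extend into the gap from the other side; iterating both until neither edge can grow mirrors the loop described above.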
GapFiller was tested on various bacterial and eukaryotic datasets, including Escherichia coli, Streptomyces coelicolor, and Saccharomyces cerevisiae. It significantly reduced the number of gaps and errors in these datasets. For human chromosome 14, GapFiller also showed effective gap closure, adding functional nucleotides to the assembly. The method is particularly useful for closing difficult regions, such as repetitive or low-coverage areas, which are often missed in draft assemblies.
The study highlights the importance of accurate gap closure in genome assemblies, as incomplete regions can lead to significant loss of nucleotides, especially in larger eukaryotes. Manual closure with Sanger sequencing is expensive and time-consuming, making automated methods like GapFiller essential. The software is designed to be accessible and efficient, with low memory usage and a user-friendly interface.
GapFiller's advantages include considering the estimated gap size before closure, extending contigs through k-mer overlap rather than local assembly, and re-evaluating contig edges during gap closure. These features make it more accurate and efficient than other methods. The results show that GapFiller can reliably close most gaps with minimal errors, making it a valuable tool for genome assembly. The software is available for download and is free for academic use.
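Checking the estimated gap size before accepting a closure can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual logic: `merge_if_consistent`, `max_diff` and `min_overlap` are hypothetical names, and the two arguments stand for the sequences grown into the gap from its left and right edges.

```python
def merge_if_consistent(left_fill, right_fill, est_gap, max_diff=10, min_overlap=3):
    """Join two gap extensions only if the result fits the estimated gap size.

    Hypothetical helper for illustration only. Returns the merged fill
    sequence, or None when the extensions do not overlap by at least
    `min_overlap` bases or the merged length differs from `est_gap`
    by more than `max_diff`.
    """
    longest = min(len(left_fill), len(right_fill))
    # Try the longest suffix/prefix overlap first, then shorter ones.
    for olap in range(longest, min_overlap - 1, -1):
        if left_fill[-olap:] == right_fill[:olap]:
            filled = left_fill + right_fill[olap:]
            if abs(len(filled) - est_gap) <= max_diff:
                return filled
    return None


# The extensions overlap on "TTC"; the merged fill is 9 bases long.
print(merge_if_consistent("GGATTC", "TTCAGC", est_gap=9))
# Same overlap, but a 30-base estimate is too far from 9, so the gap stays open.
print(merge_if_consistent("GGATTC", "TTCAGC", est_gap=30))
```

Rejecting a merge whose length disagrees with the estimate is one way to avoid collapsing a repeat into a falsely closed gap; the tolerance would in practice depend on the insert-size variation of the read library.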