1995 | James K. Bonfield, Kathryn F. Smith and Rodger Staden
The Genome Assembly Program (GAP) is a new DNA sequence assembly program developed by James K. Bonfield, Kathryn F. Smith, and Rodger Staden. It is suitable for both large and small projects, and can handle data from various sequencing instruments. The program retains useful components from previous work but includes many novel ideas and methods, particularly enabled by its new, highly interactive graphical user interface. The program provides visual clues about the current state of a sequencing project and allows users to interact with their data in intuitive and graphical ways. It includes tools for displaying and manipulating data that help solve and check difficult assemblies, especially in repetitive genomes. New displays include the Contig Selector, Contig Comparator, Template Display, Restriction Enzyme Map, and Stop Codon Map. The program also allows multiple Contig Editors and Contig Joining Editors to run simultaneously. It includes a new 'Directed Assembly' algorithm and routines for automatically detecting unfinished segments of sequence and suggesting experimental solutions.
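The summary does not say how GAP decides that a segment is "unfinished". The sketch below is for illustration only. It assumes one common finishing criterion, that every consensus base should be covered by readings from both strands. It flags the segments that fail this test and labels each with the kind of extra reading that would fix it. The `Reading` tuple layout and the `find_unfinished` function are hypothetical, not GAP's own routines.

```python
from typing import List, Optional, Tuple

# Hypothetical reading layout (not GAP's data model): (start, end, strand),
# 0-based, end-exclusive, strand '+' or '-'.
Reading = Tuple[int, int, str]


def find_unfinished(contig_len: int,
                    readings: List[Reading]) -> List[Tuple[int, int, str]]:
    """Return (start, end, problem) runs where the consensus lacks coverage
    from both strands; the label hints at the experiment that would fix it."""
    fwd = [0] * contig_len
    rev = [0] * contig_len
    for start, end, strand in readings:
        depth = fwd if strand == "+" else rev
        for i in range(max(0, start), min(end, contig_len)):
            depth[i] += 1

    def classify(i: int) -> str:
        if fwd[i] and rev[i]:
            return ""  # finished: both strands present
        if fwd[i]:
            return "needs '-' strand"
        if rev[i]:
            return "needs '+' strand"
        return "no coverage"

    problems: List[Tuple[int, int, str]] = []
    if contig_len == 0:
        return problems
    run_start, run_kind = 0, classify(0)
    for i in range(1, contig_len + 1):
        # A sentinel None past the last base closes the final run.
        kind: Optional[str] = classify(i) if i < contig_len else None
        if kind != run_kind:
            if run_kind:  # "" means finished, so it is not reported
                problems.append((run_start, i, run_kind))
            run_start, run_kind = i, kind
    return problems


# Two overlapping readings on opposite strands leave both ends single-stranded:
print(find_unfinished(10, [(0, 6, "+"), (4, 10, "-")]))
# → [(0, 4, "needs '-' strand"), (6, 10, "needs '+' strand")]
```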
The program is written in ANSI C and FORTRAN 77, with the user interface in Tcl and Tk, which gives it a Motif-style 'look and feel'. It runs under the X Window System on Sun, DEC, and SGI UNIX workstations. The main design goal was to separate algorithms from the ways their results are viewed: algorithms produce results, and the user controls which displays present them, which items are visible, and when they are deleted. The program uses several alignment algorithms.
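The text states the design goal but not the mechanism behind it. One way to realise it is a small registry. In this approach, algorithms publish named results without knowing how they will be displayed, and the user attaches views to those results and decides when each one is deleted. This is a sketch under that assumption. `ResultRegistry` and its methods are hypothetical names, not GAP's internal interface.

```python
from typing import Callable, Dict, List

# A view is any callable that receives an event name and the result data.
View = Callable[[str, object], None]


class ResultRegistry:
    """Hypothetical sketch: algorithms publish results, any number of views
    show them, and deletion is left to the user."""

    def __init__(self) -> None:
        self._results: Dict[str, object] = {}
        self._views: Dict[str, List[View]] = {}

    def publish(self, name: str, data: object) -> None:
        # An algorithm stores its output and notifies any attached views.
        self._results[name] = data
        for view in self._views.get(name, []):
            view("update", data)

    def attach_view(self, name: str, view: View) -> None:
        # The user chooses which display(s) present a given result.
        self._views.setdefault(name, []).append(view)
        if name in self._results:
            view("update", self._results[name])

    def delete(self, name: str) -> None:
        # Only the user removes a result; every attached view is told to clear.
        self._results.pop(name, None)
        for view in self._views.pop(name, []):
            view("delete", None)
```

In this design an algorithm never holds a reference to a display. Adding a second view of the same result is therefore just another `attach_view` call and needs no change to the algorithm.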
The program handles data from sequencing instruments such as the ABI 373A and 377, Pharmacia A.L.F., and LiCor. It can also use data entered with digitizers or typed by hand. Trace data files are converted to SCF format, and the program calculates its own accuracy estimates from peak area ratios. Preassembly steps, including quality clipping and the removal of sequencing vector and cosmid vector, are controlled by the PREGAP script. The program can label segments of readings and of consensus sequences with 'tags'. Tags can be created, edited, and removed both by the user and by internal routines, and they can also be input along with readings.
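The exact formula GAP uses for its accuracy estimates is not given here. The sketch below makes two simplifying assumptions: that a peak's area is the sum of trace samples across one base's window, and that confidence is the ratio of the called base's peak area to the largest competing peak area. A higher ratio then suggests a more reliable call. The function names and the window boundaries are hypothetical.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"


def peak_area(trace: List[float], lo: int, hi: int) -> float:
    # Crude area under one trace: the sum of its samples in [lo, hi).
    return float(sum(trace[lo:hi]))


def call_with_ratio(traces: Dict[str, List[float]],
                    lo: int, hi: int) -> Tuple[str, float]:
    """Call the base in the window [lo, hi) and return it together with the
    ratio of its peak area to the largest competing peak area."""
    areas = {b: peak_area(traces[b], lo, hi) for b in BASES}
    called = max(BASES, key=lambda b: areas[b])
    competitor = max(areas[b] for b in BASES if b != called)
    # No competing signal at all: treat the call as unambiguous.
    ratio = areas[called] / competitor if competitor > 0 else float("inf")
    return called, ratio


traces = {
    "A": [0, 5, 9, 5, 0],
    "C": [0, 1, 2, 1, 0],
    "G": [0, 0, 0, 0, 0],
    "T": [0, 0, 1, 0, 0],
}
print(call_with_ratio(traces, 0, 5))  # → ('A', 4.75)
```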
Tags serve various purposes, and the user can choose which tag types are currently 'active'. Where tags provide visual clues, this choice determines which tag types appear in the displays. For other functions, active tags control which parts of the sequence are omitted from processing; this mode of operation is known as 'masking'. For example, the program includes a routine to search for repeats, and when it reports a sequence duplication, the user needs to know whether it is caused by incorrect assembly or is a genuine repeat. Once the user has checked a reported duplication and found it to be a genuine repeat, it can be labelled with a REPT tag. If the repeat search is then run in masking mode with REPT tags active, any segment covered by a REPT tag will not be reported as a match.
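Masking can be sketched as follows, under stated assumptions. A tag is modelled as a hypothetical `Tag(type, start, end)` record. The repeat search is a naive exact-word match, substituted for whatever alignment method GAP actually uses. Any word that overlaps a segment covered by an active tag type is skipped, so a duplication labelled REPT stops being reported while REPT tags are active.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass
class Tag:
    # Hypothetical tag record: a typed label on the segment [start, end).
    type: str
    start: int
    end: int


def masked_positions(tags: Iterable[Tag], active: Set[str]) -> Set[int]:
    # Positions covered by any tag whose type is currently active.
    return {i for t in tags if t.type in active for i in range(t.start, t.end)}


def find_repeats(seq: str, word: int, tags: Iterable[Tag],
                 active: Set[str]) -> List[Tuple[int, int]]:
    """Report pairs (i, j), i < j, where seq[i:i+word] == seq[j:j+word],
    skipping every word that touches a masked position."""
    mask = masked_positions(tags, active)
    seen: Dict[str, List[int]] = {}
    hits: List[Tuple[int, int]] = []
    for i in range(len(seq) - word + 1):
        if any(p in mask for p in range(i, i + word)):
            continue  # masking mode: this segment is covered by an active tag
        w = seq[i:i + word]
        for j in seen.get(w, []):
            hits.append((j, i))
        seen.setdefault(w, []).append(i)
    return hits
```

With `seq = "ACGTTTACGTGGACGT"` and a word length of 4, the unmasked search reports `[(0, 6), (0, 12), (6, 12)]`. Adding `Tag("REPT", 6, 10)` with `"REPT"` active leaves only `(0, 12)`. If the REPT tag type is made inactive, all three matches are reported again.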