June 25, 2010 | Aaron E. Darling¹, Bob Mau², Nicole T. Perna³
progressiveMauve is a multiple genome alignment method that accounts for gene gain, loss, and rearrangement. It uses a novel alignment objective, the sum-of-pairs breakpoint score, to detect rearrangement breakpoints accurately when genomes have unequal gene content, and it applies a probabilistic alignment filter to remove erroneous alignments of unrelated sequences. The method is implemented in the program progressiveMauve, part of the Mauve genome alignment package, versions 2.0 and later.
The method was tested on 23 genomes from the genera Escherichia, Shigella, and Salmonella. Whole-genome multiple alignment extends the previously defined concepts of core- and pan-genomes to cover not only annotated genes but also non-coding regions with potential regulatory roles. The 23 enterobacteria have an estimated core-genome of 2.46 Mbp conserved among all taxa and a pan-genome of 15.2 Mbp.
The method is compared to existing alignment methods on datasets simulated to cover a broad range of genomic mutation types and rates, including inversion, gene gain, loss, and duplication. Accuracy is quantified with new metrics that measure the quality of rearrangement breakpoint predictions and indel predictions. The method shows high accuracy when genomes have undergone biologically feasible amounts of rearrangement and segmental gain and loss, and it accurately aligns regions conserved in some, but not all, of the genomes, an important case not handled by previous work.
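To make the core- and pan-genome figures concrete, the following is a minimal Python sketch of how such totals could be tallied from an alignment. It is not the procedure used by progressiveMauve. The `Segment` type, the `core_and_pan_size` function, and the toy data are hypothetical names introduced for illustration. The sketch assumes the alignment has already been cut into non-overlapping segments, each annotated with its length and the set of genomes that contain it. It treats the core-genome as sequence present in every genome and the pan-genome as sequence present in at least one genome, counted once.

```python
from typing import Iterable, NamedTuple, Tuple


class Segment(NamedTuple):
    """Hypothetical record: one aligned segment and the genomes that contain it."""
    length_bp: int
    genomes: frozenset


def core_and_pan_size(segments: Iterable[Segment],
                      all_genomes: frozenset) -> Tuple[int, int]:
    """Return (core_bp, pan_bp) for non-overlapping aligned segments.

    Assumption: core = present in every genome; pan = present in at
    least one genome, with shared sequence counted only once.
    """
    core_bp = 0
    pan_bp = 0
    for seg in segments:
        if not seg.genomes:
            continue  # a segment found in no genome contributes nothing
        pan_bp += seg.length_bp  # counted once, however many genomes share it
        if seg.genomes == all_genomes:
            core_bp += seg.length_bp  # conserved among all taxa
    return core_bp, pan_bp


# Toy data, not taken from the paper.
genomes = frozenset({"A", "B", "C"})
segments = [
    Segment(1000, frozenset({"A", "B", "C"})),  # shared by all three: core
    Segment(300, frozenset({"A", "B"})),        # shared by some genomes only
    Segment(200, frozenset({"C"})),             # unique to one genome
]
core_bp, pan_bp = core_and_pan_size(segments, genomes)
print(core_bp, pan_bp)  # prints: 1000 1500
```

Because the tally runs over aligned segments rather than annotated genes, non-coding sequence contributes to both totals, which is the extension of the core- and pan-genome concepts described above.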