June 3, 2016 | Chen-Shan Chin, Paul Peluso, Fritz J. Sedlazeck, Maria Nattestad, Gregory T. Concepcion, Alicia Clum, Christopher Dunn, Ronan O'Malley, Rosa Figueroa-Balderas, Abraham Morales-Cruz, Grant R. Cramer, Massimo Delledonne, Chongyuan Luo, Joseph R. Ecker, Dario Cantu, David R. Rank, Michael C. Schatz
The paper introduces FALCON and FALCON-Unzip, open-source algorithms designed to assemble Single Molecule Real-Time (SMRT®) Sequencing data into highly accurate, contiguous, and correctly phased diploid genomes. These algorithms address the challenge of assembling non-inbred or rearranged heterozygous genomes, which are common in many biological and agricultural studies. The authors demonstrate the effectiveness of FALCON and FALCON-Unzip by assembling reference sequences for three heterozygous samples: *Arabidopsis thaliana*, *Vitis vinifera* cv. Cabernet Sauvignon, and *Clavicorona pyxidata*. The assemblies produced by FALCON were significantly more contiguous and complete than those produced by alternative short- or long-read approaches. The phased diploid assemblies enabled the study of haplotype structure and heterozygosity between homologous chromosomes, including the identification of widespread heterozygous structural variation within coding sequences. The results show that FALCON and FALCON-Unzip can capture almost all heterozygosity information in the primary contigs and haplotigs, allowing for detailed analysis of haplotype-specific variations and their impact on gene expression, methylation patterns, and regulatory interactions.