26 August 2012 | Kunbo Wang, Zhiwen Wang, Fuguang Li, Wuwei Ye, Junyi Wang, Guoli Song, Zhen Yue, Lin Cong, Haihong Shang, Shilin Zhu, Changsong Zou, Qin Li, Youlu Yuan, Cairui Lu, Hengling Wei, Caiyun Gou, Zequn Zheng, Ye Yin, Xueyan Zhang, Kun Liu, Bo Wang, Chi Song, Nan Shi, Russell J Kohel, Richard G Percy, John Z Yu, Yu-Xian Zhu, Jun Wang & Shuxun Yu
A draft genome of Gossypium raimondii, a diploid cotton species, has been sequenced and assembled. G. raimondii is believed to be the progenitor of the D subgenome of the economically important fiber-producing cotton species Gossypium hirsutum and Gossypium barbadense. Over 73% of the assembled sequences were anchored to the 13 G. raimondii chromosomes. The genome contains 40,976 protein-coding genes, 92.2% of which are confirmed by transcriptome data. The authors observed evidence of a hexaploidization event and of a cotton-specific whole-genome duplication approximately 13–20 million years ago. They also identified 2,355 syntenic blocks, suggesting substantial chromosome rearrangement during evolution. Cotton and Theobroma cacao are the only sequenced plant species with an authentic CDN1 gene family for gossypol biosynthesis.
Cotton is a major economic crop, and its fiber is a key raw material for the textile industry. The Gossypium genus includes 5 tetraploid and over 45 diploid species, which are believed to have originated from a common ancestor 5–10 million years ago. The diploid cotton species share a common chromosome number and show high levels of synteny. Tetraploid cotton species, such as G. hirsutum and G. barbadense, are thought to have formed through an allopolyploidization event involving a D-genome species and an A-genome species. The draft genome of G. raimondii was sequenced with a next-generation Illumina strategy at 103.6-fold genome coverage. The assembly covered 88.1% of the estimated genome size, and 73.2% of the assembled sequences were anchored to chromosomes.
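The two assembly percentages refer to different denominators (estimated genome size versus assembled sequence), so it can help to combine them. The short sketch below does only that arithmetic with the two figures quoted above; the variable names are illustrative, and the result is a derived estimate rather than a number reported in the text.

```python
# Figures quoted in the summary above.
assembly_fraction = 0.881  # share of the estimated genome size covered by the assembly
anchored_fraction = 0.732  # share of the assembled sequence anchored to chromosomes

# Share of the estimated genome that is both assembled and anchored.
anchored_of_genome = assembly_fraction * anchored_fraction

print(f"Anchored share of estimated genome: {anchored_of_genome:.1%}")
```

Under these figures, roughly two thirds of the estimated genome is placed on chromosomes.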
The genome contains 40,976 protein-coding genes, 92.2% of which are supported by transcriptome data. Comparative analysis with other plant species showed similar numbers of gene families, with a core set of 9,525 families in common. Phylogenetic analysis placed G. raimondii and T. cacao in a common subclade, with the two lineages diverging 33.7 million years ago. The genome also showed evidence of a paleohexaploidization event 115.4–146.1 million years ago. Transposable elements make up 57% of the genome, with long terminal repeat (LTR) elements being the most common class, and a high proportion of genes lie near transposable elements. Simple sequence repeats (SSRs) were identified in the genome, providing markers for cotton breeding. Analysis of fiber development genes showed differences between G. raimondii and G. hirsutum, indicating the importance of the Sus, KCS, and ACO genes for fiber development. The genome also contains an authentic CDN1 gene family for gossypol biosynthesis.
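To illustrate what identifying SSRs in a sequence involves, here is a minimal sketch that scans a DNA string for perfect tandem repeats using regular expressions. It is not the pipeline used in the study: the function name `find_ssrs` and the minimum-repeat thresholds are assumptions chosen for illustration, and dedicated SSR tools handle imperfect and compound repeats that this sketch ignores.

```python
import re

def find_ssrs(seq, min_repeats=None):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    if min_repeats is None:
        # Hypothetical thresholds: motif length -> minimum number of copies.
        min_repeats = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A k-base motif followed by at least n - 1 further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are just a shorter unit repeated (e.g. "AA").
            if any(motif == motif[:d] * (k // d)
                   for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

# Toy sequence: seven copies of AT and five copies of AAG with short flanks.
example = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "T"
ssrs = find_ssrs(example)
print(ssrs)
```

Each reported position and motif could then serve as the basis for designing primers around the repeat, which is how SSRs are typically turned into breeding markers.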