Repetitive Elements May Comprise Over Two-Thirds of the Human Genome

December 1, 2011 | A. P. Jason de Koning, Wanjun Gu, Todd A. Castoe, Mark A. Batzer, David D. Pollock
This study challenges the widely accepted view that only about 45-50% of the human genome is repetitive or repeat-derived, primarily from transposable elements (TEs). Using a novel *de novo* strategy called *P-clouds*, which identifies clusters of high-abundance oligonucleotides that are related in sequence space, the authors predict that over 840 million base pairs (Mbp) of additional repetitive sequence exist in the human genome. This suggests that 66-69% of the human genome may be repetitive or repeat-derived. The study compares *P-clouds* with *RepeatMasker* (RM), a conventional approach, in detecting different-sized fragments of two well-known human SINE families, Alu and MIR. *P-clouds* shows significantly higher sensitivity than RM, particularly for shorter fragments. The authors also introduce "element-specific" *P-clouds* (ESPs) to find novel Alu and MIR elements, identifying approximately 100 Mbp of previously unannotated human elements. These findings highlight the need for combined, probabilistic approaches to genome annotation and suggest that the human genome contains a much larger proportion of repetitive sequence than previously thought.
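The idea of grouping high-abundance oligonucleotides that are related in sequence space can be illustrated with a small Python sketch. This is a simplified illustration, not the authors' *P-clouds* implementation: the function names, the two thresholds (`core_min`, `outer_min`), and the rule of growing a cloud through single-substitution neighbors are all assumptions made for the example.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer (oligonucleotide of length k) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def neighbors(kmer, alphabet="ACGT"):
    """Yield all k-mers that differ from kmer by exactly one substitution."""
    for i, base in enumerate(kmer):
        for alt in alphabet:
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def build_clouds(counts, core_min, outer_min):
    """Greedily group abundant k-mers into clouds.

    core_min and outer_min are hypothetical thresholds for this sketch.
    A cloud is seeded by the most abundant unassigned k-mer whose count
    is at least core_min, then grows through one-substitution neighbors
    whose count is at least outer_min.
    """
    assigned = {}  # k-mer -> index of the cloud it belongs to
    clouds = []    # list of clouds, each a list of member k-mers
    for kmer, n in counts.most_common():
        if n < core_min:
            break  # remaining k-mers are too rare to seed a cloud
        if kmer in assigned:
            continue
        cloud_id = len(clouds)
        members = [kmer]
        assigned[kmer] = cloud_id
        frontier = [kmer]
        while frontier:
            current = frontier.pop()
            for nb in neighbors(current):
                if nb not in assigned and counts.get(nb, 0) >= outer_min:
                    assigned[nb] = cloud_id
                    members.append(nb)
                    frontier.append(nb)
        clouds.append(members)
    return clouds, assigned


def mask_positions(seq, k, assigned):
    """Mark every base covered by a k-mer that belongs to some cloud."""
    covered = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in assigned:
            for j in range(i, i + k):
                covered[j] = True
    return covered
```

In this sketch, a sequence would be annotated by calling `count_kmers` on the genome, passing the counts to `build_clouds`, and then calling `mask_positions` to flag bases covered by cloud members. Because rarer variants are admitted once they neighbor an abundant k-mer, an approach of this kind can flag short or diverged fragments that have no full-length match to a known element.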