December 1, 2011 | A. P. Jason de Koning, Wanjun Gu, Todd A. Castoe, Mark A. Batzer, David D. Pollock
A study reveals that repetitive elements may constitute over two-thirds of the human genome. Using a novel de novo method called P-clouds, researchers identified additional repetitive sequences, suggesting that 66–69% of the human genome is repetitive or repeat-derived. This contrasts with previous estimates using conventional methods like RepeatMasker (RM), which have lower sensitivity for detecting short fragments. P-clouds, which search for clusters of high-abundance oligonucleotides, detected more repetitive sequences, including many previously unannotated elements.
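The idea of grouping high-abundance oligonucleotides into clusters can be illustrated with a toy sketch. This is not the authors' P-clouds implementation. The k-mer length, the two thresholds (core_min and outer_min), the one-mismatch expansion rule and all function names are hypothetical choices made for illustration, and the published method involves further parameters and steps that are not reproduced here.

```python
from collections import Counter


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer (oligonucleotide) in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def one_mismatch_neighbors(kmer: str):
    """Yield every k-mer that differs from kmer by exactly one substitution."""
    for i, base in enumerate(kmer):
        for alt in "ACGT":
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def build_cloud(counts: Counter, core_min: int, outer_min: int) -> set:
    """Seed a cloud with high-abundance 'core' k-mers (count >= core_min), then
    add their one-mismatch neighbours whose own count reaches outer_min."""
    cores = {km for km, c in counts.items() if c >= core_min}
    cloud = set(cores)
    for core in cores:
        for nb in one_mismatch_neighbors(core):
            if counts.get(nb, 0) >= outer_min:
                cloud.add(nb)
    return cloud


def annotate(seq: str, cloud: set, k: int) -> list:
    """Flag every base covered by at least one cloud k-mer as repeat-derived."""
    mask = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in cloud:
            for j in range(i, i + k):
                mask[j] = True
    return mask


if __name__ == "__main__":
    unit = "ACGTTGCATGCA"  # hypothetical toy "repeat family"
    diverged = unit[:5] + "A" + unit[6:]  # copy carrying one substitution
    genome = "GATTACA" + unit + "CCTAGG" + unit + "TTGACC" + diverged + "AGGCTT"
    k = 6
    cloud = build_cloud(count_kmers(genome, k), core_min=2, outer_min=1)
    mask = annotate(genome, cloud, k)
    print(f"{sum(mask)}/{len(genome)} bases flagged as repeat-derived")
```

In this toy, seeding the cloud only with highly abundant k-mers keeps unique sequence out, while admitting their less abundant one-mismatch neighbours lets the diverged copy still be flagged even though it never matches the unit exactly.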
Element-specific P-clouds (ESPs) were developed to find Alu and MIR SINE elements missed by existing annotation, revealing approximately 100 Mb of previously unannotated sequence. These results indicate that RM may have missed a substantial fraction of repeat-derived sequence, highlighting the need for combined, probabilistic approaches to genome annotation. The study also shows that P-clouds detect short repeat fragments more effectively than RM, with good sensitivity for fragments as short as 25 bp. Taken together, the findings suggest that the human genome contains substantially more repetitive sequence than previously believed, with the majority of the genome repetitive or repeat-derived. The study emphasizes the importance of improving methods for detecting repetitive elements, given the significant role they play in shaping the human genome.
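An element-specific scan, and the count of bases it adds beyond an existing annotation, can be sketched in the same toy style. As an assumption, an ESP is modelled here simply as the set of k-mers from one family's consensus plus their one-substitution variants. The consensus string is a hypothetical placeholder rather than a real Alu or MIR sequence, the existing mask merely stands in for a RepeatMasker annotation (no RepeatMasker interface is used), and all function names are illustrative.

```python
def family_kmers(consensus: str, k: int) -> set:
    """Collect the k-mers of one family's consensus plus all of their
    one-substitution variants, so copies with a single substitution in a
    k-mer still match."""
    kmers = set()
    for i in range(len(consensus) - k + 1):
        kmer = consensus[i:i + k]
        kmers.add(kmer)
        for j, base in enumerate(kmer):
            for alt in "ACGT":
                if alt != base:
                    kmers.add(kmer[:j] + alt + kmer[j + 1:])
    return kmers


def esp_mask(seq: str, kmers: set, k: int) -> list:
    """Flag every base covered by a k-mer belonging to the family set."""
    mask = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in kmers:
            for j in range(i, i + k):
                mask[j] = True
    return mask


def newly_annotated(esp: list, existing: list) -> int:
    """Count bases flagged by the element-specific scan but absent from an
    existing annotation mask (standing in for a RepeatMasker track)."""
    return sum(1 for e, r in zip(esp, existing) if e and not r)


if __name__ == "__main__":
    consensus = "TGCATGGACCTAGT"  # hypothetical placeholder, NOT a real Alu or MIR consensus
    copy_2 = consensus[:7] + "G" + consensus[8:]  # diverged copy, one substitution
    genome = "AAAAA" + consensus + "CCCCC" + copy_2 + "AAAAA"
    existing = [False] * 5 + [True] * 14 + [False] * (len(genome) - 19)  # only copy 1 annotated
    k = 8
    mask = esp_mask(genome, family_kmers(consensus, k), k)
    print(f"{newly_annotated(mask, existing)} bases newly flagged")
```

Comparing the two masks base by base is one simple way to express "previously unannotated sequence"; under these assumptions, a genome-wide figure would be the same count summed over all chromosomes.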