Distribution and intensity of constraint in mammalian genomic sequence

2005 | Gregory M. Cooper, Eric A. Stone, George Asimenos, Eric D. Green, Serafim Batzoglou, Arend Sidow
This study analyzes an alignment of orthologous genomic DNA sequences from 29 mammalian species to identify regions that have been subject to purifying selection and are enriched for functional elements. The analysis covers approximately 1.9 Mbp of the human genome and identifies constrained elements ranging from 3 bp to over 1 kbp in length, which together cover about 5.5% of the human locus. Nonexonic elements outnumber exonic elements by a ratio of seven to one, and the total amount of nonexonic constraint is roughly twice that of exonic constraint. Constrained elements tend to cluster and correspond well with known functional elements; large constrained regions are identified, many of which extend beyond exons. Constraint density correlates inversely with mobile (repetitive) element density, yet unambiguously constrained elements overlapping mammalian ancestral repeats are also identified, and some of these are under intense purifying selection.
The study introduces GERP (Genomic Evolutionary Rate Profiling), a statistically rigorous and biologically transparent framework for identifying constrained elements in both exonic and nonexonic sequence. GERP identifies regions with nucleotide substitution deficits and measures these deficits as "rejected substitutions" (RS), which reflect the intensity of past purifying selection and are used to quantify constraint. Ultraconserved elements, previously defined as regions with no changes over 200 bases, are redefined using GERP as elements with high RS scores.
The study concludes that there are many more ultraconserved elements than previously estimated, and that GERP provides a more accurate method for their identification. Some of these ultraconserved elements have no match in other species, suggesting that they are mammalian-specific. Overall, the study highlights the importance of comparative sequence analysis in identifying functional elements in the human genome and provides a framework for further annotation and characterization of constrained elements.
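As a rough illustration of the rejected-substitution idea summarized above, the following Python sketch computes a per-column deficit as expected minus observed substitutions, then groups consecutive columns with a positive deficit into candidate elements scored by their summed RS. The function names, the toy numbers, and the `min_len` cutoff (chosen only to echo the 3 bp lower bound mentioned above) are illustrative assumptions. How GERP actually estimates expected and observed rates, and how it assesses statistical significance, is not covered in this summary and is not reproduced here.

```python
from typing import List, Tuple


def rejected_substitutions(expected: List[float], observed: List[float]) -> List[float]:
    """Per-column deficit: expected minus observed substitutions.

    Positive values mean fewer substitutions than expected (a deficit).
    """
    if len(expected) != len(observed):
        raise ValueError("expected and observed must have the same length")
    return [e - o for e, o in zip(expected, observed)]


def candidate_elements(rs: List[float], min_len: int = 3) -> List[Tuple[int, int, float]]:
    """Group runs of consecutive columns with a positive deficit.

    Returns (start, end_exclusive, summed_rs) for each run of at least
    min_len columns. min_len is a hypothetical cutoff for this sketch.
    """
    elements = []
    start = None
    for i, score in enumerate(rs):
        if score > 0:
            if start is None:
                start = i  # open a new run
        elif start is not None:
            if i - start >= min_len:
                elements.append((start, i, sum(rs[start:i])))
            start = None  # close the run
    # Close a run that reaches the end of the alignment.
    if start is not None and len(rs) - start >= min_len:
        elements.append((start, len(rs), sum(rs[start:])))
    return elements


# Toy data (invented for illustration): 8 alignment columns.
expected = [4.0] * 8
observed = [1.0, 0.5, 1.0, 5.0, 4.0, 0.0, 0.0, 4.5]

rs = rejected_substitutions(expected, observed)
# Rank candidates by summed RS, highest first.
ranked = sorted(candidate_elements(rs), key=lambda el: el[2], reverse=True)
print(ranked)
```

In this toy input only the first three columns form a run long enough to be reported; the two-column run near the end falls below the assumed length cutoff.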