Meta-analysis discovery of tissue-specific DNA sequence motifs from mammalian gene expression data

27 April 2006 | Bertrand R Huber and Martha L Bulyk
This study presents a meta-analysis approach for discovering tissue-specific DNA sequence motifs from mammalian gene expression data. The researchers developed a software package, MultiFinder, that combines the results of four different motif finding algorithms to identify and rank both known and novel regulatory DNA motifs. The approach was applied to conserved noncoding regions surrounding co-expressed, tissue-specific human genes. It recovered both previously known and many novel candidate regulatory DNA motifs in all 18 tissue-specific expression clusters examined.

Integrating the results of multiple motif finding tools improved the identification and ranking of known and novel motifs compared with using any single tool. The researchers also applied a filter to eliminate motifs that score well by common metrics but do not resemble typical transcription factor binding site (TFBS) motifs. This filtering strategy helped identify likely human cis-regulatory elements.

The study also examined how input sequence length affects motif discovery. Increasing the length of the input sequences can yield less significant group specificity scores and lower motif rankings. Nevertheless, biologically significant TFBS motifs can still be found within 5 kb upstream regions, even when they do not rank highly by group specificity score.

In total, the study identified 431 previously known and 579 novel nonredundant candidate TFBS motifs whose group specificity scores were better than the geometric mean of the scores of their corresponding matched random controls. The novel motifs may correspond to binding site motifs for as yet uncharacterized tissue-specific transcription factors (TFs). The study further demonstrated that tissue-specific expression data can be used to identify candidate regulatory motifs for those tissues, and that high-throughput genomic technologies may help establish which TFs bind these candidate motifs. The results suggest that this strategy could be useful for identifying motifs in other metazoan genomes.
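The summary relies on group specificity scores without defining them. The following is a minimal Python sketch that assumes the hypergeometric definition commonly used for this score: the probability of observing at least the measured overlap, by chance, between a co-expressed cluster and the genome-wide top hits of a motif. Under this definition, smaller values are better. The sketch computes such a score and compares it against the geometric mean of scores from matched random controls. The function names, parameters and all numbers are hypothetical illustrations, not MultiFinder's code or the study's data.

```python
import math
from math import comb


def group_specificity(genome_size: int, cluster_size: int,
                      top_hits: int, overlap: int) -> float:
    """Hypergeometric upper-tail probability P(X >= overlap).

    ASSUMPTION: the score is the hypergeometric tail probability below.
    genome_size:  total number of genes considered (G)
    cluster_size: number of genes in the co-expressed cluster (t)
    top_hits:     number of genes genome-wide that best match the motif (b)
    overlap:      number of cluster genes among those top hits (x)
    Smaller values mean the motif is more specific to the cluster.
    """
    upper = min(cluster_size, top_hits)
    # Sum exact integer counts first, then divide once to limit rounding error.
    numerator = sum(
        comb(top_hits, i) * comb(genome_size - top_hits, cluster_size - i)
        for i in range(overlap, upper + 1)
    )
    return numerator / comb(genome_size, cluster_size)


def geometric_mean(values: list[float]) -> float:
    """Geometric mean of strictly positive scores, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("need a non-empty list of positive scores")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def beats_random(motif_score: float, random_scores: list[float]) -> bool:
    """True if the motif scores better (smaller) than the random controls' geometric mean."""
    return motif_score < geometric_mean(random_scores)


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to illustrate the calculation.
    score = group_specificity(genome_size=15000, cluster_size=40,
                              top_hits=300, overlap=8)
    random_controls = [0.2, 0.05, 0.4, 0.1]  # hypothetical matched-random scores
    print(f"group specificity: {score:.3g}")
    print("better than matched randoms:", beats_random(score, random_controls))
```

Working in exact integer binomial counts keeps the tail probability accurate for genome-sized inputs. Taking the geometric mean in log space suits scores that span many orders of magnitude.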