Advance Access publication May 4, 2011 | Timothy L. Bailey
The paper introduces DREME, a motif discovery algorithm designed to identify short, core DNA-binding motifs of eukaryotic transcription factors (TFs) from large ChIP-seq datasets. DREME is optimized for speed and efficiency, scaling linearly with dataset size and finding multiple, non-redundant motifs with reliable statistical significance. The algorithm uses a simplified form of regular expressions (consensus sequences with wildcards) to search for motifs, focusing on short motifs (4-8 bp) that are ideal for monomeric TFs. DREME employs a beam search approach to identify statistically significant motifs, using Fisher's Exact Test to measure relative enrichment.
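The two ingredients named above, wildcard consensus matching and Fisher's Exact Test on relative enrichment, can be illustrated with a minimal Python sketch. This is not DREME's implementation: the function names, the use of IUPAC codes as the wildcard alphabet, the scanning of both strands, and the choice of a one-sided test on per-sequence counts are assumptions made for illustration, and the beam search over candidate motifs is not shown.

```python
import re
from math import comb

# Assumption: wildcards are written as IUPAC nucleotide codes,
# each mapped to an equivalent regex character class.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(consensus):
    """Reverse complement of an IUPAC consensus string."""
    return consensus.translate(COMPLEMENT)[::-1]


def consensus_to_regex(consensus):
    """Compile a consensus with wildcards into a regular expression."""
    return re.compile("".join(IUPAC[c] for c in consensus))


def count_sequences_with_motif(consensus, sequences):
    """Count sequences with at least one match on either strand."""
    fwd = consensus_to_regex(consensus)
    rev = consensus_to_regex(reverse_complement(consensus))
    return sum(1 for s in sequences if fwd.search(s) or rev.search(s))


def fisher_enrichment_p(pos_hits, pos_total, neg_hits, neg_total):
    """One-sided Fisher's Exact Test p-value.

    Probability of seeing at least pos_hits matching sequences in the
    positive set if matches were spread at random over both sets
    (upper tail of the hypergeometric distribution).
    """
    total = pos_total + neg_total
    hits = pos_hits + neg_hits
    upper = min(hits, pos_total)
    tail = sum(
        comb(hits, k) * comb(total - hits, pos_total - k)
        for k in range(pos_hits, upper + 1)
    )
    return tail / comb(total, pos_total)


# Toy data, invented for illustration only.
positives = ["ACGATAAGG", "TTCTTATCA", "GGGGCCCC", "AGATAGCC"]
negatives = ["CCCCGGGG", "ACACACAC", "TTGATTAC", "GCGCGCGC"]

pos_hits = count_sequences_with_motif("GATAR", positives)
neg_hits = count_sequences_with_motif("GATAR", negatives)
p_value = fisher_enrichment_p(pos_hits, len(positives), neg_hits, len(negatives))
print(pos_hits, neg_hits, p_value)
```

On this toy input the consensus GATAR matches three of the four positive sequences (one of them through its reverse complement) and none of the negatives, so the p-value is 5/70, about 0.071. Counting sequences rather than individual sites keeps a sequence with many matches from inflating the enrichment score.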
The algorithm is evaluated on mouse embryonic stem cell (mESC), mouse erythrocyte, and human lymphoblastoid cell line ChIP-seq datasets, demonstrating its ability to discover primary and cofactor motifs, and to perform discriminative motif discovery. DREME outperforms other popular motif discovery algorithms in terms of speed and the number of motifs discovered, particularly for identifying cofactor motifs. The paper also discusses the limitations of existing algorithms and highlights the advantages of DREME in analyzing ChIP-seq data.