Quantifying similarity between motifs

26 February 2007 | Shobhit Gupta, John A Stamatoyannopoulos, Timothy L Bailey and William Stafford Noble
This paper introduces Tomtom, a software tool that compares DNA or protein motifs and quantifies their similarity. Given a query motif, Tomtom searches a database of target motifs and reports each match with a P value and an E value that assess its statistical significance.

Tomtom considers all possible relative offsets between the query and each target motif and, for DNA motifs, also considers reverse-complement matches. The algorithm uses dynamic programming to estimate the null distribution of scores for an arbitrary column comparison function. It then computes a P value for each possible offset and uses the minimum of these to determine the overall P value of the match between the query and target motifs. Finally, it applies a Bonferroni correction to derive an E value, which represents the expected number of matches in a randomized database.

Tomtom supports seven column comparison functions, including the Pearson correlation coefficient, average log-likelihood ratio, Fisher-Irwin exact test, Kullback-Leibler divergence, Euclidean distance, and the Sandelin-Wasserman function. Simulations validate the accuracy of Tomtom's statistical estimates and show that it outperforms other methods at retrieving related motifs from a database. In particular, ranking matches by Tomtom's estimated P values gives better results than ranking by scores adjusted with ad hoc normalization schemes. Tomtom is implemented as part of the MEME Suite of motif analysis tools and is publicly available, and the paper also discusses its practical use in conjunction with MEME, an ab initio motif discovery tool.
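To make the dynamic-programming step more concrete, the following is a minimal sketch and not Tomtom's actual implementation. It assumes that column scores have been discretized to integers, that the score of an alignment is the sum of its column scores, and that the null model is a column drawn uniformly from the target database. All function names here (`scaled_neg_euclidean`, `column_score_pmf`, `sum_score_pmf`, `p_value`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def scaled_neg_euclidean(col_a, col_b, scale=10):
    """Illustrative integer score: negated, scaled Euclidean distance.

    Higher is more similar. The scaling and rounding are assumptions made
    so that scores fall on an integer grid.
    """
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(col_a, col_b)))
    return round(-scale * dist)


def column_score_pmf(query_col, target_cols, score_fn=scaled_neg_euclidean):
    """Empirical distribution of scores between one query column and a
    column drawn uniformly at random from target_cols (the assumed null)."""
    counts = Counter(score_fn(query_col, t) for t in target_cols)
    total = len(target_cols)
    return {score: count / total for score, count in counts.items()}


def convolve(pmf_a, pmf_b):
    """Distribution of the sum of two independent integer-valued scores."""
    out = {}
    for score_a, prob_a in pmf_a.items():
        for score_b, prob_b in pmf_b.items():
            key = score_a + score_b
            out[key] = out.get(key, 0.0) + prob_a * prob_b
    return out


def sum_score_pmf(column_pmfs):
    """Dynamic programming: fold in one aligned column at a time."""
    total = {0: 1.0}  # zero columns aligned: the score is 0 with certainty
    for pmf in column_pmfs:
        total = convolve(total, pmf)
    return total


def p_value(pmf, observed):
    """Probability under the null of a score at least as high as observed."""
    return sum(prob for score, prob in pmf.items() if score >= observed)
```

For example, if each of two aligned columns scores 0 or 1 with equal probability under the null, `p_value(sum_score_pmf([{0: 0.5, 1: 0.5}] * 2), 2)` evaluates to 0.25, because only one of the four equally likely outcomes reaches a total of 2. Working on an integer grid keeps the number of distinct sums small, which is what makes the column-by-column recursion tractable.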
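The handling of offsets, orientations, and the E value can be sketched as follows. This is an illustration under stated assumptions rather than Tomtom's code: motifs are lists of (A, C, G, T) probability tuples, the offset P values are treated as independent when the minimum is corrected, and the E value is taken to be the match P value multiplied by the number of target motifs. The function names and the `min_overlap` parameter are hypothetical.

```python
def reverse_complement(motif):
    """Reverse-complement a DNA motif given as (A, C, G, T) columns.

    Reversing the column order reverses the sequence, and reversing each
    tuple swaps A with T and C with G.
    """
    return [tuple(reversed(col)) for col in reversed(motif)]


def offsets(query_width, target_width, min_overlap=1):
    """All start positions of the query relative to target column 0 that
    leave at least min_overlap columns aligned (min_overlap is assumed)."""
    return list(range(min_overlap - query_width,
                      target_width - min_overlap + 1))


def match_p_value(offset_p_values):
    """P value of the best offset, corrected for having tried several.

    Assumes the offset P values are independent: the chance that at least
    one of n draws is as small as p_min is 1 - (1 - p_min) ** n.
    """
    n = len(offset_p_values)
    p_min = min(offset_p_values)
    return 1.0 - (1.0 - p_min) ** n


def e_value(match_p, num_targets):
    """Expected number of targets matching this well by chance, assuming a
    simple multiplication by database size."""
    return match_p * num_targets
```

With a query of width 3 and a target of width 4, `offsets(3, 4)` lists six placements, from the query overhanging the left end by two columns to overhanging the right end by two. For a DNA query, the same offsets would be scored again against `reverse_complement(target)` so that matches in either orientation are found.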