26 February 2007 | Shobhit Gupta*, John A Stamatoyannopoulos*, Timothy L Bailey† and William Stafford Noble*‡
The article introduces Tomtom, a software tool for quantifying the similarity between DNA or protein motifs. The authors define a statistical measure of motif-motif similarity and describe an algorithm that searches a database of target motifs with a given query motif. For each column in the query motif, the algorithm estimates the null distribution of scores from the scores observed when that column is aligned with columns in the target database. It then combines these column scores to compute a motif P value, which is corrected for multiple tests to yield an E value. Simulations validate the accuracy of Tomtom's statistical estimates and its ability to retrieve related motifs. The results show that Tomtom's P value estimation yields better rankings than ad hoc normalization schemes and that it assigns significant E values to a large percentage of positive matches. The article also compares seven motif column comparison functions and finds that Euclidean distance performs best. Tomtom is part of the MEME Suite of motif analysis tools and is publicly available.
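
The ideas summarized above can be illustrated with a minimal Python sketch. It is not Tomtom's implementation. It assumes that a column score is the negative Euclidean distance between two frequency vectors, that a P value is the fraction of null scores at least as good as the observed one, and that the E value is the P value multiplied by the number of target motifs. The function names (column_score, empirical_p_value, e_value) are hypothetical, and the step that combines column scores into a motif P value is omitted.

```python
import math


def column_score(a, b):
    """Negative Euclidean distance between two motif columns.

    Higher (closer to zero) means more similar. Assumed scoring
    convention for this sketch only.
    """
    return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def empirical_p_value(score, null_scores):
    """Fraction of null scores that are at least as good as `score`."""
    return sum(1 for s in null_scores if s >= score) / len(null_scores)


def e_value(p_value, num_targets):
    """Multiple-testing correction: expected number of chance matches."""
    return p_value * num_targets


# One query column (A, C, G, T frequencies) and a toy set of target columns.
query_col = [0.7, 0.1, 0.1, 0.1]
target_cols = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.25, 0.25, 0.25, 0.25],
    [0.1, 0.1, 0.1, 0.7],
]

# Null distribution for this query column: its scores against every target column.
null_scores = [column_score(query_col, t) for t in target_cols]

observed = column_score(query_col, target_cols[2])
p = empirical_p_value(observed, null_scores)
print(p, e_value(p, num_targets=len(target_cols)))
```

In this toy setup the null distribution is built from only four columns, so the resulting P values are coarse; a real motif database supplies far more columns per query position.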