JANUARY 2005 | Martin Tompa1,2, Nan Li1, Timothy L Bailey3, George M Church4, Bart De Moor5, Eleazar Eskin6, Alexander V Favorov7,8, Martin C Frith9, Yutao Fu9, W James Kent10, Vsevolod J Makeev7,8, Andrei A Mironov7,11, William Stafford Noble1,2, Giulio Pavesi12, Graziano Pesole13, Mireille Régnier14, Nicolas Simonis15, Saurabh Sinha16, Gert Thijs5, Jacques van Helden15, Mathias Vandenbogaert14, Zhiping Weng9, Christopher Workman17, Chun Ye18 & Zhou Zhu4
This paper assesses 13 computational tools for predicting transcription factor binding sites, a critical task in understanding gene regulation. The study aims to provide guidance on tool accuracy and a benchmark dataset for future evaluations. The tools were tested on datasets containing known binding sites: experts used the tools to predict motifs, and the predictions were compared to the actual sites. The results show that although the tools score low on absolute measures of correctness, they perform better on generic and Markov datasets than on real promoter sequences. Weeder outperformed the other tools, and certain pairs of tools showed complementary behavior, suggesting that combining them could improve predictions. The authors suggest improvements for future assessments, including eliminating real promoter datasets and requiring tools to predict multiple motifs per dataset. Despite the challenges, the study highlights the complexity of regulatory element prediction and the need for further research.