January 2005 | Martin Tompa, Nan Li, Timothy L Bailey, George M Church, Bart De Moor, Eleazar Eskin, Alexander V Favorov, Martin C Frith, Yutao Fu, W James Kent, Vsevolod J Makeev, Andrei A Mironov, William Stafford Noble, Giulio Pavesi, Graziano Pesole, Mireille Régnier, Nicolas Simonis, Saurabh Sinha, Gert Thijs, Jacques van Helden, Mathias Vandenbogaert, Zhiping Weng, Christopher Workman, Chun Ye, Zhou Zhu
This paper assesses 13 computational tools for predicting transcription factor binding sites, a critical task in understanding gene regulation. The study aims to provide guidance on tool accuracy and a benchmark dataset for future evaluations. The tools were tested on datasets containing known binding sites: experts used the tools to predict motifs, and the predictions were compared with the actual sites. The results show that the tools score low on absolute measures of correctness, and that they perform better on generic and Markov datasets than on real promoter sequences. Weeder outperformed the other tools, and certain pairs of tools behaved in complementary ways, suggesting that combining tools could improve predictions. The authors suggest improvements for future assessments, including eliminating real promoter datasets and requiring tools to predict multiple motifs per dataset. Overall, the study highlights the complexity of regulatory element prediction and the need for further research.
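The summary does not say how predictions were scored against the known sites. As a rough illustration only, the sketch below assumes a nucleotide-level comparison, in which every position covered by a site counts individually, and computes sensitivity and positive predictive value from the overlap. The `(sequence id, start, end)` site layout, the function names, and the choice of these two measures are assumptions made for this example, not the paper's scoring code.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical site layout: (sequence id, start, end), with end exclusive.
Site = Tuple[str, int, int]


def covered_positions(sites: Iterable[Site]) -> Set[Tuple[str, int]]:
    """Expand sites into the set of (sequence id, position) pairs they cover."""
    return {(seq, pos) for seq, start, end in sites for pos in range(start, end)}


def nucleotide_scores(known: Iterable[Site], predicted: Iterable[Site]) -> Dict[str, float]:
    """Compare predicted sites with known sites, one nucleotide at a time."""
    known_pos = covered_positions(known)
    pred_pos = covered_positions(predicted)

    tp = len(known_pos & pred_pos)  # predicted and truly in a site
    fn = len(known_pos - pred_pos)  # in a known site but not predicted
    fp = len(pred_pos - known_pos)  # predicted but not in a known site

    # Guard against empty inputs to avoid division by zero.
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "sensitivity": sensitivity, "ppv": ppv}


# Toy data (invented for illustration, not taken from the benchmark).
known = [("seq1", 10, 20), ("seq2", 5, 15)]
predicted = [("seq1", 15, 25), ("seq2", 40, 50)]
scores = nucleotide_scores(known, predicted)
print(scores)
```

In this toy case only half of one predicted site overlaps a known site, so both measures come out at 0.25. A prediction can therefore land near a true site and still score low under a per-nucleotide comparison.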