JANUARY 2005 | Martin Tompa, Nan Li, Timothy L Bailey, George M Church, Bart De Moor, Eleazar Eskin, Alexander V Favorov, Martin C Frith, Yutao Fu, W James Kent, Vsevolod J Makeev, Andrei A Mironov, William Stafford Noble, Giulio Pavesi, Graziano Pesole, Mireille Régnier, Nicolas Simonis, Saurabh Sinha, Gert Thijs, Jacques van Helden, Mathias Vandenbogaert, Zhiping Weng, Christopher Workman, Chun Ye & Zhou Zhu
This study evaluates 13 computational tools for discovering transcription factor binding sites (TFBS). Its goals are to assess how accurately these tools identify regulatory elements and to provide a benchmark collection of data sets for future evaluations. The tools were tested on real and synthetic data sets containing known TFBS. Each tool was run by an expert in its use, and the predictions were compared with the known binding sites using statistical measures computed at both the nucleotide level and the binding-site level.
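To make the nucleotide-level comparison concrete, here is a minimal Python sketch of how predicted and known sites can be scored position by position. It assumes a single sequence of length `seq_len`, sites given as 0-based half-open `(start, end)` intervals lying inside the sequence, and that nCC takes the usual correlation-coefficient form over true/false positive/negative counts. The helper names `covered_positions` and `nucleotide_stats` are hypothetical and are not taken from the assessment's own software.

```python
from math import sqrt


def covered_positions(sites):
    """Return the set of nucleotide positions covered by half-open (start, end) intervals."""
    positions = set()
    for start, end in sites:
        positions.update(range(start, end))
    return positions


def nucleotide_stats(seq_len, known, predicted):
    """Hypothetical helper: nucleotide-level counts and statistics for one sequence.

    Assumes every interval lies within [0, seq_len).
    """
    k = covered_positions(known)
    p = covered_positions(predicted)
    tp = len(k & p)              # positions inside both a known and a predicted site
    fp = len(p - k)              # predicted but not known
    fn = len(k - p)              # known but missed
    tn = seq_len - tp - fp - fn  # neither known nor predicted
    sn = tp / (tp + fn) if tp + fn else 0.0    # nSn: sensitivity
    ppv = tp / (tp + fp) if tp + fp else 0.0   # nPPV: positive predictive value
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    cc = (tp * tn - fn * fp) / denom if denom else 0.0  # nCC: correlation coefficient
    return {"nTP": tp, "nFP": fp, "nFN": fn, "nTN": tn,
            "nSn": sn, "nPPV": ppv, "nCC": cc}
```

For a 100-nt sequence with a known site at `(10, 20)` and a prediction at `(15, 25)`, the sketch gives nTP = nFP = nFN = 5 and nTN = 85, so nSn = nPPV = 0.5 and nCC = 400/900 ≈ 0.44. Working on sets of positions keeps overlapping predictions from being double-counted.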
The study found that the tools' accuracy is generally low, with site-level sensitivity (sSn) of at most 0.22 and a nucleotide-level correlation coefficient (nCC) of at most 0.20. Current computational methods for predicting TFBS are therefore far from highly accurate. Performance nonetheless varied between tools, with some, such as Weeder, performing better than others. The results also suggest that computational biologists have been more successful at modeling TFBS in yeast than in metazoans.
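Site-level sensitivity counts whole binding sites rather than nucleotides: it is the fraction of known sites recovered by some prediction, sTP / (sTP + sFN). The minimal Python sketch below assumes that a known site counts as recovered when a predicted site overlaps it by at least a fraction `min_fraction` of the known site's length. The 0.25 default, the 0-based half-open `(start, end)` interval convention, and the names `overlap` and `site_sensitivity` are assumptions for illustration, not details stated in the text.

```python
def overlap(a, b):
    """Number of positions shared by half-open intervals a = (start, end) and b = (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def site_sensitivity(known, predicted, min_fraction=0.25):
    """Hypothetical helper: sSn = sTP / (sTP + sFN) for the sites of one sequence.

    A known site is a true positive (sTP) if some predicted site overlaps it by
    at least min_fraction of the known site's length (an assumed threshold);
    otherwise it is a false negative (sFN).
    """
    if not known:
        return 0.0
    recovered = 0
    for site in known:
        needed = min_fraction * (site[1] - site[0])
        if any(overlap(site, pred) > 0 and overlap(site, pred) >= needed
               for pred in predicted):
            recovered += 1
    return recovered / len(known)
```

With known sites `(10, 20)` and `(50, 60)` and a single prediction `(17, 30)`, only the first site is recovered (3 shared positions against a required 2.5), so sSn = 0.5. Because each known site is counted at most once, several predictions hitting the same site do not inflate sSn.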
The study also highlights the challenges of evaluating these tools, including the difficulty of establishing a true standard of correctness and the variability of each tool's performance across data sets. It recommends that future assessments focus on improving the evaluation methods and include more diverse data sets. The assessment web site provides the data sets and tools for further analysis. The study concludes that, although current tools are far from perfect, they offer valuable insights into the mechanisms of gene regulation, and that further research is needed to improve their accuracy.