Evaluation Campaigns and TRECVid

October 26-27, 2006, Santa Barbara, California, USA | Alan F. Smeaton, Paul Over, Wessel Kraaij
The TREC Video Retrieval Evaluation (TRECVid) is an international benchmarking activity that encourages research in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle in 2005 and will involve nearly 70 research organizations, universities, and consortia in 2006. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for video shots, automatic detection of semantic and low-level video features, shot boundary detection, and story boundary detection in broadcast TV news.

This paper introduces information retrieval (IR) evaluation from both user and system perspectives, highlighting that system evaluation is by far the most prevalent type. It then summarizes TRECVid as an example of a system-evaluation benchmarking campaign and uses it to frame a discussion of whether such campaigns are beneficial or harmful. Arguments for and against are presented, and the paper concludes that, on balance, they have had a very positive impact on research progress.

Evaluation campaigns such as TRECVid have become popular in recent years because they allow researchers to compare their work in an open, metrics-based environment. They provide shared data and common evaluation metrics, and they often foster collaboration and the sharing of resources. They are also attractive to funding agencies and outsiders because they act as a showcase for research results.

TRECVid involves the annual analysis, indexing, and retrieval of video shots, and this paper presents an overview of TRECVid and its activities. It begins with an introduction to evaluation in IR, covering both user and system evaluation, and then presents a catalog of evaluation campaigns in the general area of IR and video analysis. Sections 4 and 5 give a retrospective overview of the TRECVid campaign, with attention to the evolution of the evaluation and of the participating systems, open issues, and so on. Section 6 discusses whether evaluation benchmarking campaigns such as TRECVid and the Text REtrieval Conference (TREC) are good or bad; it presents a series of arguments for each case and leaves the reader to conclude that, on balance, they have had a positive impact on research progress.

Evaluation campaigns share several common features: they are metrics-based, with agreed evaluation procedures and data formats; they are primarily system evaluations rather than user evaluations; participation is open; results and at least some of the data are made available; ground truth is produced either by manual self-annotation or by centralized assessment of pooled results; they coordinate large volunteer efforts; their participation tends to grow over time; and they help raise the profile both of their application area and of evaluation campaigns in general.

TRECVid is one such campaign, and the remainder of the paper examines it in detail. TRECVid began on a small scale in 2001 as one of the many variations on standard text IR evaluations. The motivation was an interest at NIST in expanding the notion of "information" in IR beyond text, together with the observation that it was difficult to compare research results in video retrieval because there was no common basis of shared data, tasks, and measures on which to make such comparisons.
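To make the "uniform scoring procedures" and "centralized assessment of pooled results" mentioned above more concrete, the sketch below computes average precision for a single search topic from one system's ranked shot list and a set of pooled relevance judgments. This is only an illustration: the shot identifiers and data are hypothetical, and the official campaign evaluations are carried out with NIST's own scoring tools rather than code like this.

```python
def average_precision(ranked_shot_ids, relevant_shot_ids):
    """Average precision for one topic: the mean of the precision values
    at the rank of each relevant shot the system retrieved."""
    relevant = set(relevant_shot_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked_shot_ids, start=1):
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)

# Hypothetical ranked run for one topic and the pooled, centrally assessed
# relevance judgments (shot IDs are made up for illustration only).
run = ["shot12_3", "shot7_1", "shot45_9", "shot3_2", "shot12_8"]
relevant_shots = {"shot7_1", "shot12_8", "shot99_1"}
print(average_precision(run, relevant_shots))  # (1/2 + 2/5) / 3 ≈ 0.30
```

Averaging this score over all topics gives the mean average precision figures by which participating systems are typically compared.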