IR evaluation methods for retrieving highly relevant documents

This paper proposes evaluation methods for Information Retrieval (IR) systems based on non-dichotomous (graded) relevance judgments, focusing on a system's ability to retrieve highly relevant documents. The authors argue that traditional binary relevance assessments do not adequately reflect the varying degrees of relevance and user preferences in modern large IR environments. They introduce two novel evaluation approaches: (1) a new application of P-R curves and average precision computations based on separate recall bases for documents of different relevance levels, and (2) cumulative gain (CG) and discounted cumulative gain (DCG) measures that estimate the cumulative relevance gain up to a given ranked position. These methods are demonstrated through a case study on the effectiveness of query types, which combine query structures and query expansion, in retrieving documents of various relevance levels. The results show that strong query structures are most effective in retrieving highly relevant documents, with statistically significant differences between query types. The study highlights the importance of non-dichotomous relevance assessments in IR experiments: they reveal interesting phenomena and allow more rigorous testing of IR methods.
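To make the CG and DCG measures concrete, here is a minimal Python sketch. The summary above does not give the formulas, so this assumes the commonly cited formulation: CG at rank i is the running sum of graded relevance scores, and DCG divides the gain at rank i by the logarithm of the rank, with base 2 and no discount applied to ranks below the base. The function names, the `base` parameter, and the sample gain vector are illustrative assumptions, not taken from the text.

```python
import math


def cumulative_gain(gains):
    """Return the CG vector: CG[i] is the sum of gains at ranks 1..i."""
    cg = []
    total = 0.0
    for gain in gains:
        total += gain
        cg.append(total)
    return cg


def discounted_cumulative_gain(gains, base=2):
    """Return the DCG vector, discounting the gain at rank i by log_base(i).

    Assumption: ranks below `base` are left undiscounted, which also
    avoids dividing by log(1) = 0 at rank 1.
    """
    dcg = []
    total = 0.0
    for rank, gain in enumerate(gains, start=1):
        discount = 1.0 if rank < base else math.log(rank, base)
        total += gain / discount
        dcg.append(total)
    return dcg


# Hypothetical graded judgments (0 = not relevant ... 3 = highly relevant)
# for the top ten documents of one ranked result list.
sample_gains = [3, 2, 3, 0, 0, 1, 2, 2, 3, 0]

print(cumulative_gain(sample_gains))
# [3.0, 5.0, 8.0, 8.0, 8.0, 9.0, 11.0, 13.0, 16.0, 16.0]
print([round(value, 2) for value in discounted_cumulative_gain(sample_gains)])
```

Under these assumptions, a highly relevant document found late in the ranking adds less to DCG than the same document found early, while CG credits it equally at any rank. That is how the two measures separate total gain from ranking quality.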


2000 | Kalervo Järvelin & Jaana Kekäläinen