IR evaluation methods for retrieving highly relevant documents

2000 | Kalervo Järvelin & Jaana Kekäläinen
This paper proposes evaluation methods for information retrieval (IR) that use non-dichotomous relevance judgments to assess how well IR methods retrieve highly relevant documents. Traditional evaluation relies on binary relevance assessments, which cannot express degrees of relevance. The authors argue that in large IR environments user satisfaction depends chiefly on the highly relevant documents, so evaluation should credit methods that rank them first. The paper introduces two evaluation approaches: (1) a new application of precision-recall (P-R) curves and average precision computations based on separate recall bases for documents at different relevance levels, and (2) two novel measures, cumulated gain (CG) and discounted cumulated gain (DCG), which compute the gain a user accumulates by examining the retrieval result up to a given rank. Both measures weigh documents by their graded relevance, and DCG additionally devalues documents the further down the ranked list they appear.

The authors demonstrate the methods in a case study on the effectiveness of query types, examining how query structures and query expansion affect the retrieval of documents at different relevance levels. The experiments use a best-match retrieval system (InQuery) on a database of newspaper articles. Across the P-R, CG, and DCG analyses, strongly structured queries are the most effective at retrieving highly relevant documents, and the differences between query types are statistically significant.

The paper concludes that non-dichotomous relevance assessments are applicable in IR experiments and can reveal differences between retrieval methods that binary assessments hide. By accounting for both the relevance level and the rank position of retrieved documents, the proposed measures give a more accurate and comprehensive picture of IR performance, particularly in large environments where highly relevant documents matter most.
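To make method (1) concrete, below is a minimal Python sketch of average precision computed against a level-specific recall base. It rests on assumptions not spelled out in this summary: documents judged at exactly the target level count as relevant (treating levels as cumulative thresholds is an equally plausible variant), and the function name, example judgments, and recall-base value are illustrative, not taken from the paper's experiments.

```python
def average_precision_at_level(ranked_levels, level, recall_base):
    """Non-interpolated average precision when only documents judged at
    exactly `level` count as relevant. `ranked_levels` gives the judged
    relevance level of each retrieved document in rank order;
    `recall_base` is the total number of documents at that level in the
    collection, so runs that never retrieve them are penalised."""
    hits, precision_sum = 0, 0.0
    for rank, lvl in enumerate(ranked_levels, start=1):
        if lvl == level:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / recall_base if recall_base else 0.0

# Illustrative judgments in rank order (assumed 0-3 graded scale):
ranked = [3, 1, 0, 3, 2, 0, 1]
print(average_precision_at_level(ranked, level=3, recall_base=4))  # 0.375
```

Computing this once per relevance level, each with its own recall base, yields the separate P-R curves and average precision figures the paper compares across query types.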
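Method (2) can be sketched the same way. The functions below follow the recursive CG/DCG definitions associated with this paper: CG at rank i is the running sum of the graded gains G[1..i], and DCG divides each gain at rank i >= b by log_b(i) before summing, leaving ranks below b undiscounted. The gain vector and the base b = 2 are illustrative assumptions, not values from the paper's experiments.

```python
import math

def cumulated_gain(gains):
    """CG[i] = G[1] + ... + G[i]: running total of graded relevance gains."""
    cg, total = [], 0
    for g in gains:
        total += g
        cg.append(total)
    return cg

def discounted_cumulated_gain(gains, b=2):
    """DCG: gains at ranks >= b are divided by log_b(rank), so documents
    found late in the list contribute progressively less; ranks below b
    are left undiscounted (their log_b(rank) would be below 1)."""
    dcg, total = [], 0.0
    for rank, g in enumerate(gains, start=1):
        total += g if rank < b else g / math.log(rank, b)
        dcg.append(total)
    return dcg

# Illustrative gain vector (assumed 0-3 graded relevance):
gains = [3, 2, 3, 0, 1, 2]
print(cumulated_gain(gains))             # [3, 5, 8, 8, 9, 11]
print(discounted_cumulated_gain(gains))  # ~[3.0, 5.0, 6.89, 6.89, 7.32, 8.10]
```

A larger base b models a patient user who is willing to scan further down the list before the gain of late-ranked documents is discounted.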