The article discusses the role of human-AI collaboration in assessing relevance, particularly in the context of web search, question answering, and knowledge base retrieval. Traditionally, relevance judgments are made by human assessors, but recent advancements in large language models (LLMs) like ChatGPT have led to experiments with AI-driven relevance assessments. While empirical studies show that LLM-generated opinions often align with human judgments, the authors argue that fully automated approaches have several issues, including bias, vulnerability to misinformation, and the risk of concept drift.
The authors propose a spectrum of human-AI collaboration, ranging from complete human judgment to complete AI replacement. They suggest that a hybrid approach, where humans and AI work together, is more effective. This includes scenarios where AI assists humans in making judgments, verifies automated judgments, or provides feedback to improve AI performance. The article emphasizes the need for further research to balance human and AI capabilities, ensuring that AI amplifies human intelligence rather than replacing it, and to address challenges such as explainability and the detection of superhuman performance by AI.
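One point on this spectrum, where AI pre-labels documents and humans verify the uncertain cases, can be sketched in a few lines. The sketch below is illustrative only: the `model_score` and `human_label` callables, the `Judgment` record, and the confidence thresholds are assumptions for the example, not an interface proposed by the article.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class Judgment:
    doc_id: str
    label: int    # 0 = not relevant, 1 = relevant
    source: str   # "model" or "human"

def hybrid_judge(items: Iterable[str],
                 model_score: Callable[[str], float],
                 human_label: Callable[[str], int],
                 low: float = 0.2, high: float = 0.8) -> List[Judgment]:
    """Accept confident model judgments; route uncertain cases to a human.

    model_score returns the model's estimated probability of relevance;
    human_label asks a human assessor for a binary judgment.
    """
    judgments = []
    for doc_id in items:
        p = model_score(doc_id)
        if p >= high:
            judgments.append(Judgment(doc_id, 1, "model"))
        elif p <= low:
            judgments.append(Judgment(doc_id, 0, "model"))
        else:
            # Model is unsure: fall back to a human judgment.
            judgments.append(Judgment(doc_id, human_label(doc_id), "human"))
    return judgments
```

Keeping the `source` field on each judgment makes it possible to later audit how often the model and humans disagree, which supports the feedback scenario the authors describe.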
In conclusion, the authors advocate for a collaborative approach that leverages the strengths of both humans and AI to enhance the efficiency, effectiveness, and fairness of decision-making processes in relevance assessment.