X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Models

7 Apr 2024 | Jan Held, Hani Itani, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck
This paper introduces X-VARS, a multi-modal large language model designed to explain football refereeing decisions. X-VARS is trained on the SoccerNet-XFoul dataset, which contains over 22,000 video-question-answer triplets annotated by over 70 experienced football referees. The model can perform a variety of tasks, including video description, question answering, action recognition, and conducting meaningful conversations based on video content and in accordance with the Laws of the Game for football referees.
X-VARS achieves state-of-the-art performance on the SoccerNet-MVFoul dataset and our human study demonstrates that X-VARS generates explanations for its decisions at a level comparable to human referees. X-VARS can analyze and understand complex football duels and provide accurate decision explanations, opening doors for future applications to support referees in their decision-making processes. The contributions of this work include the public release of SoccerNet-XFoul, the introduction of X-VARS, and a thorough evaluation of our model, including analyses of our new training paradigm, the influence of the CLIP text predictions, and a human study that compares X-VARS to human referees.