7 Apr 2024 | Jan Held, Hani Itani, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck
The paper introduces X-VARS, a multi-modal large language model designed to enhance transparency and explainability in football refereeing. X-VARS is trained on a novel dataset, SoccerNet-XFoul, which consists of over 22,000 video-question-answer triplets annotated by experienced football referees. The model can perform tasks such as video description, question answering, action recognition, and generating explanations based on video content and the Laws of the Game. Experiments and a human study demonstrate that X-VARS achieves state-of-the-art performance in interpreting complex football clips and generating explanations comparable to human referees. The paper also discusses the training methodology, including a two-stage approach to fine-tuning CLIP and the LLM, and presents qualitative results and an ablation study to validate the model's capabilities. Overall, X-VARS shows promise in supporting football referees and improving the transparency of automated decision-making processes.
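To make the idea of a video-question-answer triplet concrete, the following is a minimal Python sketch of how one such record might be represented and flattened into a text prompt. It is not taken from the paper or from the SoccerNet-XFoul release: the class name `VideoQATriplet`, its field names, the `to_prompt` helper, the prompt template, and the example path are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoQATriplet:
    """One hypothetical video-question-answer record (field names are assumptions)."""
    video_path: str  # location of the clip; the actual dataset layout may differ
    question: str    # question asked about the clip
    answer: str      # explanation written by a referee annotator


def to_prompt(triplet: VideoQATriplet) -> str:
    """Format a record as a three-line text prompt; the template is illustrative only."""
    return (
        f"<video:{triplet.video_path}>\n"
        f"Question: {triplet.question}\n"
        f"Answer: {triplet.answer}"
    )


# Placeholder values, not real dataset content.
example = VideoQATriplet(
    video_path="clips/example.mp4",
    question="Is this a foul, and why?",
    answer="(referee-written explanation goes here)",
)
prompt = to_prompt(example)
print(prompt)
```

A record shaped like this keeps the clip, the question, and the referee's explanation together, which is the pairing the summary describes; how the authors actually store and tokenize their data is not stated here.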