PREGO: online mistake detection in PROcedural EGOcentric videos

17 May 2024 | Alessandro Flaborea*, Guido Maria D'Amely di Melendugno*, Leonardo Plini*, Luca Scofano*, Edoardo De Matteis*, Antonino Furnari*, Giovanni Maria Farinella*, Fabio Galasso*
PREGO is an online one-class classification model for detecting procedural mistakes in egocentric videos. It has a dual-branch architecture: an online action recognition branch processes video frames in real time to identify the current action, and a symbolic reasoning branch uses a large language model (LLM) to predict future actions through contextual analysis. A mistake is detected when the recognized current action differs from the predicted future action.

PREGO uses only correctly executed procedures as training data. This lets it recognize a wide range of procedural mistakes without being confined to a predefined set of errors, so the model can operate in an open-set scenario and adapt to different procedural contexts.

PREGO is evaluated on two datasets, Assembly101 and Epic-tent, which are adapted for online mistake detection to create two new benchmarks, Assembly101-O and Epic-tent-O. Measured by precision, recall, and F1 score, the model outperforms existing baselines in online procedural mistake detection. By combining video analysis with symbolic reasoning, PREGO is a candidate tool for manufacturing, healthcare, and other fields where procedural accuracy is critical.
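The detection rule described above (flag a step when the recognized action differs from the predicted one) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `make_toy_predictor`, `detect_mistakes`, and `precision_recall_f1` are hypothetical, the action labels are made up, and the LLM-based symbolic reasoning branch is replaced by a simple transition-frequency predictor built from correct procedures. The step-level metric helper is likewise an assumption and may not match the paper's exact evaluation protocol.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Predictor = Callable[[Sequence[str]], Optional[str]]


def make_toy_predictor(correct_procedures: Sequence[Sequence[str]]) -> Predictor:
    """Hypothetical stand-in for the LLM branch: predict the action that most
    often follows the last observed action in correctly executed procedures."""
    transitions: Dict[str, Counter] = {}
    for procedure in correct_procedures:
        for prev, nxt in zip(procedure, procedure[1:]):
            transitions.setdefault(prev, Counter())[nxt] += 1

    def predict(history: Sequence[str]) -> Optional[str]:
        # No prediction when there is no context or the context is unseen.
        if not history or history[-1] not in transitions:
            return None
        return transitions[history[-1]].most_common(1)[0][0]

    return predict


def detect_mistakes(recognized: Sequence[str], predict_next: Predictor) -> List[bool]:
    """Online loop: before each newly recognized action is added to the
    history, compare it with the action predicted from that history."""
    history: List[str] = []
    flags: List[bool] = []
    for action in recognized:
        expected = predict_next(history)
        flags.append(expected is not None and action != expected)
        history.append(action)
    return flags


def precision_recall_f1(pred: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float, float]:
    """Step-level precision, recall, and F1 for binary mistake flags."""
    tp = sum(p and t for p, t in zip(pred, truth))
    fp = sum(p and not t for p, t in zip(pred, truth))
    fn = sum(t and not p for p, t in zip(pred, truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Made-up example: the second observed action skips "attach wheel".
correct = [["pick wheel", "attach wheel", "tighten screw"]] * 2
observed = ["pick wheel", "tighten screw", "attach wheel"]
flags = detect_mistakes(observed, make_toy_predictor(correct))
```

In this toy run only the second step is flagged, because "attach wheel" was expected after "pick wheel". The predictor is trained on correct procedures alone, which mirrors the one-class setup described above: no examples of mistakes are needed to flag a deviation.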