17 May 2024 | Alessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scafano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio Galasso
PREGO is an online one-class classification model for detecting procedural errors in egocentric videos. It combines an online action recognition component to model the current action and a symbolic reasoning module to predict the next actions. Mistake detection is performed by comparing the recognized current action with the expected future one. PREGO is evaluated on two procedural egocentric video datasets, Assembly101 and Epic-tent, which are adapted for online benchmarking of procedural mistake detection, defining the Assembly101-O and Epic-tent-O datasets. The code is available at https://github.com/aleflabo/PREGO.
PREGO's architecture is dual-branched, with the first branch analyzing frames in a procedural video up to a current time t to classify the action being undertaken by the operator. The second branch predicts the action at time t based solely on the steps up to t-1. A pre-trained Large Language Model (LLM) is used for zero-shot symbolic reasoning through contextual analysis. An error is detected upon a misalignment between the currently recognized action and the anticipated one.
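The comparison between the two branches can be illustrated with a minimal sketch. It is not the authors' implementation: the recognition branch is replaced by a ready-made list of action labels, and the LLM-based anticipation branch is replaced by a hypothetical lookup table (`toy_next_action_predictor`) built only from correct procedures. All function names and the toy action labels are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

Predictor = Callable[[Sequence[str]], Set[str]]


def toy_next_action_predictor(correct_procedures: List[List[str]]) -> Predictor:
    """Stand-in for the anticipation branch (hypothetical, not the LLM).

    For every prefix seen in the correct procedures, remember which
    actions followed it. Only correct sequences are used.
    """
    table: Dict[Tuple[str, ...], Set[str]] = {}
    for procedure in correct_procedures:
        for i, action in enumerate(procedure):
            table.setdefault(tuple(procedure[:i]), set()).add(action)

    def predict(history: Sequence[str]) -> Set[str]:
        # Unknown histories yield an empty set, so any action is flagged.
        return table.get(tuple(history), set())

    return predict


def detect_mistakes_online(
    recognized_actions: Sequence[str], predict_next: Predictor
) -> List[bool]:
    """Flag each step whose recognized action was not anticipated.

    The anticipated actions at step t depend only on steps up to t-1.
    """
    flags: List[bool] = []
    history: List[str] = []
    for action in recognized_actions:
        expected = predict_next(history)      # uses steps up to t-1 only
        flags.append(action not in expected)  # misalignment => mistake
        history.append(action)
    return flags


# Toy labels, invented for this example.
correct = [
    ["attach wheel", "attach cabin", "screw base"],
    ["attach cabin", "attach wheel", "screw base"],
]
predictor = toy_next_action_predictor(correct)
ok_flags = detect_mistakes_online(
    ["attach wheel", "attach cabin", "screw base"], predictor
)
bad_flags = detect_mistakes_online(["attach wheel", "screw base"], predictor)
```

Here `ok_flags` contains no mistakes, while `bad_flags` marks the second step, because "screw base" never follows "attach wheel" directly in the correct procedures. A lookup table cannot generalize to unseen orderings, which is the gap the pre-trained LLM is meant to fill in PREGO.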
PREGO achieves open-setness by relying exclusively on correct procedural sequences during training, following the One-Class Classification (OCC) paradigm. This allows PREGO to identify a wide range of procedural mistakes without being confined to a predefined set of errors. PREGO abstracts away from the video content: it reasons over action labels, which supports longer-term reasoning and offers an alternative to carefully constructed action inter-dependency graphs.
To evaluate PREGO, the authors introduce the novel task of online procedural mistake detection and rearrange existing datasets to provide two new benchmarks, referred to as Assembly101-O and Epic-tent-O. The evaluation metrics include precision, recall, and F1 score. PREGO outperforms several baselines, including One-step memory, BERT, and OadTR, achieving a significant improvement in F1-score. PREGO's symbolic reasoning allows it to operate at a higher level of abstraction than video-based methods, mitigating challenges with occlusion and forecasting fine-grained actions. The results show that PREGO can better learn the normal patterns of the procedures and detect deviations from them, achieving the best results in terms of F1-score.
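For reference, the three metrics named above can be computed from per-step mistake flags as in the following sketch. The function name and the toy flags are assumptions for illustration; the exact evaluation protocol used for Assembly101-O and Epic-tent-O is not described here and may differ.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    predicted: Sequence[bool], actual: Sequence[bool]
) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for the 'mistake' class.

    predicted[i] is True when the detector flags step i as a mistake;
    actual[i] is True when step i is annotated as a mistake.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)

    # Define each metric as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy flags, invented for this example: one hit, one false alarm, one miss.
precision, recall, f1 = precision_recall_f1(
    [True, True, False, False],
    [True, False, True, False],
)
```

With one true positive, one false positive and one false negative, all three values come out to 0.5. F1 is the harmonic mean of precision and recall, so it penalizes a detector that flags every step (high recall, low precision) as well as one that flags almost nothing.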