2024-09-08 | Li Lin, Sarah Papabathini, Xin Wang, Shu Hu
This paper introduces a lightweight framework for robust facial affective behavior recognition that pairs a pretrained CLIP image encoder with a trainable multilayer perceptron (MLP), enhanced with Conditional Value at Risk (CVaR) for robustness and a loss-landscape flattening strategy for improved generalization. The framework handles both expression classification and action unit (AU) detection: the CLIP ViT-L/14 encoder extracts high-level facial features, while the MLP head is trained for the specific task, with the output layer configured accordingly. CVaR is integrated into the loss functions to improve accuracy and reliability in challenging scenarios, including imbalanced data and domain shifts, and the optimization component smooths the loss landscape to enhance generalization. Experiments on the Aff-Wild2 dataset show superior performance over the baseline at minimal computational cost: an 11% improvement in expression classification and a 4% improvement in AU detection. The contributions include the first lightweight framework for both expression classification and AU detection, the integration of CVaR into the loss functions, and superior performance on Aff-Wild2. The method is efficient, lightweight, and robust, making it suitable for real-world applications. The code is available at https://github.com/Purdue-M2/Affective_Behavior_Analysis_M2_PURDUE.
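The abstract does not spell out the exact CVaR formulation used in the loss. A minimal sketch of one common empirical version, where CVaR at level α is the mean of the worst α-fraction of per-sample losses (function and parameter names here are illustrative, not taken from the paper's code):

```python
import numpy as np

def cvar_loss(per_sample_losses, alpha=0.3):
    """Empirical CVaR at level alpha: the mean of the worst
    alpha-fraction of per-sample losses. Replacing a plain mean
    reduction with this focuses training on hard/tail examples."""
    losses = np.sort(np.asarray(per_sample_losses, dtype=float))[::-1]  # descending
    k = max(1, int(np.ceil(alpha * losses.size)))  # size of the tail
    return losses[:k].mean()
```

For example, with losses `[1, 2, 3, 4]` and `alpha=0.5`, only the two largest losses (4 and 3) contribute, so the objective is 3.5 rather than the plain mean of 2.5; setting `alpha=1.0` recovers the ordinary mean.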
The experiments demonstrate the effectiveness of the proposed method in accurately classifying expressions and detecting action units.
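Loss-landscape flattening is commonly realized with a sharpness-aware update: evaluate the gradient at weights perturbed toward the local worst case, then apply that gradient at the original weights. A hedged numpy sketch of one such step (hyperparameter names `lr` and `rho` are illustrative; this is a generic sharpness-aware scheme, not necessarily the authors' exact procedure):

```python
import numpy as np

def flat_minima_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware update toward a flatter minimum.

    w       : current weight vector (numpy array)
    grad_fn : callable returning the loss gradient at a weight vector
    rho     : radius of the adversarial weight perturbation
    """
    g = grad_fn(w)
    norm = np.linalg.norm(g) + 1e-12       # avoid division by zero
    w_adv = w + rho * g / norm             # ascend to the local worst case
    g_adv = grad_fn(w_adv)                 # gradient at the perturbed point
    return w - lr * g_adv                  # descend using the worst-case gradient
```

On the toy loss L(w) = w², with `w = 1.0`, `lr = 0.1`, `rho = 0.05`, the perturbed point is 1.05 and the update lands at 1 − 0.1·2.1 = 0.79, slightly more conservative than the plain gradient step to 0.80, reflecting the flatter-minimum bias.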