Understanding Facial Affective Behavior Analysis with Instruction Tuning

Facial Affective Behavior Analysis (FABA) aims to recognize facial expressions and facial movements in order to understand emotional states and intentions. Traditional methods focus on discrete emotion categories and therefore lack granularity and reasoning ability. Multi-modal Large Language Models (MLLMs) show promise in visual tasks but face three challenges in FABA: limited datasets, a lack of facial prior knowledge, and training inefficiency. To address these, the authors introduce FABA-Instruct, an instruction-following dataset with 19K aligned face images and 30K fine-grained annotations, and FABA-Bench, a benchmark that evaluates both recognition and generation abilities. They also introduce EmoLA, an efficient MLLM that combines a facial prior expert with low-rank adaptation, as a strong baseline. In experiments on FABA-Bench and four other datasets, EmoLA achieves the best performance on FABA-Bench and is competitive with state-of-the-art models on traditional datasets. The dataset and code are available. The key contributions are the instruction-following FABA data, the new benchmark, and the MLLM-based FABA architecture. The study also highlights the importance of fine-grained facial movement, interpretability, and reasoning in FABA.
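
The summary names low-rank adaptation as part of what makes EmoLA efficient but does not describe the adapter itself. As a rough illustration of the general technique only, and not of EmoLA's actual implementation, the NumPy sketch below wraps a frozen weight matrix with a trainable low-rank update. The class name `LoRALinear`, the rank, the scaling factor `alpha`, and the layer sizes are all assumptions chosen for the example.

```python
import numpy as np


class LoRALinear:
    """Generic low-rank adaptation sketch (hypothetical, not EmoLA's code).

    The pretrained weight stays frozen; only the small factors A and B
    would be trained. The effective weight is W + (alpha / rank) * B @ A.
    """

    def __init__(self, weight, rank=4, alpha=8.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight  # frozen pretrained weight, shape (out_dim, in_dim)
        # Down-projection: small random init, shape (rank, in_dim).
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))
        # Up-projection: zero init, so the adapter starts as a no-op.
        self.B = np.zeros((out_dim, rank))
        self.scale = alpha / rank

    def __call__(self, x):
        base = x @ self.weight.T            # frozen path
        update = (x @ self.A.T) @ self.B.T  # low-rank trainable path
        return base + self.scale * update

    def trainable_params(self):
        return self.A.size + self.B.size


# Illustrative sizes only (assumed, not taken from the paper).
W = np.random.default_rng(1).normal(size=(64, 128))
layer = LoRALinear(W, rank=4)
x = np.ones((2, 128))
y = layer(x)
```

With these assumed sizes the adapter holds far fewer trainable values than the frozen matrix, which is the usual motivation for the technique. Because `B` starts at zero, the wrapped layer initially reproduces the frozen layer's output exactly.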

Facial Affective Behavior Analysis with Instruction Tuning

2024-07-12 | Yifan Li¹, Anh Dao¹, Wentao Bao¹, Zhen Tan², Tianlong Chen³,⁴,⁵, Huan Liu², and Yu Kong¹