15 January 2024 | Jin Pyo Lee, Hanhyeok Jang, Yeonwoo Jang, Hyeonseo Song, Suwoo Lee, Pooi See Lee & Jiyun Kim
This study presents the Personalized Skin-Integrated Facial Interface (PSiFI), a multimodal human emotion recognition system that combines verbal and non-verbal expression data for real-time emotion recognition. The PSiFI is a self-powered, stretchable, transparent, and personalized device that integrates triboelectric strain and vibration sensors to capture facial strain and vocal vibrations, together with a data-processing circuit for wireless data transfer. Classification is performed by a convolutional neural network (CNN) that adapts rapidly to an individual user's context through transfer learning. The system was tested in various scenarios, including real-time emotion recognition with and without a face mask, and achieved 93.3% accuracy on combined verbal/non-verbal expression recognition. The device was also deployed in a digital concierge application within a VR environment, where it recognized user emotions and provided personalized services. The PSiFI offers a promising route to collecting emotional speech data with barrier-free communication and could support practical applications such as education, marketing, and advertising. Its multimodal sensing, high sensitivity, fast response time, and high stretchability make it suitable for real-time emotion recognition across diverse environments, and the study highlights its potential to enhance human-machine interaction by accurately encoding emotional information.
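The personalization step described above, where a pretrained classifier adapts to an individual user through transfer learning, typically works by freezing the learned feature extractor and retraining only a small classifier head on a few user-specific samples. The sketch below is a minimal, hypothetical illustration of that pattern in pure Python: the "frozen extractor" stands in for the pretrained CNN layers, and only the head's weights are updated. All names, weights, and data here are illustrative assumptions, not the paper's actual model.

```python
import math
import random

random.seed(0)

# Stand-in for the pretrained CNN layers: a fixed projection that is
# NOT updated during personalization (illustrative weights).
FROZEN_W = [[0.7, -0.2, 0.1],
            [0.0,  0.9, -0.3]]

def frozen_extractor(raw):
    """Map a raw sensor reading to a 2-D feature vector (frozen)."""
    return [sum(w * x for w, x in zip(row, raw)) for row in FROZEN_W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_head(samples, labels, lr=0.5, epochs=200):
    """Retrain only the classifier head on user-specific samples."""
    w, b = [0.0, 0.0], 0.0          # head parameters (the only trainable part)
    feats = [frozen_extractor(s) for s in samples]
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(w[0] * f[0] + w[1] * f[1] + b)
            err = p - y             # gradient of binary cross-entropy
            w[0] -= lr * err * f[0]
            w[1] -= lr * err * f[1]
            b    -= lr * err
    return w, b

def predict(w, b, raw):
    f = frozen_extractor(raw)
    return int(sigmoid(w[0] * f[0] + w[1] * f[1] + b) > 0.5)

# Tiny synthetic calibration set standing in for one user's
# strain/vibration readings (two emotion classes, 0 and 1).
X = [[1.0, 0.2, 0.1], [0.9, 0.1, 0.0],
     [0.1, 1.0, 0.9], [0.0, 0.9, 1.0]]
y = [0, 0, 1, 1]

w, b = train_head(X, y)
print([predict(w, b, x) for x in X])
```

Because only the small head is retrained, this kind of adaptation needs far fewer samples and much less computation than training the full network, which is what makes rapid per-user calibration feasible on a wearable system.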