May 11-16, 2024 | Ke Li, Ruidong Zhang, Siyuan Chen, Boao Chen, Mose Sakashita, François Guimbretière, Cheng Zhang
EyeEcho is a low-power, minimally obtrusive acoustic sensing system that enables glasses to continuously monitor facial expressions. It uses two pairs of speakers and microphones mounted on the glasses to emit inaudible acoustic signals toward the face, capturing the subtle skin deformations associated with facial expressions. The reflected signals are processed by a customized machine-learning pipeline to estimate full facial movements. EyeEcho samples at 83.3 Hz with a relatively low power consumption of 167 mW. A user study with 12 participants demonstrated that EyeEcho achieves highly accurate tracking across real-world scenarios, including sitting, walking, and after remounting the device. A semi-in-the-wild study with 10 participants further validated EyeEcho's performance in naturalistic settings while participants engaged in various daily activities. EyeEcho can also be deployed on a commercial off-the-shelf (COTS) smartphone, offering real-time facial expression tracking.
EyeEcho uses Frequency-Modulated Continuous Wave (FMCW) acoustic signals to capture facial skin deformations from the glasses. The hardware comprises two MEMS microphones, two speakers, and a Bluetooth module. The reflected signals are processed by a customized convolutional neural network (CNN) to estimate facial expressions, represented by the 52 blendshape parameters defined in Apple's ARKit API. A user study with 12 participants showed that EyeEcho can continuously and accurately estimate facial expressions on glasses using only four minutes of training data per participant. It can also detect eye blinks with an F1 score of 82%, a capability not demonstrated in prior work.
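To make the FMCW idea concrete, below is a minimal sketch of how a transmitted chirp can be cross-correlated with the received signal to produce an echo profile, where peaks correspond to reflections from skin at different distances. The sampling rate, frame length, and sweep band here are illustrative assumptions (chosen so that 600 samples at 50 kHz yield the paper's 83.3 Hz frame rate), not the paper's exact parameters.

```python
import numpy as np
from scipy.signal import chirp, correlate

# Illustrative parameters -- not the paper's exact values.
FS = 50_000              # mic sampling rate (Hz), assumed
FRAME = 600              # samples per chirp: 600 / 50 kHz = 12 ms -> 83.3 Hz frames
F0, F1 = 16_000, 21_000  # inaudible linear sweep band (Hz), assumed

t = np.arange(FRAME) / FS
tx = chirp(t, f0=F0, t1=t[-1], f1=F1, method="linear")

def echo_profile(rx_frame: np.ndarray) -> np.ndarray:
    """Cross-correlate one received frame against the transmitted chirp.

    Peaks correspond to reflections arriving at different delays,
    i.e., skin surfaces at different distances from the glasses.
    """
    return np.abs(correlate(rx_frame, tx, mode="same"))

# Synthetic check: an echo delayed by 20 samples (~0.4 ms of acoustic travel)
rx = 0.5 * np.roll(tx, 20) + 0.01 * np.random.randn(FRAME)
lag = int(np.argmax(echo_profile(rx))) - FRAME // 2
print(lag)  # ~20
```

In the real system, consecutive echo profiles would be stacked over time and fed to the CNN, so that changes in the profile (rather than absolute distances) encode the skin deformations driving each expression.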
EyeEcho is relatively low-power and lightweight compared to camera-based facial expression tracking technologies on wearables. The full sensing system operates at a sample rate of 83.3 Hz while consuming 167 mW, enough to run for around 14 hours on the battery of Google Glass. The machine-learning model, based on the ResNet-18 architecture, is optimized to be lightweight enough for real-time processing on a commodity smartphone: EyeEcho tracks users' facial expressions continuously at 29 Hz in real time on a commodity Android phone, which comparable sensing systems have not demonstrated.
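As a rough sanity check on the battery figure: assuming Google Glass's commonly cited 570 mAh battery at a nominal ~3.8 V (≈2.2 Wh), 2.2 Wh ÷ 167 mW ≈ 13 hours, consistent with the ~14-hour estimate above. And as a sketch of what a ResNet-18-based blendshape regressor could look like, the PyTorch snippet below adapts a stock ResNet-18 to echo-profile input and 52 outputs; the input channel count, input size, and sigmoid output (ARKit blendshape coefficients lie in [0, 1]) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class BlendshapeNet(nn.Module):
    """Illustrative ResNet-18-based regressor for 52 ARKit blendshapes."""

    def __init__(self, in_channels: int = 4, n_blendshapes: int = 52):
        super().__init__()
        backbone = resnet18(weights=None)
        # Echo profiles from the speaker-microphone pairs stacked as channels
        backbone.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7,
                                   stride=2, padding=3, bias=False)
        backbone.fc = nn.Linear(backbone.fc.in_features, n_blendshapes)
        self.backbone = backbone

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Squash to [0, 1], the range of ARKit blendshape coefficients
        return torch.sigmoid(self.backbone(x))

model = BlendshapeNet()
# One batch of echo-profile windows: (batch, channels, height, width)
dummy = torch.randn(8, 4, 64, 64)
print(model(dummy).shape)  # torch.Size([8, 52])
```

A model of this size (~11M parameters) is small enough to run well above the reported 29 Hz on a modern phone, which is consistent with the real-time deployment described above.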
The key contributions of this paper are: (1) enabling continuous facial expression tracking on glasses using low-power acoustic sensing; (2) conducting studies, including a semi-in-the-wild study, to evaluate EyeEcho's estimation of facial expressions, including eye blinks, in both lab and real-world settings; (3) developing a real-time processing system on an Android phone and demonstrating promising performance.