May 11–16, 2024 | Xingyu Bruce Liu, Jiahao Nick Li, David Kim, Xiang 'Anthony' Chen, Ruofei Du
The paper introduces Human I/O, a unified approach to detecting situationally induced impairments and disabilities (SIIDs) by assessing the availability of human input/output channels. SIIDs, such as those caused by poor lighting, noise, or multitasking, can significantly degrade user experience. Human I/O leverages egocentric video and audio streams, computer vision, audio analysis, and large language models (LLMs) to predict the availability of the vision, hearing, vocal, and hands channels. The system achieves a mean absolute error of 0.22 and an accuracy of 82% across 60 in-the-wild egocentric video recordings spanning 32 different scenarios. A user study with 10 participants further demonstrates that Human I/O reduces effort and improves user experience in the presence of SIIDs. The paper also discusses limitations and future directions, including the need for more diverse few-shot examples and improved activity recognition, particularly for hand-related tasks.
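To make the channel-availability idea concrete, below is a minimal illustrative sketch of how multimodal cues might be mapped to per-channel availability scores. It is not the authors' implementation: the function names, input signals, and thresholds are hypothetical placeholders, whereas Human I/O itself derives these cues from egocentric video, audio analysis, and LLM reasoning over the recognized activity.

```python
from dataclasses import dataclass


# Hypothetical record of channel availability; scores are floats in [0, 1]
# (0 = unavailable, 1 = fully available), mirroring the idea of grading
# each human input/output channel rather than a binary impaired/unimpaired label.
@dataclass
class ChannelAvailability:
    vision: float
    hearing: float
    vocal: float
    hands: float


def aggregate_availability(scene_description: str,
                           ambient_noise_db: float,
                           hands_occupied: bool) -> ChannelAvailability:
    """Toy heuristic combining multimodal cues into per-channel scores.

    The rules and thresholds here are illustrative assumptions only;
    Human I/O infers these scores from egocentric sensing and LLMs.
    """
    hearing = 0.2 if ambient_noise_db > 75 else 1.0                   # loud environment degrades hearing
    vocal = 0.3 if "library" in scene_description.lower() else 1.0    # socially constrained speech
    hands = 0.1 if hands_occupied else 1.0                            # e.g., carrying groceries or washing dishes
    vision = 0.4 if "dark" in scene_description.lower() else 1.0      # poor lighting degrades vision
    return ChannelAvailability(vision, hearing, vocal, hands)


if __name__ == "__main__":
    # Example: washing dishes in a dark, noisy kitchen occupies the hands
    # and reduces vision and hearing availability.
    print(aggregate_availability("washing dishes in a dark kitchen",
                                 ambient_noise_db=80,
                                 hands_occupied=True))
```

A downstream interface could consume such scores to adapt its modality, for example switching from audio to visual notifications when the hearing score drops below a chosen threshold.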