May 11–16, 2024 | Xingyu Bruce Liu, Jiahao Nick Li, David Kim, Xiang 'Anthony' Chen, Ruofei Du
This paper introduces Human I/O, a unified approach to detecting situationally induced impairments and disabilities (SIIDs) by assessing the availability of human input/output channels. SIIDs, which arise from conditions such as poor lighting, noise, and multitasking, can significantly hinder user experience. Existing systems often focus on specific tasks or environments and so fail to address the dynamic and diverse nature of SIIDs. Human I/O leverages egocentric vision, multimodal sensing, and large language models (LLMs): it captures egocentric video and audio, processes the input data, and uses LLMs to predict channel availability, achieving a mean absolute error of 0.22 and 82% accuracy across 60 real-world scenarios. A user study with 10 participants showed that Human I/O significantly reduces effort and improves user experience in the presence of SIIDs. The system is deployed and open-sourced at https://github.com/google/humanio. A formative study informs the design of Human I/O, highlighting the need to integrate contextual cues and to measure channel availability on a four-level scale. The paper details the system's design and implementation, then reports a technical evaluation on 60 in-the-wild egocentric videos using mean absolute error, classification accuracy, and intra-video variance; the results show that Human I/O predicts channel availability with high accuracy and consistency. It also discusses related work in situationally aware computing, egocentric vision, reasoning capabilities of LLMs, and activity and environmental sensing. The findings suggest that focusing on the limited availability of human input/output channels provides a more unified approach to detecting SIIDs.
The paper also discusses limitations, such as the system's performance in predicting hand availability and the need for further research on pre-impairment scenarios. Future work could explore temporal segmentation techniques and lighter weight models to improve system performance.
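
The three evaluation metrics named above (mean absolute error, classification accuracy, and intra-video variance) can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes the four-level availability scale is encoded as the integers 0 to 3, and it assumes intra-video variance means the variance of one channel's predicted levels across segments of a single video. The function names and the sample values are hypothetical.

```python
from statistics import mean, pvariance

# Assumption: the four-level availability scale is encoded as integers 0..3.
# The paper summary only states that the scale has four levels.


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and labeled levels."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def classification_accuracy(predicted, actual):
    """Fraction of predictions that match the labeled level exactly."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(1.0 if p == a else 0.0 for p, a in zip(predicted, actual))


def intra_video_variance(levels_over_time):
    """Population variance of one channel's predictions within one video.

    Assumption: lower variance is read as more consistent predictions.
    """
    if not levels_over_time:
        raise ValueError("need at least one prediction")
    return pvariance(levels_over_time)


if __name__ == "__main__":
    # Hypothetical predictions and labels for one channel over four videos.
    predicted = [0, 1, 3, 2]
    actual = [0, 2, 3, 2]
    print("MAE:", mean_absolute_error(predicted, actual))
    print("Accuracy:", classification_accuracy(predicted, actual))
    # Hypothetical predictions for the same channel over segments of one video.
    print("Intra-video variance:", intra_video_variance([1, 1, 2, 1]))
```

Reporting mean absolute error alongside accuracy is useful on an ordinal scale like this one: accuracy counts any mismatch as a full error, while mean absolute error distinguishes a prediction that is one level off from one that is three levels off.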