May 11–16, 2024 | Razan Jaber, Sabrina Zhong, Sanna Kuoppamäki, Aida Hosseini, Iona Gessinger, Duncan P Brumby, Benjamin R. Cowan, Donald McMillan
This paper explores the design of context-aware voice interactions for complex tasks, particularly cooking. The authors conducted two studies to evaluate the effectiveness of a context-aware voice agent (VA) in assisting users during cooking. The first study involved 7 cooking sessions with a commercial VA (Google Assistant), revealing challenges such as irrelevant responses, misinterpretation of requests, and information overload due to a lack of contextual awareness. The second study used a wizard-led context-aware VA, which demonstrated more fluent and complex interactions, including explicit grounding within utterances and social responses. The paper discusses the benefits of limited context awareness in VAs, the potential for personalization, and the division of labor in VA communication. It also highlights the importance of generative models and multi-modal approaches in improving conversational interaction. The findings suggest that context-aware VAs can enhance user experience by providing more natural and proactive support, while also addressing the challenges of balancing agency and user needs.