GOMA: Proactive Embodied Cooperative Communication via Goal-Oriented Mental Alignment

This paper introduces GOMA, a cooperative communication framework that enables an embodied AI assistant to communicate with humans to achieve optimal cooperation. GOMA formulates verbal communication as a planning problem that minimizes the misalignment between the goal-relevant parts of the agents' mental states. This lets the assistant reason about when and how to proactively initiate natural-language communication with a human. The key insight is that only the part of the belief relevant to reaching the goal needs to be aligned, so the assistant can generate optimal communication in the belief space, reshaping agents' beliefs through verbal communication. The framework uses a two-level I-POMDP to model the mental reasoning between a human user and a robot assistant, and defines the mind of each agent as the belief over that agent's level-1 interactive state. GOMA is evaluated in two human-AI cooperation domains: Overcooked (a multiplayer game) and VirtualHome (a household simulator). In both domains it outperforms strong baselines, including a recent LLM-based baseline, in task performance and in human perception of the assistant. The GOMA-enabled assistant generates concise verbal communication that boosts cooperation, and it receives higher subjective ratings from human participants.
The contributions are: a novel embodied cooperative communication framework, GOMA; an extensive evaluation of GOMA and strong baselines in two challenging domains; and a human user study that evaluates the task performance of AI assistants and humans' perception of them. The paper also discusses related work on communication in collaboration, collaborative and communicative AI agents, and theory of mind for cooperative robot planning. It highlights the challenge of enabling robots to actively initiate verbal communication that is both concise and consistent with the physical environment and the social context. The paper concludes that GOMA enables an embodied AI assistant to communicate efficiently and effectively with a human user to achieve optimal cooperation.
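
The idea of aligning only the goal-relevant part of the belief can be illustrated with a toy sketch. This is not the paper's implementation: the belief representation (per-object distributions over locations), the total-variation distance, the fixed communication cost, and every name below (`misalignment`, `choose_message`, the example objects) are assumptions made purely for illustration. The paper itself models the interaction with a two-level I-POMDP, which this sketch does not attempt to reproduce.

```python
from typing import Dict, List, Optional

# Hypothetical belief format: object name -> {location: probability}.
Belief = Dict[str, Dict[str, float]]


def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def misalignment(a: Belief, b: Belief, relevant: List[str]) -> float:
    """Sum of belief distances over goal-relevant objects only."""
    return sum(total_variation(a[obj], b[obj]) for obj in relevant)


def choose_message(assistant: Belief, human: Belief,
                   relevant: List[str], cost: float = 0.2) -> Optional[str]:
    """Pick the object whose belief is worth sharing, or None to stay silent.

    A message about `obj` is assumed to overwrite the human's belief about
    that object with the assistant's belief. Speaking incurs a fixed cost,
    so a message is sent only if it reduces misalignment by more than that.
    """
    best_msg: Optional[str] = None
    best_score = misalignment(assistant, human, relevant)  # score of silence
    for obj in assistant:
        updated = dict(human)
        updated[obj] = assistant[obj]
        score = misalignment(assistant, updated, relevant) + cost
        if score < best_score:
            best_msg, best_score = obj, score
    return best_msg


# Toy scenario: the goal needs the salt; the book is irrelevant to it.
assistant_belief: Belief = {
    "salt": {"cabinet": 1.0},
    "book": {"sofa": 1.0},
}
human_belief: Belief = {
    "salt": {"cabinet": 0.5, "fridge": 0.5},
    "book": {"desk": 1.0},
}

message = choose_message(assistant_belief, human_belief, relevant=["salt"])
print(message)
```

In this toy scenario the human's belief about the book is further from the assistant's than their belief about the salt, yet only the salt matters for the goal, so the sketch selects a message about the salt and ignores the book. When the goal-relevant beliefs already agree, the communication cost makes silence the best choice.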


17 Mar 2024 | Lance Ying, Kunal Jha, Shivam Aaryya, Joshua B. Tenenbaum, Antonio Torralba, Tianmin Shu