GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices

12 Jun 2024 | Quanfeng Lu, Wenqi Shao, Zitao Liu, Fanqing Meng, Boxuan Li, Botong Chen, Siyuan Huang, Kaipeng Zhang, Yu Qiao, Ping Luo
The paper introduces GUI Odyssey, a comprehensive dataset for training and evaluating cross-app navigation agents on mobile devices. The dataset contains 7,735 episodes collected from 6 mobile devices, covering 201 apps and 1,399 app combos across 6 types of cross-app tasks. Existing GUI agents often struggle with complex cross-app tasks; to address this limitation, the authors developed OdysseyAgent, a multimodal cross-app navigation agent built on the Qwen-VL model with a history resampling module. Extensive experiments show that OdysseyAgent outperforms existing models in both in-domain and out-of-domain settings, achieving higher accuracy in cross-app navigation. The dataset and code are available at https://github.com/OpenGVLab/GUI-Odyssey. The work aims to advance the field of general GUI agents and improve user experiences in communication, entertainment, and productivity.