SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

23 Feb 2024 | Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu
The paper introduces *SeeClick*, a novel visual GUI agent designed to automate complex tasks on digital devices using only screenshots as input. The primary challenge in developing such agents is GUI grounding, the ability to accurately locate screen elements based on instructions. To address this, *SeeClick* is enhanced with GUI grounding pre-training and a method to automate the curation of GUI grounding data. The authors also create *ScreenSpot*, a realistic GUI grounding benchmark that includes over 600 screenshots and 1200 instructions from various GUI platforms. Evaluations on *ScreenSpot* and three widely used benchmarks show that *SeeClick* outperforms existing models, demonstrating the effectiveness of GUI grounding pre-training. The paper concludes by discussing the limitations and ethical considerations of GUI agents, emphasizing the importance of privacy, safety, and bias mitigation.
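
To make the notion of GUI grounding concrete, the sketch below shows one plausible way to score a grounding model: a predicted click point counts as correct when it falls inside the target element's bounding box. This scoring rule is an assumption for illustration, since the summary above does not state how the benchmark is scored. The names `BoundingBox` and `click_accuracy` are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box of a target GUI element (hypothetical helper type)."""
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        # Points on the border are treated as inside the element.
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def click_accuracy(
    predictions: Sequence[Tuple[float, float]],
    targets: Sequence[BoundingBox],
) -> float:
    """Fraction of predicted (x, y) points that land inside their target box.

    Assumed metric for illustration only; not confirmed by the summary.
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(box.contains(x, y) for (x, y), box in zip(predictions, targets))
    return hits / len(targets)
```

For example, with two instructions whose targets are `BoundingBox(0, 0, 10, 10)` and `BoundingBox(20, 20, 30, 30)`, the predictions `(5, 5)` and `(50, 50)` hit the first target and miss the second.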