UFO: A UI-Focused Agent for Windows OS Interaction


23 May 2024 | Chaoyun Zhang, Liqun Li, Shilin He, Xu Zhang, Bo Qiao, Si Qin, Minghua Ma, Yu Kang, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang & Qi Zhang
UFO is a UI-focused agent for the Windows OS that uses GPT-Vision to fulfill user requests expressed as natural language commands. It employs a dual-agent framework, consisting of HostAgent and AppAgent, to analyze and operate Windows applications. HostAgent selects the appropriate application and formulates a global plan, while AppAgent executes actions within the selected application. A control interaction module translates those actions into grounded execution on the application's controls, enabling full automation. UFO can handle tasks that span multiple applications and is highly extensible, allowing users to customize actions and controls for specific tasks.
Testing across 9 popular Windows applications demonstrated UFO's effectiveness, with an 86% success rate and high completion and safeguard rates. UFO outperforms the GPT-3.5 and GPT-4 baselines, completing complex tasks efficiently and securely. Case studies illustrate its ability to handle tasks such as removing notes from a PowerPoint presentation and composing emails with information drawn from multiple applications. UFO's design emphasizes adaptability, safety, and automation, making it a versatile agent for Windows OS interaction.
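The division of labor described above can be pictured with a minimal Python sketch. It is not UFO's actual code: every class, method, and field name below is hypothetical, keyword matching stands in for the GPT-Vision reasoning, and the "controls" are a simulated log rather than real Windows UI elements.

```python
from dataclasses import dataclass


@dataclass
class Action:
    control: str        # label of the UI control to operate on (hypothetical)
    operation: str      # e.g. "click" or "set_text"
    argument: str = ""  # optional payload, such as text to type


class HostAgent:
    """Selects an application and drafts a global plan.
    Assumption: keyword matching replaces the model's reasoning."""

    def __init__(self, apps):
        self.apps = apps  # application name -> trigger keywords

    def select(self, request):
        text = request.lower()
        for app, keywords in self.apps.items():
            if any(k in text for k in keywords):
                return app, [f"open {app}", f"carry out: {request}"]
        raise LookupError("no suitable application for request")


class AppAgent:
    """Chooses actions inside the selected application (hard-coded here)."""

    def next_actions(self, app, request):
        if app == "Outlook":
            return [Action("Body", "set_text", request), Action("Send", "click")]
        return []


class ControlInteraction:
    """Grounds abstract actions on simulated controls and logs what ran."""

    def __init__(self):
        self.log = []
        self.fields = {}

    def execute(self, action):
        if action.operation == "set_text":
            self.fields[action.control] = action.argument
        self.log.append(f"{action.operation}:{action.control}")


def run(request):
    host = HostAgent({"PowerPoint": ["slide", "presentation"],
                      "Outlook": ["email", "mail"]})
    app, plan = host.select(request)
    controls = ControlInteraction()
    for action in AppAgent().next_actions(app, request):
        controls.execute(action)
    return app, plan, controls.log
```

Calling `run("Send an email to the team")` routes the request to the simulated Outlook application and records two grounded operations. Keeping application selection, action choice, and control execution in separate classes mirrors the separation the summary describes, so each piece could be swapped for a real model call or a real UI automation backend.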