OS-Copilot: Towards Generalist Computer Agents with Self-Improvement

15 Feb 2024 | Zhiyong Wu, Chengcheng Han, Zichen Ding, Zhenmin Weng, Zhourmanze Liu, Shunyu Yao, Tao Yu, Lingpeng Kong
OS-Copilot is a framework for building generalist computer agents that can interact with a wide range of operating system elements, including the web, code terminals, files, multimedia, and third-party applications. The authors use it to build FRIDAY, a self-improving embodied agent that automates general computer tasks.

FRIDAY is designed to maximize generality through self-refinement and self-directed learning. It can autonomously generate tools for unfamiliar applications and solve tasks through trial and error. With minimal supervision, it learns to control and self-improve on Excel and PowerPoint. The tasks it handles include creating a PowerPoint slide, manipulating Excel, and building a website.

On GAIA, a benchmark for general AI assistants, FRIDAY outperforms previous methods by 35%, demonstrating strong generalization to unseen applications. It reaches a 40.86% success rate on level-1 tasks, significantly outperforming previous systems, and a 6.12% success rate on level-3 tasks, which other systems had previously been unable to solve. Its self-directed learning ability is demonstrated on a spreadsheet manipulation dataset, where it achieves a 60% success rate after learning. These results highlight the effectiveness of the configurator and FRIDAY's ability to learn to control unfamiliar applications.

The framework comprises a planner, a configurator, and an actor, which work together to execute tasks and to self-criticize for improvement. These components let FRIDAY perform complex tasks and adapt to new environments. The evaluation results and case studies demonstrate its potential as a helpful OS assistant, and the framework provides infrastructure and insights for future research toward more capable and general-purpose computer agents.
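
The interplay of planner, configurator, and actor described above can be illustrated with a toy control loop. This is a minimal sketch under stated assumptions, not FRIDAY's implementation: every name in it (`ToolRepository`, `plan`, `generate_tool`, `critic`, `run`) is hypothetical, and the language-model calls a real agent would make are replaced by deterministic stand-ins so that the trial-and-error path and tool reuse are visible.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]


@dataclass
class ToolRepository:
    """Hypothetical store of reusable tools, keyed by the subtask's first word."""
    tools: Dict[str, Tool] = field(default_factory=dict)


def plan(task: str) -> List[str]:
    """Planner stand-in: split a request into ordered subtasks on ' then '."""
    return [step.strip() for step in task.split(" then ") if step.strip()]


def generate_tool(subtask: str, attempt: int) -> Tool:
    """Tool-generation stand-in (a real agent would ask a model to write code).

    The first attempt is deliberately broken so the retry path is exercised.
    """
    if attempt == 0:
        return lambda s: ""            # faulty tool: produces no output
    return lambda s: f"done: {s}"      # refined tool: reports completion


def critic(subtask: str, result: str) -> bool:
    """Self-criticism stand-in: accept a result only if it reports completion."""
    return result == f"done: {subtask}"


def run(task: str, repo: ToolRepository, max_attempts: int = 3) -> List[Tuple[str, int]]:
    """Execute each subtask, retrying on failure and storing tools that succeed.

    Returns (subtask, attempts_used) pairs; attempts_used is 0 if all tries failed.
    """
    log: List[Tuple[str, int]] = []
    for subtask in plan(task):
        key = subtask.split()[0]
        for attempt in range(max_attempts):
            # Configurator step: reuse a stored tool if one exists, else generate one.
            tool = repo.tools.get(key) or generate_tool(subtask, attempt)
            # Actor step: execute the tool, then let the critic judge the result.
            result = tool(subtask)
            if critic(subtask, result):
                repo.tools[key] = tool  # keep the working tool for later subtasks
                log.append((subtask, attempt + 1))
                break
        else:
            log.append((subtask, 0))
    return log
```

Calling `run("open report.xlsx then sum column B then open summary.xlsx", ToolRepository())` in this sketch needs two attempts for each of the first two subtasks, while the third succeeds on its first attempt because the `open` tool stored earlier is reused. That reuse is a simplified picture of how accumulating tools can make later tasks cheaper.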