Understanding Intelligent Agents with LLM-based Process Automation

This paper introduces an intelligent virtual assistant built on LLM-based Process Automation (LLMPA), which automatically performs multi-step operations within mobile applications in response to high-level user requests. The system provides an end-to-end solution for parsing instructions, reasoning about goals, and executing actions. LLMPA is designed to understand a task, decompose it, and execute it systematically. It comprises five modules: instruction chain generation (decomposing instructions), previous action description generation, object detection (detecting interface elements), action prediction (predicting the next action), and controllable calibration (error checking).

The main contributions of this work are the LLMPA architecture optimized for app automation, a methodology for applying LLM-based assistants to mobile apps, and demonstrations of multi-step task completion in a real-world environment. Notably, this work represents the first real-world deployment and extensive evaluation of a large language model-based virtual assistant in a widely used mobile application with a user base in the hundreds of millions.

Experiments show that the system completes complex mobile operation tasks in Alipay from natural language instructions, such as booking a flight ticket. Performance is evaluated in both online and benchmark environments, showing significant improvements in step and task success rates over baselines. The results indicate that integrating instruction chains and previous action descriptions improves the system's ability to reason and complete tasks efficiently. Case studies provide further validation, including a task in which the system redeems subway discount vouchers in the Alipay membership scenario.

The work highlights the potential of large language models to enable intelligent virtual assistants to perform complex tasks in real-world environments. It also acknowledges the challenges and limitations of relying solely on LLMs, including the need for extensive training data and the importance of ethical considerations in deploying such systems. The study underscores the importance of continued research on improving how well intelligent assistants understand and execute user instructions.
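To make the module structure described above more concrete, the following is a minimal sketch of how the five LLMPA stages might be chained in a control loop. It is an illustration under stated assumptions, not the authors' implementation: every function name, the `Action` and `AgentState` types, the action vocabulary, and the representation of a screen as a list of element names are hypothetical, and the LLM and object-detection calls are replaced by trivial string-matching stand-ins so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Action:
    kind: str          # hypothetical action vocabulary: "click" or "finish"
    target: str = ""   # name of the UI element the action applies to


@dataclass
class AgentState:
    request: str
    instruction_chain: List[str] = field(default_factory=list)
    history: List[str] = field(default_factory=list)  # previous action descriptions


def generate_instruction_chain(request: str) -> List[str]:
    # Stand-in for the LLM call that decomposes a request into ordered steps.
    return [part.strip() for part in request.split(", then ") if part.strip()]


def describe_action(action: Action) -> str:
    # Stand-in for previous action description generation.
    return f"{action.kind} on {action.target}"


def detect_elements(screen: List[str]) -> List[str]:
    # Stand-in for object detection: the screen is already a list of element names.
    return list(screen)


def predict_action(state: AgentState, elements: List[str]) -> Action:
    # Stand-in for action prediction: take the next step in the chain and
    # pick the first visible element whose name appears in that step.
    step_index = len(state.history)
    if step_index >= len(state.instruction_chain):
        return Action("finish")
    step = state.instruction_chain[step_index]
    for element in elements:
        if element in step:
            return Action("click", target=element)
    return Action("click", target="unknown")


def calibrate(action: Action, elements: List[str]) -> Optional[Action]:
    # Stand-in for controllable calibration: reject actions whose target
    # is not actually present on the current screen.
    if action.kind == "finish" or action.target in elements:
        return action
    return None


def run(request: str, screens: List[List[str]], max_steps: int = 10) -> List[Action]:
    state = AgentState(request, generate_instruction_chain(request))
    executed: List[Action] = []
    for screen in screens[:max_steps]:
        elements = detect_elements(screen)
        action = calibrate(predict_action(state, elements), elements)
        if action is None or action.kind == "finish":
            break  # stop on a rejected action or when the chain is complete
        executed.append(action)
        state.history.append(describe_action(action))
    return executed


if __name__ == "__main__":
    screens = [["flights", "hotels"], ["date", "back"], ["pay", "cancel"], ["done"]]
    for act in run("open flights, then pick date, then pay", screens):
        print(describe_action(act))
```

In this sketch the calibration step acts as a guard between prediction and execution, which is one plausible reading of "error checking"; a real system would presumably recover from a rejected action rather than stop.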

Intelligent Agents with LLM-based Process Automation

August 25-29, 2024 | Yanchu Guan*, Dong Wang*, Zhixuan Chu*, Shiyu Wang*, Feiyue Ni, Ruihua Song, Chenyi Zhuang
