Intelligent Agents with LLM-based Process Automation


August 25–29, 2024, Barcelona, Spain | Yanchu Guan*, Dong Wang*, Zhixuan Chu*,†, Shiyu Wang*, Feiyue Ni, Ruihua Song, Chenyi Zhuang
This paper presents a novel approach to intelligent virtual assistants using large language models (LLMs) for mobile app automation. The proposed system, called LLM-Based Process Automation (LLMPA), is designed to automatically perform multi-step operations within mobile apps based on high-level user requests. Unlike traditional virtual assistants that rely on fixed programmatic functions, LLMPA emulates detailed human interactions, enabling more flexible and complex task execution. The system includes modules for decomposing instructions, generating descriptions, detecting interface elements, predicting next actions, and error checking. Experiments demonstrate the system's ability to complete complex mobile operations tasks in Alipay based on natural language instructions, showcasing the potential of LLMs in enabling automated assistants to handle real-world tasks. The main contributions include the novel LLMPA architecture, the methodology for applying LLMs to mobile apps, and the successful deployment and evaluation of a large language model-based virtual assistant in a widely used mobile application with a massive user base. The paper also discusses the advantages and limitations of the approach, emphasizing the need for further research in contextual processing, reasoning capabilities, and optimized on-device deployment.
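
To make the described architecture concrete, the sketch below outlines how an LLMPA-style pipeline could be organized: decompose the request into steps, then for each step describe the screen, detect UI elements, predict the next action, and check the result. This is a minimal illustrative sketch under assumed interfaces; the class and method names (LLMPAAgent, describe_screen, detect_elements, etc.) are hypothetical and are not taken from the paper's implementation.

```python
# Illustrative sketch of an LLMPA-style automation loop (assumed structure,
# not the authors' implementation). An LLM decomposes a high-level request
# into sub-steps; for each step the agent describes the current screen,
# detects actionable UI elements, predicts the next action, and checks
# whether the step succeeded before moving on.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class UIElement:
    element_id: str
    text: str
    bounds: Tuple[int, int, int, int]  # (x, y, width, height) on screen


@dataclass
class Action:
    kind: str          # e.g. "tap", "type", "scroll"
    target: UIElement
    payload: str = ""  # text to type, if any


class LLMPAAgent:
    """Hypothetical agent wiring together the modules named in the abstract."""

    def __init__(self, llm: Callable[[str], str], ui_backend):
        self.llm = llm        # any text-in/text-out LLM interface
        self.ui = ui_backend  # assumed app driver: screen description, element detection, action execution

    def decompose(self, request: str) -> List[str]:
        # Instruction decomposition: break the user request into ordered sub-steps.
        plan = self.llm(f"Break this mobile-app task into numbered steps:\n{request}")
        return [line.strip() for line in plan.splitlines() if line.strip()]

    def predict_action(self, step: str, screen_desc: str, elements: List[UIElement]) -> Action:
        # Next-action prediction: ask the LLM which element to act on for this step.
        choice = self.llm(
            f"Step: {step}\nScreen: {screen_desc}\n"
            f"Elements: {[e.element_id for e in elements]}\n"
            "Reply with only the id of the element to tap."
        ).strip()
        target = next(e for e in elements if e.element_id == choice)
        return Action(kind="tap", target=target)

    def check(self, step: str, screen_desc: str) -> bool:
        # Error checking: verify the step's effect from the post-action screen.
        verdict = self.llm(
            f"Did this step succeed?\nStep: {step}\nScreen now: {screen_desc}\nAnswer yes or no."
        )
        return verdict.strip().lower().startswith("yes")

    def run(self, request: str, max_retries: int = 2) -> None:
        for step in self.decompose(request):
            for _ in range(max_retries + 1):
                screen_desc = self.ui.describe_screen()   # description generation
                elements = self.ui.detect_elements()      # interface-element detection
                action = self.predict_action(step, screen_desc, elements)
                self.ui.execute(action)
                if self.check(step, self.ui.describe_screen()):
                    break
            else:
                raise RuntimeError(f"Step failed after retries: {step}")
```

The loop mirrors the five modules the abstract names (instruction decomposition, description generation, element detection, next-action prediction, error checking); in a real deployment each call would be backed by prompt templates and an on-device or server-side model rather than the bare function interface assumed here.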