AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent

4 Apr 2024 | Hanyu Lai, Xiao Liu, Iat Long Iong, Shuntian Yao, Yuxuan Chen, Pengbo Shen, Hao Yu, Hanchen Zhang, Xiaohan Zhang, Yuxiao Dong, Jie Tang
AutoWebGLM is a web navigation agent built on the ChatGLM3-6B large language model and reported to outperform GPT-4. It is designed to complete complex real-world tasks autonomously by navigating and operating real web browsers. The system has two parts: a browsing framework and an LM agent. The browsing framework uses an HTML simplification algorithm to represent each webpage concisely while preserving vital information, and it passes this simplified HTML and other information to the LM agent, which decides what to do next. Web browsing training data is built with a hybrid human-AI method and drawn from diverse sources. The agent is first trained with curriculum learning, then bootstrapped with reinforcement learning (RL) and rejection sampling finetuning (RFT) to improve webpage comprehension, browser operation, and efficient task decomposition.

The authors also establish AutoWebBench, described as the first bilingual benchmark for real-world web browsing tasks, which takes regional stylistic variations into account. Evaluated across diverse web navigation benchmarks, AutoWebGLM performs comparably to the most advanced LLM-based agents; the results show its improvements but also the challenges that remain in real environments. The agent is implemented as a Chrome extension and can operate on various websites to complete user tasks. Its task success is attributed to its ability to handle complex web operations and adapt to different environments.
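The summary does not spell out the HTML simplification algorithm, so the following is only a minimal sketch of the general idea: drop non-content markup, keep visible text, and label interactive elements so an agent could refer to them. The tag sets, the kept attributes, the `[id] <tag>` output format, and the name `simplify_html` are all assumptions made for illustration and are not taken from the paper.

```python
from html.parser import HTMLParser

# All of these sets are illustrative assumptions, not the paper's rules.
SKIPPED_TAGS = {"script", "style", "noscript"}
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # interactive tags that never have a closing tag
KEPT_ATTRS = {"type", "placeholder", "value", "aria-label"}


class _Simplifier(HTMLParser):
    def __init__(self):
        super().__init__()
        self.lines = []       # simplified output, in document order
        self.open_stack = []  # interactive elements waiting for their end tag
        self.skip_depth = 0   # > 0 while inside a skipped tag
        self.next_id = 0      # label an agent could use to refer to an element

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in INTERACTIVE_TAGS:
            return
        element = {
            "id": self.next_id,
            "tag": tag,
            "attrs": [(k, v) for k, v in attrs if k in KEPT_ATTRS and v],
            "text": [],
        }
        self.next_id += 1
        if tag in VOID_TAGS:
            self.lines.append(self._render(element))
        else:
            self.open_stack.append(element)

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif (not self.skip_depth and self.open_stack
              and self.open_stack[-1]["tag"] == tag):
            self.lines.append(self._render(self.open_stack.pop()))

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse runs of whitespace
        if self.skip_depth or not text:
            return
        if self.open_stack:
            # Text inside an interactive element becomes its label.
            self.open_stack[-1]["text"].append(text)
        else:
            self.lines.append(text)

    @staticmethod
    def _render(element):
        attrs = "".join(f' {k}="{v}"' for k, v in element["attrs"])
        head = f'[{element["id"]}] <{element["tag"]}{attrs}>'
        text = " ".join(element["text"])
        return f"{head} {text}" if text else head


def simplify_html(html: str) -> str:
    """Return a compact, line-based view of the page (hypothetical format)."""
    parser = _Simplifier()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Calling `simplify_html` on a page yields one line per text run or interactive element, which is far shorter than the raw markup. A real system would likely also need visibility checks and length limits, which this sketch leaves out.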
The stated contributions are the development of AutoWebGLM, the construction of a real webpage browsing operation dataset, and a demonstration of the agent's practical usability on real-world web tasks.
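The rejection sampling finetuning step mentioned in the summary can be illustrated generically: sample several candidate trajectories per task from the current model, keep only those that pass a success check, and reuse the survivors as finetuning data. The sketch below assumes a caller-supplied sampler and checker; the function name `collect_rft_data`, its parameters, and the acceptance rule are hypothetical, and the paper's actual criteria are not given in this summary.

```python
from typing import Callable, Dict, List

Trajectory = List[str]  # assumed representation: a list of browser actions


def collect_rft_data(
    tasks: List[str],
    sample_fn: Callable[[str], Trajectory],         # hypothetical: model rollout
    is_success: Callable[[str, Trajectory], bool],  # hypothetical: outcome check
    samples_per_task: int = 4,
) -> List[Dict[str, object]]:
    """Keep the first successful sampled trajectory for each task."""
    accepted = []
    for task in tasks:
        for _ in range(samples_per_task):
            trajectory = sample_fn(task)
            if is_success(task, trajectory):
                accepted.append({"task": task, "trajectory": trajectory})
                break  # one accepted sample per task; rejected ones are dropped
    return accepted
```

Tasks for which no sample passes the check contribute nothing, so the resulting dataset contains only trajectories that the checker accepted.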