DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning

14 Jun 2024 | Hao Bai, Yifei Zhou, Mert Cemri, Jiayi Pan, Alane Suhr, Sergey Levine, Aviral Kumar
The paper introduces DigiRL, a novel autonomous reinforcement learning (RL) approach for training device-control agents in real-world environments. The authors address the challenge of training vision-language models (VLMs) for decision-making tasks, such as controlling graphical user interfaces (GUIs), by leveraging autonomous RL. DigiRL trains in two stages: offline RL to initialize the model from static demonstrations, followed by offline-to-online RL to fine-tune it on self-collected real-world data. The authors build a scalable Android learning environment with a VLM-based evaluator and develop an RL approach that combines advantage-weighted regression (AWR) with an automatic curriculum to handle the stochasticity and non-stationarity of real apps and websites. On the Android-in-the-Wild (AitW) dataset, DigiRL achieves a 49.5% absolute improvement in success rate (from 17.7% to 67.2%) over supervised fine-tuning on static human demonstrations. These results surpass previous state-of-the-art agents, including AppAgent with GPT-4V and CogAgent, establishing a new state of the art for digital agents in device-control tasks.
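The summary mentions advantage-weighted regression (AWR) as the core policy-update rule. As a rough illustration only, the sketch below shows a generic AWR-style loss in PyTorch: actions are reweighted by the exponentiated advantage, so the policy imitates high-advantage behavior more strongly. The function name, hyperparameters (`beta`, `weight_clip`), and the clipping scheme are illustrative assumptions, not details taken from the paper.

```python
import torch

def awr_loss(log_probs: torch.Tensor,
             advantages: torch.Tensor,
             beta: float = 0.05,
             weight_clip: float = 20.0) -> torch.Tensor:
    """Generic advantage-weighted regression objective (illustrative).

    log_probs:  log pi(a|s) of the taken actions, shape (batch,)
    advantages: estimated advantages A(s, a), shape (batch,)
    beta, weight_clip: hypothetical hyperparameters for this sketch.
    """
    # Exponentiate the advantages; clip so a few large advantages
    # cannot dominate the gradient.
    weights = torch.clamp(torch.exp(advantages / beta), max=weight_clip)
    # Maximize the advantage-weighted log-likelihood of the data
    # (minimize its negative). Weights are treated as constants.
    return -(weights.detach() * log_probs).mean()
```

In this formulation, beta controls how sharply the update concentrates on high-advantage actions: small beta approaches greedy imitation of the best observed actions, while large beta approaches plain behavioral cloning.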