Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation

January 2024 | Zipeng Fu, Tony Z. Zhao, Chelsea Finn
Mobile ALOHA is a low-cost, bimanual mobile manipulation system that supports whole-body teleoperation. It costs $32k, including onboard power and compute, and extends the original ALOHA setup with a mobile base and a whole-body teleoperation interface. Users can teleoperate the robot to, for example, retrieve food from a fridge, and the robot can learn complex long-horizon tasks through imitation learning. Policies are trained with supervised behavior cloning on data collected with Mobile ALOHA, and co-training with existing static ALOHA datasets boosts performance on mobile manipulation tasks. With 50 demonstrations per task, co-training can increase success rates by up to 90%, allowing Mobile ALOHA to autonomously complete complex tasks such as sautéing and serving shrimp, opening a two-door cabinet to store heavy cooking pots, calling and entering an elevator, and rinsing a used pan under a kitchen faucet.

The main contribution of the paper is a system for learning complex mobile bimanual manipulation tasks. Core to this system are (1) Mobile ALOHA, a low-cost whole-body teleoperation system, and (2) the finding that a simple co-training recipe enables data-efficient learning of complex mobile manipulation tasks. The teleoperation system supports multiple hours of continuous use, such as cooking a 3-course meal, cleaning a public bathroom, and doing laundry. The imitation learning result holds across a wide range of complex tasks, such as opening a two-door wall cabinet to store heavy cooking pots, calling an elevator, pushing in chairs, and cleaning up spilled wine. With co-training, the system achieves over 80% success on these tasks with only 50 human demonstrations per task, an average absolute improvement of 34% over training without co-training.

Mobile ALOHA is a low-cost mobile manipulator that can perform a broad range of household tasks.
It inherits the benefits of the original ALOHA system, i.e., a low-cost, dexterous, and repairable bimanual teleoperation setup, while extending its capabilities beyond table-top manipulation. The system can move at a speed comparable to human walking, around 1.42 m/s. It is stable when manipulating heavy household objects, such as pots and cabinets. It supports whole-body teleoperation of both arms and the mobile base, and it is untethered, with onboard power and compute.

The system uses a Tracer mobile base, a low-profile, differential-drive base designed for warehouse logistics, which can move at up to 1.6 m/s, similar to average human walking speed. The design allows simultaneous control of the base and the two arms. Power comes from a 1.26 kWh battery that weighs 14 kg and sits at the base, where it also serves as a balancing weight to avoid tipping over. All compute during data collection and inference is conducted onboard.
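The summary describes co-training with static ALOHA data only at a high level. As an illustration, a minimal sketch of one way such a recipe could be set up is given below: each batch element is drawn from the mobile dataset with a fixed probability and otherwise from the static dataset. The 50/50 mixing ratio, the action sizes (14 arm dimensions plus 2 base dimensions), the zero-padding of static actions, and the names `pad_static_action` and `sample_cotraining_batch` are assumptions made for this example, not details stated in the summary.

```python
import random

ARM_DIMS = 14   # assumed: joint targets for the two arms
BASE_DIMS = 2   # assumed: linear and angular velocity of the base


def pad_static_action(arm_action):
    """Append zero base velocities so a static (table-top) action
    has the same length as a mobile action."""
    if len(arm_action) != ARM_DIMS:
        raise ValueError(f"expected {ARM_DIMS} arm dims, got {len(arm_action)}")
    return list(arm_action) + [0.0] * BASE_DIMS


def sample_cotraining_batch(mobile_data, static_data, batch_size,
                            mobile_prob=0.5, rng=None):
    """Draw a batch that mixes mobile and static demonstrations.

    Each element comes from `mobile_data` with probability `mobile_prob`
    (an assumed ratio), otherwise from `static_data`, zero-padded to the
    mobile action size. Datasets are lists of (observation, action) pairs.
    """
    rng = rng or random.Random()
    batch = []
    for _ in range(batch_size):
        if rng.random() < mobile_prob:
            obs, action = rng.choice(mobile_data)
            source = "mobile"
        else:
            obs, action = rng.choice(static_data)
            action = pad_static_action(action)
            source = "static"
        batch.append({"obs": obs, "action": list(action), "source": source})
    return batch


if __name__ == "__main__":
    # Toy placeholders standing in for real demonstration data.
    mobile = [("mobile_obs_0", [0.1] * (ARM_DIMS + BASE_DIMS))]
    static = [("static_obs_0", [0.2] * ARM_DIMS)]
    batch = sample_cotraining_batch(mobile, static, batch_size=8,
                                    rng=random.Random(0))
    for item in batch:
        print(item["source"], len(item["action"]))
```

Every sampled action has the same length regardless of its source, so a single behavior cloning loss can be applied to the mixed batch. The mixing probability is exposed as a parameter so that other ratios can be tried.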