21 Feb 2024 | Peter Schaldenbrand, Gaurav Parmar, Jun-Yan Zhu, James McCann, and Jean Oh
CoFRIDA is a collaborative robot painting framework that enables human-robot co-painting by modifying and engaging with content already painted by a human collaborator. Previous systems such as FRIDA focused on reducing the sim-to-real gap and expanding input modalities, which left text-image alignment as their major weakness. CoFRIDA addresses this weakness by using pre-trained text-to-image models. Out of the box, however, these models struggle with real-world co-painting for two reasons: they do not understand the robot's constraints, and they cannot co-paint without making unrealistic edits to the existing canvas. To overcome both issues, CoFRIDA employs a self-supervised fine-tuning procedure that adapts pre-trained models to generate content within the robot's capabilities and to perform co-painting. The training data is created by having FRIDA generate full drawings or paintings from a text-image dataset and then removing strokes to form partial paintings; the partial and full paintings are then used to fine-tune the pre-trained models. CoFRIDA supports multiple painting settings and handles different media, such as markers and paintbrushes. It is open-source and available on various robot platforms, enabling real-world art creation. Compared to FRIDA, CoFRIDA shows improved text-image alignment and a reduced sim-to-real gap, which suggests that the fine-tuning procedure encodes the robot's constraints and abilities into a foundation model and is a promising method for reducing sim-to-real gaps. Performance is evaluated using CLIPScore and BLIPScore, with CoFRIDA outperforming baselines in both text-image alignment and semantic sim-to-real gap measurements.
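As a rough illustration of the CLIPScore metric named in the summary, the sketch below follows the commonly cited definition: a weight of 2.5 times the cosine similarity between an image embedding and a text embedding, clipped at zero. It assumes the embeddings have already been produced by a CLIP image encoder and text encoder. The function names are hypothetical, and the summary does not say which CLIP model or scaling the authors used, so this should not be read as their evaluation code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def clip_score(image_embedding, text_embedding, weight=2.5):
    """Hypothetical helper: weight * max(cosine similarity, 0).

    Assumption: both inputs are precomputed CLIP embeddings; the default
    weight of 2.5 is the conventional CLIPScore scaling, not a value taken
    from the summarized paper.
    """
    similarity = cosine_similarity(image_embedding, text_embedding)
    return weight * max(similarity, 0.0)
```

A higher score means the painted image and the text prompt are closer in CLIP's embedding space. Negative similarities are clipped, so the score never drops below zero.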
CoFRIDA's hierarchical approach enables interactive human-robot co-painting: semantic planning is performed by pre-trained models in a high-level pixel space, and the resulting plan is then transferred to a low-level brush stroke planner. This approach reduces the sim-to-real gap and improves performance over baselines. CoFRIDA is supported by various funding sources and has been reviewed for ethical considerations and biases.
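The self-supervised data creation described in the summary (a full drawing produced by FRIDA, with strokes removed to form a partial painting) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Stroke` fields, the `make_training_pair` helper, and the choice to remove strokes uniformly at random are all hypothetical, since the summary does not say how strokes are represented or selected. In a real pipeline both stroke lists would presumably be rendered to images before fine-tuning; here the lists stand in for those images.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stroke:
    """Hypothetical stroke parameters; the real planner's format may differ."""
    x: float
    y: float
    length: float
    angle: float


def make_training_pair(full_strokes, caption, keep_fraction, rng):
    """Build one (partial, full) example by dropping strokes from a full plan.

    Assumption: strokes are removed uniformly at random. The kept strokes
    preserve their original order so the partial painting is a sub-sequence
    of the full one.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be between 0 and 1")
    n_keep = int(len(full_strokes) * keep_fraction)
    kept_indices = sorted(rng.sample(range(len(full_strokes)), n_keep))
    partial_strokes = [full_strokes[i] for i in kept_indices]
    return {
        "caption": caption,            # text prompt paired with the painting
        "partial": partial_strokes,    # model input: the unfinished canvas
        "full": list(full_strokes),    # model target: the completed painting
    }


# Usage: a toy 10-stroke "painting" with half of the strokes kept.
rng = random.Random(0)
strokes = [Stroke(x=float(i), y=float(i), length=1.0, angle=0.0) for i in range(10)]
pair = make_training_pair(strokes, "a cat", keep_fraction=0.5, rng=rng)
```

Because the target is always a painting the robot planner itself produced, every training example stays within what the robot can physically paint, which is the property the fine-tuning procedure relies on.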