31 Jan 2024 | Fengyu Yang, Chao Feng, Ziyang Chen, Hyoungseob Park, Daniel Wang, Yiming Dou, Ziyao Zeng, Xien Chen, Rit Gangopadhyay, Andrew Owens, Alex Wong
The paper introduces UniTouch, a unified multimodal tactile representation model for vision-based touch sensors. UniTouch aligns touch signals with pre-trained image embeddings from large-scale vision-language datasets, enabling zero-shot tactile sensing tasks across various domains. The model uses sensor-specific tokens to handle the variability in different tactile sensors and a batch sampling strategy to optimize training. UniTouch demonstrates strong performance in zero-shot touch understanding, cross-modal retrieval, image synthesis with touch, and X-to-touch generation. It outperforms existing methods on multiple datasets, showcasing its effectiveness in bridging touch with other modalities. The work opens new avenues for multimodal touch experience and integrates tactile sensing into multimodal foundation models.
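To make the core idea concrete, below is a minimal, hypothetical PyTorch sketch of the kind of training setup the summary describes: a touch encoder whose embeddings are aligned, via a CLIP-style contrastive loss, to frozen image embeddings, with learnable sensor-specific tokens selecting for the sensor that produced each tactile reading. The class and function names (`TouchAligner`, `infonce_loss`), the `extra_tokens` encoder interface, and all hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TouchAligner(nn.Module):
    """Hypothetical sketch: map tactile images into a frozen image-embedding space."""

    def __init__(self, touch_encoder, embed_dim=512, num_sensors=4, num_tokens=1):
        super().__init__()
        self.touch_encoder = touch_encoder  # e.g. a ViT over tactile images (assumed)
        # Learnable sensor-specific tokens, one set per sensor type, to absorb
        # differences between touch sensors (gel type, lighting, resolution).
        self.sensor_tokens = nn.Parameter(
            0.02 * torch.randn(num_sensors, num_tokens, embed_dim)
        )
        self.proj = nn.Linear(embed_dim, embed_dim)
        # Learnable temperature, initialized near log(1/0.07) as in CLIP-style training.
        self.logit_scale = nn.Parameter(torch.tensor(2.659))

    def forward(self, touch_images, sensor_ids):
        # Look up the sensor token(s) for each sample and pass them to the encoder
        # alongside the tactile image patches (assumed encoder interface).
        tokens = self.sensor_tokens[sensor_ids]                      # (B, num_tokens, D)
        feats = self.touch_encoder(touch_images, extra_tokens=tokens)  # (B, D)
        return F.normalize(self.proj(feats), dim=-1)


def infonce_loss(touch_emb, image_emb, logit_scale):
    """Symmetric contrastive loss pairing each touch embedding with its image embedding.

    `image_emb` is assumed to come from a frozen, pre-trained vision encoder and to be
    L2-normalized; only the touch side receives gradients.
    """
    logits = logit_scale.exp() * touch_emb @ image_emb.t()  # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (
        F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)
    )
```

Because the touch embeddings land in the same space as the frozen image (and hence text) embeddings, downstream zero-shot tasks such as touch classification or cross-modal retrieval reduce to nearest-neighbor comparisons in that shared space, which is consistent with the zero-shot capabilities described above.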