Octopi: Object Property Reasoning with Large Tactile-Language Models


5 Jun 2024 | Samson Yu, Kelvin Lin, Anxing Xiao, Jiafei Duan, and Harold Soh
This paper introduces OCTOPI, a tactile-language model, together with PHYSICLEAR, a corresponding dataset, for physical reasoning tasks that involve tactile inputs. The work aims to improve physical reasoning in robots by integrating tactile perception with language-based reasoning. PHYSICLEAR contains tactile videos of everyday objects annotated with physical properties such as hardness, roughness, and bumpiness. OCTOPI combines tactile representation learning with a large vision-language model to predict and reason about tactile inputs while requiring only minimal language fine-tuning.

Evaluated on PHYSICLEAR, the system effectively uses intermediate physical-property predictions to improve performance on a range of tactile tasks. The paper also emphasizes the importance of tactile perception for physical reasoning, particularly when visual information is ambiguous. OCTOPI performs physical reasoning tasks including object property description, property comparison, superlative selection, property-object matching, and scenario reasoning. It is further tested on real-world tasks such as avocado ripeness classification, where it leverages commonsense knowledge to make accurate predictions.

Ablation studies show that fine-tuning the visual encoder and using parameter-efficient LLM fine-tuning both improve performance, and OCTOPI outperforms baselines on physical property prediction and reasoning tasks. The study highlights the potential of tactile-language models for enabling embodied AI systems to perform physical reasoning, contributes to the field of tactile robotics, and opens new research directions in tactile perception and physical reasoning.
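To make the described architecture concrete, the sketch below illustrates one plausible way such a pipeline could be organized: tactile video frames are encoded, projected into an LLM's token-embedding space, and intermediate physical-property predictions (hardness, roughness, bumpiness) are produced as grounded inputs for language-based reasoning. This is a minimal, hypothetical sketch, not the authors' implementation; the module names, dimensions, and the simple convolutional frame encoder are illustrative assumptions standing in for the pretrained visual encoder and LLM used in the paper.

```python
# Hypothetical sketch of an OCTOPI-style tactile-language pipeline (illustrative only).
import torch
import torch.nn as nn


class TactileEncoder(nn.Module):
    """Encodes a tactile video clip of shape (B, T, C, H, W) into one embedding per clip."""

    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # Stand-in for a pretrained visual encoder (e.g. a CLIP-style ViT) applied per frame.
        self.frame_encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=7, stride=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = clip.shape
        frames = clip.view(b * t, c, h, w)
        feats = self.frame_encoder(frames).view(b, t, -1)
        return feats.mean(dim=1)  # temporal average pooling over frames


class TactileProjector(nn.Module):
    """Projects tactile embeddings into the LLM's token-embedding space."""

    def __init__(self, embed_dim: int = 512, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Linear(embed_dim, llm_dim)

    def forward(self, tactile_embed: torch.Tensor) -> torch.Tensor:
        return self.proj(tactile_embed)


class PropertyHead(nn.Module):
    """Intermediate physical-property prediction: hardness, roughness, bumpiness."""

    def __init__(self, embed_dim: int = 512, num_levels: int = 3):
        super().__init__()
        self.heads = nn.ModuleDict({
            prop: nn.Linear(embed_dim, num_levels)
            for prop in ("hardness", "roughness", "bumpiness")
        })

    def forward(self, tactile_embed: torch.Tensor) -> dict:
        return {prop: head(tactile_embed) for prop, head in self.heads.items()}


if __name__ == "__main__":
    encoder, projector, props = TactileEncoder(), TactileProjector(), PropertyHead()
    clip = torch.randn(1, 8, 3, 224, 224)      # one tactile clip of 8 frames
    tactile_embed = encoder(clip)              # (1, 512) clip-level embedding
    llm_tokens = projector(tactile_embed)      # (1, 4096), usable as a "tactile token" for an LLM
    property_logits = props(tactile_embed)     # intermediate property predictions
    print({k: v.shape for k, v in property_logits.items()})
```

The key design idea this sketch tries to capture is the paper's use of intermediate property predictions: rather than asking the language model to reason directly over raw tactile signals, discretized property estimates give it grounded, human-interpretable symbols to reason with, which is consistent with the reported gains on comparison, superlative, and scenario-reasoning tasks.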