The paper introduces a framework called Robotics with Fast and Slow Thinking (RFST) for language-conditioned robotic manipulation. Inspired by the dual-process theory in cognitive science, RFST aims to classify tasks into two systems: fast-thinking and slow-thinking. The fast-thinking system is a simple policy network, while the slow-thinking system involves a fine-tuned vision-language model (VLM) that can perform reasoning and intent recognition. The framework uses an instruction discriminator to determine which system should be activated based on the user's instructions. The VLM is trained on a dataset featuring real-world trajectories, including tasks ranging from spontaneous impulses to deliberate contemplation. The effectiveness of RFST is evaluated in both simulation and real-world scenarios, demonstrating superior performance on complex tasks requiring intent recognition and reasoning. The project is available at https://jlm-z.github.io/RFST/.
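
The routing idea described above (an instruction discriminator choosing between a fast policy and a slow VLM-based system) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `DualSystemRouter`, `toy_discriminator`, `toy_fast_policy`, and `toy_slow_system` are hypothetical, and the keyword heuristic is only a stand-in for whatever discriminator the authors actually use.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical set of verbs that mark an instruction as a direct command.
# This keyword rule is a placeholder, not the paper's discriminator.
DIRECT_VERBS = {"pick", "place", "push", "open", "close"}


def toy_discriminator(instruction: str) -> bool:
    """Return True when the instruction should go to the slow system."""
    words = set(instruction.lower().split())
    # No explicit action verb: treat the instruction as needing
    # intent recognition or reasoning.
    return words.isdisjoint(DIRECT_VERBS)


def toy_fast_policy(instruction: str) -> str:
    """Placeholder for a simple policy network acting on the instruction."""
    return f"execute directly: {instruction}"


def toy_slow_system(instruction: str) -> str:
    """Placeholder for a fine-tuned VLM that reasons before acting."""
    return f"reason about intent, then act: {instruction}"


@dataclass
class DualSystemRouter:
    """Dispatch each instruction to exactly one of the two systems."""
    discriminator: Callable[[str], bool]  # True selects the slow system
    fast_policy: Callable[[str], str]
    slow_system: Callable[[str], str]

    def act(self, instruction: str) -> Tuple[str, str]:
        if self.discriminator(instruction):
            return "slow", self.slow_system(instruction)
        return "fast", self.fast_policy(instruction)


router = DualSystemRouter(toy_discriminator, toy_fast_policy, toy_slow_system)
fast_result = router.act("pick up the red block")
slow_result = router.act("I am thirsty")
```

In this sketch a direct command such as "pick up the red block" is routed to the fast policy, while an indirect statement such as "I am thirsty" is routed to the slow system. A real discriminator would presumably be learned rather than keyword-based.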