PCA-Bench is a multimodal benchmark designed to evaluate the integrated capabilities of Multimodal Large Language Models (MLLMs) across perception-cognition-action chains. Unlike previous benchmarks that focus on simple tasks, PCA-Bench introduces three complex scenarios: autonomous driving, domestic robotics, and open-world games. The benchmark requires models to seamlessly integrate perception, cognition, and action in a reasoning chain to make accurate decisions. It also features error localization, allowing detailed analysis of model inaccuracies in perception, knowledge, or reasoning. To balance accuracy and efficiency, the paper introduces PCA-Eval, an automatic evaluation protocol, and uses it to assess 10 prevalent MLLMs. The results show a significant performance gap between open-source models and proprietary ones such as GPT-4 Vision. To narrow this gap, the paper introduces Embodied-Instruction-Evolution (EIE), an automatic framework for synthesizing instruction-tuning examples in multimodal environments. EIE generates 7,510 training examples for PCA-Bench and improves the performance of open-source MLLMs, occasionally surpassing GPT-4 Vision. The findings suggest that robust MLLMs such as GPT-4 Vision show promise for decision-making in embodied agents, opening new avenues for MLLM research.
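The EIE pipeline itself is not detailed in this summary; purely as a minimal sketch, assuming it follows the common pattern of prompting an LLM to rewrite seed examples and filtering the results, an instruction-evolution loop could look like the code below. All names here (EVOLVE_PROMPT, is_valid, llm) are hypothetical and not taken from the paper.

# Minimal sketch of an instruction-evolution loop, NOT the paper's actual EIE
# implementation. `llm` is assumed to be a callable mapping a prompt string to
# a generated example string; `is_valid` stands in for automatic filtering.

EVOLVE_PROMPT = (
    "Rewrite the following embodied decision-making example so that it is "
    "more challenging but still answerable from the scene:\n\n{example}"
)

def is_valid(example: str) -> bool:
    # Placeholder filter: a real pipeline would verify that the answer,
    # reasoning, and key concept of the evolved example remain consistent.
    return bool(example.strip())

def evolve_instructions(seed_examples, llm, rounds=2):
    # Grow a pool of instruction-tuning examples by iteratively rewriting
    # every example in the pool and keeping the candidates that pass the filter.
    pool = list(seed_examples)
    for _ in range(rounds):
        evolved = []
        for example in pool:
            candidate = llm(EVOLVE_PROMPT.format(example=example))
            if is_valid(candidate):
                evolved.append(candidate)
        pool.extend(evolved)
    return pool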
PCA-Bench covers three domains: autonomous driving, domestic robotics, and open-world games. Each instance is annotated with a 6-element tuple: <image, question, action candidates, answer, reason, key concept>. PCA-Eval is an anchor-based evaluation protocol that uses LLMs together with these annotations to automatically localize errors. It shows strong alignment with human assessments, achieving high kappa coefficients for the perception, cognition, and action scores. The paper also examines the impact of EIE on model performance, reporting significant improvements across domains. The results highlight the importance of error localization when evaluating MLLMs and the potential of EIE to enhance model performance. The paper concludes that PCA-Bench provides a comprehensive benchmark for evaluating MLLMs in embodied decision-making scenarios.
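As a rough illustration only, not the benchmark's released data schema or evaluation code, an instance following the 6-element tuple and a simplified anchor-based scoring pass could be represented as follows. All field and function names are hypothetical, and `judge` is assumed to be a callable wrapping an LLM that answers a yes/no question with 1 or 0.

from dataclasses import dataclass
from typing import List

@dataclass
class PCAInstance:
    # One PCA-Bench example, mirroring the 6-element annotation tuple above
    # (hypothetical field names).
    image_path: str
    question: str
    action_candidates: List[str]
    answer: str       # ground-truth action: anchor for the action score
    reason: str       # ground-truth rationale: anchor for the cognition score
    key_concept: str  # ground-truth percept: anchor for the perception score

def score_response(instance: PCAInstance, model_output: str, judge) -> dict:
    # Simplified anchor-based scoring: each of the three scores is obtained by
    # asking the judge LLM whether the model output matches the corresponding
    # annotated anchor.
    perception = judge(
        f"Does the output correctly identify '{instance.key_concept}'?\n"
        f"Output: {model_output}"
    )
    cognition = judge(
        f"Is the output's reasoning consistent with: '{instance.reason}'?\n"
        f"Output: {model_output}"
    )
    action = judge(
        f"Does the output choose the action '{instance.answer}' from "
        f"{instance.action_candidates}?\nOutput: {model_output}"
    )
    return {"perception": perception, "cognition": cognition, "action": action}

Keeping the three judgments separate is what makes error localization possible: a wrong action can be traced back to a missed key concept (perception) or a flawed rationale (cognition) rather than being reported only as an incorrect final decision.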