The paper introduces MIKE, a comprehensive benchmark and dataset for fine-grained (FG) multimodal entity knowledge editing in Multimodal Large Language Models (MLLMs). Unlike existing benchmarks, which focus on coarse-grained knowledge, MIKE specifically targets the challenges of FG entity recognition and editing, which are crucial for practical applications. The benchmark comprises three main tasks: Vanilla Name Answering (VNA), Entity-Level Caption (ELC), and Complex-Scenario Recognition (CSR), each tailored to assess a different aspect of MLLM performance. In addition, a new form of knowledge editing, Multi-Step Editing, is introduced to evaluate the efficiency of editing methods.

Extensive experiments on two MLLMs, BLIP-2 and MiniGPT-4, show that current state-of-the-art methods face significant challenges on the FG knowledge editing tasks, underscoring the need for novel approaches in this domain. The findings indicate that Entity-Level Caption is the most challenging task and that the different generality tasks affect editing performance in distinct ways. The paper also examines the impact of model size and image augmentations on performance, and concludes with a call for future research to address these limitations and extend the benchmark.
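To make the Multi-Step Editing protocol concrete, the following is a minimal sketch of how such an evaluation loop might be structured: edits are applied sequentially, and the edited model is then checked for whether it retains each injected answer. All names here (`EditExample`, `apply_edit`, `model.generate`) are hypothetical placeholders for illustration, not interfaces defined in the MIKE paper.

```python
# Hypothetical sketch of a Multi-Step Editing evaluation loop.
# None of these interfaces come from the MIKE paper; `apply_edit`,
# `model`, and the example fields are illustrative placeholders.

from dataclasses import dataclass

@dataclass
class EditExample:
    image_path: str      # image depicting the fine-grained entity
    question: str        # e.g. a VNA prompt: "Who is the person in the image?"
    target_answer: str   # the entity name the edit should inject

def apply_edit(model, example: EditExample):
    """Placeholder for a knowledge-editing method; returns the
    edited model. A real implementation would plug in a concrete
    editor here."""
    raise NotImplementedError

def evaluate_multi_step(model, examples: list[EditExample], steps: int = 3):
    """Apply `steps` sequential edits, then check whether the final
    model still produces each injected answer (reliability after
    multi-step editing)."""
    edited = model
    for example in examples[:steps]:
        edited = apply_edit(edited, example)

    hits = 0
    for example in examples[:steps]:
        prediction = edited.generate(example.image_path, example.question)
        hits += int(example.target_answer.lower() in prediction.lower())
    return hits / steps
```

The key design point this sketch illustrates is that, unlike single-edit evaluation, reliability is measured only after all edits have been applied, so earlier edits can be degraded by later ones; this is what makes the multi-step setting a test of editing efficiency rather than of a single edit's success.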