MIKE: A New Benchmark for Fine-grained Multimodal Entity Knowledge Editing

18 Feb 2024 | Jiaqi Li, Miaozeng Du, Chuanyi Zhang, Yongrui Chen, Nan Hu, Guilin Qi, Haiyun Jiang, Siyuan Cheng, Bozhong Tian
MIKE is a new benchmark for fine-grained multimodal entity knowledge editing. It addresses the gap left by current benchmarks, which focus on coarse-grained knowledge, by introducing a comprehensive dataset and tasks tailored to fine-grained (FG) entity knowledge editing. MIKE includes tasks such as Vanilla Name Answering, Entity-Level Caption, and Complex-Scenario Recognition, along with a new form of knowledge editing called Multi-Step Editing. The benchmark evaluates how well current methods edit FG knowledge, highlighting their limitations in accurately recognizing and editing FG entities. The dataset contains over 1,100 FG entities, each with at least five images, and is designed to challenge multimodal large language models (MLLMs) across multiple aspects of knowledge editing.
The benchmark also introduces entity-oriented metrics to assess the reliability, generality, and locality of edited knowledge. Results show that current methods struggle with tasks such as Entity-Level Caption, underscoring the complexity of FG knowledge editing. MIKE provides a clear agenda for future research in this area, emphasizing the need for novel approaches that improve the accuracy and efficiency of MLLM knowledge editing.
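To make the three entity-oriented metrics concrete, here is a minimal sketch of how reliability, generality, and locality are typically scored in knowledge-editing evaluation. It assumes exact-match scoring on generated answers; the function names and the exact-match choice are illustrative assumptions, not definitions taken from the MIKE paper.

```python
def accuracy(preds, golds):
    """Fraction of predictions that exactly match the gold answers (illustrative metric)."""
    assert len(preds) == len(golds), "prediction/gold lists must align"
    return sum(p == g for p, g in zip(preds, golds)) / len(golds)


def reliability(edited_preds, edit_targets):
    # Reliability: the post-edit model answers the original edit prompts
    # with the newly injected entity knowledge.
    return accuracy(edited_preds, edit_targets)


def generality(rephrase_preds, edit_targets):
    # Generality: the edit carries over to rephrased prompts or to other
    # images of the same FG entity that were not used during editing.
    return accuracy(rephrase_preds, edit_targets)


def locality(post_edit_preds, pre_edit_preds):
    # Locality: on unrelated entities/prompts, the post-edit model should
    # still produce the same answers it gave before the edit.
    return accuracy(post_edit_preds, pre_edit_preds)
```

For example, if an edited model answers both edit prompts correctly but only one of two rephrased prompts, `reliability` is 1.0 while `generality` is 0.5; actual evaluation would replace exact match with whatever answer-matching rule the benchmark specifies.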