7 Aug 2024 | M. Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Sivan Doveh, Jakub Micorek, Mateusz Kozinski, Hilde Kuehne, Horst Possegger
This paper introduces Meta-Prompting for Visual Recognition (MPVR), a framework that automates the generation of category-specific prompts for zero-shot visual recognition using Large Language Models (LLMs). MPVR requires only minimal information about the target task: a short natural-language description and the list of class labels. From these, it prompts an LLM to produce diverse, task-specific query templates, which are then populated with class names to yield category-specific prompts for a Vision-Language Model (VLM). The resulting prompts are ensembled into a robust zero-shot classifier.

MPVR generalizes well across zero-shot image recognition benchmarks and outperforms existing methods, achieving improvements over CLIP of up to 19.8% with GPT and 18.2% with Mixtral. Evaluated on 20 diverse datasets, it shows significant gains on most of them, demonstrating that the visual knowledge encoded in LLMs can enhance zero-shot recognition without manual prompt engineering. The framework is effective with both closed- and open-source LLMs and scales to visual domains where visual data may be unavailable. The authors also open-source a dataset of 2.5 million unique class descriptions generated with the framework.

The paper additionally discusses related work on zero-shot classification and prompt engineering, and highlights MPVR's advantages in automation, generalization, and performance.
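To make the pipeline concrete, below is a minimal sketch of an MPVR-style flow under stated assumptions; it is not the authors' released code. The `query_llm` function, the example task description, and the aircraft class names are hypothetical placeholders, and the VLM side assumes the `open_clip` package with an OpenAI CLIP checkpoint.

```python
# Minimal sketch of an MPVR-style pipeline (assumptions, not the authors' code).
import torch
import open_clip


def query_llm(meta_prompt: str) -> list[str]:
    """Hypothetical stand-in for the LLM call. A real run would send
    `meta_prompt` to GPT or Mixtral; fixed templates are returned here
    so the sketch executes end-to-end."""
    return [
        "A photo of a {}.",
        "A close-up photograph of a {}, a type of aircraft.",
        "An image of a {} on an airport tarmac.",
    ]


def build_zero_shot_classifier(task_description, class_names, model, tokenizer):
    # Step 1: meta-prompt the LLM with a short task description so that it
    # emits task-specific query templates containing a '{}' placeholder.
    meta_prompt = (
        f"The task is: {task_description}. "
        "Write diverse prompt templates for describing images from this "
        "domain, using '{}' as a placeholder for the category name."
    )
    templates = query_llm(meta_prompt)

    # Step 2: populate every template with every class name, embed the
    # resulting category-specific prompts, and ensemble them per class by
    # averaging the L2-normalized text embeddings.
    class_weights = []
    with torch.no_grad():
        for name in class_names:
            prompts = [t.format(name) for t in templates]
            text_emb = model.encode_text(tokenizer(prompts))
            text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
            class_weights.append(text_emb.mean(dim=0))
    # One column per class; re-normalize the ensembled embeddings.
    W = torch.stack(class_weights, dim=1)
    return W / W.norm(dim=0, keepdim=True)


# Step 3: zero-shot classification = cosine similarity of the image
# embedding against the ensembled per-class text embeddings.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="openai")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
W = build_zero_shot_classifier(
    "fine-grained classification of aircraft models",  # hypothetical task
    ["Boeing 747", "Airbus A380"], model, tokenizer)
# image = preprocess(Image.open("plane.jpg")).unsqueeze(0)
# img_emb = model.encode_image(image)
# img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
# prediction = (img_emb @ W).argmax(dim=-1)
```

Averaging L2-normalized text embeddings per class is the standard CLIP prompt-ensembling recipe; MPVR's contribution is that the templates being ensembled are generated by the LLM for the specific task rather than hand-written.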