21 Mar 2024 | Yuval Alaluf, Elad Richardson, Sergey Tulyakov, Kfir Aberman, Daniel Cohen-Or
MyVLM: Personalizing VLMs for User-Specific Queries
This paper introduces MyVLM, a method for personalizing vision-language models (VLMs) so they can understand and reason over user-specific concepts. The goal is to enable a VLM to generate personalized captions and answer questions about a specific object or individual given only a few images of that concept. The approach augments the VLM with external concept heads trained to recognize the user-specific concept; when a head detects the concept in an image, a learned concept embedding is injected into the language model's input, guiding it to incorporate the concept into its responses.
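To make the mechanism concrete, below is a minimal, illustrative PyTorch sketch of the two pieces described above: a lightweight concept head that classifies whether the concept appears in the frozen vision encoder's image features, and a learned concept embedding appended to the language model's input sequence when the head fires. The names (ConceptHead, personalize_inputs), dimensions, and gating rule are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch of the MyVLM idea (not the authors' code).
# Assumes a frozen VLM whose vision encoder yields pooled image features of
# size feat_dim and whose language model consumes token embeddings of size
# embed_dim. Only the concept head and the concept embedding would be trained.
import torch
import torch.nn as nn


class ConceptHead(nn.Module):
    """Lightweight binary classifier that detects one user-specific concept
    from the frozen vision encoder's pooled image features."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        # image_features: (batch, feat_dim) -> (batch, 1) concept logit
        return self.classifier(image_features)


def personalize_inputs(token_embeds: torch.Tensor,
                       concept_embed: torch.Tensor,
                       concept_logits: torch.Tensor) -> torch.Tensor:
    """Append the learned concept embedding to the language model's input
    sequence for samples where the concept head detected the concept."""
    batch, _, embed_dim = token_embeds.shape
    expanded = concept_embed.expand(batch, 1, embed_dim)
    # Zero out the concept token for images where the concept was not detected.
    gate = (torch.sigmoid(concept_logits) > 0.5).float().view(batch, 1, 1)
    return torch.cat([token_embeds, expanded * gate], dim=1)


if __name__ == "__main__":
    feat_dim, embed_dim, batch = 768, 4096, 2
    head = ConceptHead(feat_dim)
    concept_embed = nn.Parameter(torch.randn(1, 1, embed_dim) * 0.02)

    image_features = torch.randn(batch, feat_dim)     # from the frozen vision encoder
    token_embeds = torch.randn(batch, 16, embed_dim)  # prompt / visual token embeddings

    logits = head(image_features)                     # (batch, 1) detection logits
    lm_inputs = personalize_inputs(token_embeds, concept_embed, logits)
    print(lm_inputs.shape)  # torch.Size([2, 17, 4096])
```

In this sketch the VLM itself stays frozen; only the small concept head and the concept embedding would be optimized from the handful of concept images, which is what keeps the personalization lightweight.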
The method is applied to two popular VLMs, BLIP-2 and LLaVA, and shown to be effective for personalized image captioning and visual question answering. Once personalized, the VLM can recognize and contextualize a user-specific concept even when it appears in new images. The approach is evaluated on a new dataset of various objects and individuals, and the results show that MyVLM generalizes to new instances of previously learned concepts while preserving the model's behavior on unrelated inputs.
The paper also discusses the limitations of MyVLM, including its reliance on the VLM's inherent biases and the quality of the concept heads. It suggests that further research is needed to improve the robustness of the method, particularly in handling images with many individuals and avoiding context leakage. Overall, MyVLM offers a promising approach to personalizing VLMs, enabling more meaningful human-computer interactions by allowing models to understand and reason over user-specific concepts.