EmoVIT is a novel approach that leverages visual instruction tuning to enhance emotion recognition in images. To address the scarcity of annotated data in this domain, the method introduces a GPT-assisted pipeline that generates emotion-specific instruction data. Building on InstructBLIP, EmoVIT incorporates this emotion-centric instruction data and uses large language models (LLMs) to improve performance. The model shows proficiency in emotion classification, affective reasoning, and humor comprehension. In extensive experiments, EmoVIT outperforms existing methods on emotion recognition tasks and proves robust and adaptable across different datasets. The framework is also efficient: it requires less training data than traditional methods while achieving better results.
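The GPT-assisted data generation step can be pictured with a minimal Python sketch. Everything in it is an assumption for illustration: the `ImageAnnotation` record, the prompt wording, the eight-label set, and the instruction/response record format are hypothetical and are not EmoVIT's actual templates. The call to the GPT model is omitted, and its output is simply passed in as a string.

```python
import json
from dataclasses import dataclass

# Illustrative label set (assumption): Mikels' eight emotion categories.
EMOTIONS = ["amusement", "awe", "contentment", "excitement",
            "anger", "disgust", "fear", "sadness"]


@dataclass
class ImageAnnotation:
    # Hypothetical annotation record; field names are illustrative only.
    image_path: str
    caption: str
    emotion: str


def build_generation_prompt(ann: ImageAnnotation) -> str:
    """Text prompt asking a GPT-style model to explain the annotated emotion.

    The wording is a hypothetical example, not EmoVIT's prompt.
    """
    return (
        f"Image description: {ann.caption}\n"
        f"Annotated emotion: {ann.emotion}\n"
        "Explain which visual cues in the description could evoke this emotion."
    )


def build_classification_sample(ann: ImageAnnotation) -> dict:
    """Instruction/response pair for emotion classification (hypothetical format)."""
    if ann.emotion not in EMOTIONS:
        raise ValueError(f"unknown emotion label: {ann.emotion!r}")
    instruction = ("Which emotion does this image most likely evoke? "
                   f"Choose one of: {', '.join(EMOTIONS)}.")
    return {"image": ann.image_path, "instruction": instruction,
            "response": ann.emotion}


def build_reasoning_sample(ann: ImageAnnotation, llm_text: str) -> dict:
    """Pairs a reasoning instruction with the text an LLM returned for
    build_generation_prompt(ann); the LLM call itself is not shown."""
    return {"image": ann.image_path,
            "instruction": "Why might this image evoke the emotion it does?",
            "response": llm_text.strip()}


if __name__ == "__main__":
    ann = ImageAnnotation("img_001.jpg",
                          "A child laughing while chasing bubbles in a park.",
                          "amusement")
    print(build_generation_prompt(ann))
    # Stand-in for a real LLM response to the prompt above.
    fake_llm_text = "The child's open laughter and the playful bubbles suggest fun."
    samples = [build_classification_sample(ann),
               build_reasoning_sample(ann, fake_llm_text)]
    print(json.dumps(samples, indent=2))
```

Mixing several instruction types (here, a closed-set classification question and an open-ended reasoning question) for the same image is one simple way to obtain the kind of diverse instruction data the summary emphasizes.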
The study also shows that diverse instruction data improves model performance, and it offers insight into how effective visual instruction tuning is for emotion understanding. Overall, the results suggest that EmoVIT is a promising direction for future research in emotion recognition and visual instruction tuning.