BLINK is a new benchmark designed to evaluate the visual perception capabilities of multimodal large language models (LLMs). The benchmark consists of 14 classic computer vision tasks that are challenging for current multimodal LLMs but that humans can solve with ease. These tasks range from low-level pattern matching to mid-level reasoning and high-level visual understanding. BLINK reformats each task into multiple-choice questions, paired with single or multiple images and visual prompts. Despite being easy for humans, the benchmark poses significant challenges for existing multimodal LLMs: even the best-performing models achieve only around 50% accuracy. The study also shows that specialist computer vision models perform much better on these tasks, suggesting potential pathways for future improvement. BLINK aims to bridge the gap between traditional notions of perception and the modern capabilities of multimodal LLMs, providing a testbed for researchers to explore and enhance visual perception in these models.
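To make the multiple-choice evaluation format concrete, here is a minimal sketch of an accuracy loop in the BLINK style. The HuggingFace dataset identifier, the task config name, the field names (`question`, `choices`, `answer`, `image_1`, `image_2`), the "(A)"-style answer labels, and the `model_predict` stub are all assumptions for illustration; consult the official release for the exact schema.

```python
# Minimal sketch of a multiple-choice accuracy loop in the BLINK style.
# The dataset id, config name, and field names below are assumptions for
# illustration; check the official release for the exact schema.
from datasets import load_dataset

def model_predict(images, question, choices):
    """Hypothetical stand-in for a multimodal LLM call; returns an option index."""
    return 0  # trivially picks the first choice (a chance-level baseline)

def evaluate(task="Relative_Depth", split="val"):
    ds = load_dataset("BLINK-Benchmark/BLINK", task, split=split)  # assumed id
    correct = 0
    for ex in ds:
        # BLINK items may pair one or more images with each question.
        images = [ex[k] for k in ("image_1", "image_2") if ex.get(k) is not None]
        pred = model_predict(images, ex["question"], ex["choices"])
        # Assumed answer format: a letter label such as "(A)".
        if f"({chr(ord('A') + pred)})" == ex["answer"]:
            correct += 1
    return correct / len(ds)

if __name__ == "__main__":
    print(f"Relative_Depth accuracy: {evaluate():.1%}")
```

A trivial always-pick-one baseline like `model_predict` above approximates random-guess accuracy, which is the natural reference point for the roughly 50% scores the best multimodal LLMs reach on BLINK.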