This paper addresses the challenge of open-ended visual quality comparison, aiming to develop a model that can respond to open-range questions and provide detailed reasoning about the quality of multiple images. The authors propose Co-Instruct, a novel large multi-modality model (LMM) designed for this purpose. To train Co-Instruct, they collect the Co-Instruct-562K dataset, which combines two sources: (1) single-image quality descriptions merged by an LLM into comparative text, and (2) GPT-4V "teacher" responses on unlabeled data; it is the first dataset of its kind for open-ended visual quality comparison. The authors also introduce MICBench, a benchmark specifically designed for evaluating LMMs on multi-image quality comparison. Co-Instruct outperforms existing LMMs on both the proposed MICBench and existing quality evaluation benchmarks, achieving up to 30% higher accuracy than state-of-the-art models and surpassing GPT-4V on various multiple-choice question (MCQ) benchmarks. The paper contributes to the field by advancing the capabilities of LMMs in open-ended visual quality comparison and providing a comprehensive benchmark for future research.
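
To make the data-construction step more concrete, below is a minimal sketch of how single-image quality descriptions might be merged into a comparative training sample via an LLM prompt. The function names (`build_merge_prompt`, `call_llm`), the prompt wording, and the output format are illustrative assumptions, not the paper's actual pipeline.

```python
# Minimal sketch (assumptions, not the paper's pipeline): merge per-image
# quality descriptions into a single comparative instruction-response pair.
from typing import Callable, Dict, List


def build_merge_prompt(descriptions: List[str]) -> str:
    """Assemble a prompt asking an LLM to compare images from their descriptions."""
    numbered = "\n".join(
        f"Image {i + 1}: {desc}" for i, desc in enumerate(descriptions)
    )
    return (
        "Below are quality descriptions of several images.\n"
        f"{numbered}\n"
        "Write a comparative answer discussing which image has the best quality "
        "and why, reasoning over sharpness, noise, color, and composition."
    )


def merge_descriptions(
    descriptions: List[str], call_llm: Callable[[str], str]
) -> Dict[str, str]:
    """Produce one (instruction, response) pair; `call_llm` is any text-in/text-out LLM wrapper."""
    prompt = build_merge_prompt(descriptions)
    return {
        "instruction": "Compare the quality of the given images in detail.",
        "response": call_llm(prompt),
    }


if __name__ == "__main__":
    # Toy stand-in for a real LLM call, just to show the data flow.
    fake_llm = lambda p: "Image 1 is sharper and less noisy, so its quality is higher."
    sample = merge_descriptions(
        ["Sharp, well-exposed photo with mild noise.",
         "Blurry photo with strong compression artifacts."],
        fake_llm,
    )
    print(sample["response"])
```

In this reading, the LLM only sees textual descriptions, so the merging step requires no image inputs; the GPT-4V "teacher" responses would supply the genuinely multi-image supervision.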