The paper "Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering" addresses the challenge of identifying relevant clusterings from a set of generated clusterings, which is particularly difficult when users have specific interests or keywords. The authors propose a novel method called Multi-MaP, which leverages multi-modal proxy learning to align user interests with visual data. Multi-MaP uses CLIP encoders to extract text and image embeddings, and GPT-4 to integrate user interests into the clustering process. The method incorporates reference word constraints and concept-level constraints to learn optimal text proxies according to the user's interests. Extensive experiments on various benchmark datasets show that Multi-MaP outperforms state-of-the-art methods in multiple clustering tasks, demonstrating its effectiveness in capturing user interests and generating personalized clusterings. The code for Multi-MaP is available at <https://github.com/Alexander-Yao/Multi-MaP>.The paper "Multi-Modal Proxy Learning Towards Personalized Visual Multiple Clustering" addresses the challenge of identifying relevant clusterings from a set of generated clusterings, which is particularly difficult when users have specific interests or keywords. The authors propose a novel method called Multi-MaP, which leverages multi-modal proxy learning to align user interests with visual data. Multi-MaP uses CLIP encoders to extract text and image embeddings, and GPT-4 to integrate user interests into the clustering process. The method incorporates reference word constraints and concept-level constraints to learn optimal text proxies according to the user's interests. Extensive experiments on various benchmark datasets show that Multi-MaP outperforms state-of-the-art methods in multiple clustering tasks, demonstrating its effectiveness in capturing user interests and generating personalized clusterings. The code for Multi-MaP is available at <https://github.com/Alexander-Yao/Multi-MaP>.