The paper introduces a novel framework called Subpopulation Structure Discovery with Large Language Models (SSD-LLM) to automatically uncover and analyze subpopulation structures within datasets. Subpopulation structures are hierarchical relations among subpopulations determined by specific criteria, which are crucial for understanding and addressing various downstream tasks such as dataset subpopulation organization, subpopulation shift, and slice discovery. SSD-LLM leverages the capabilities of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) to generate informative image captions and summarize subpopulation structures. The framework includes two key components: Criteria Initialization and Criteria Refinement, which use a sample-based approach to identify dimensions and attributes. The method is evaluated on multiple datasets, demonstrating significant improvements in accuracy and efficiency compared to existing methods. The paper also discusses the limitations and future directions, highlighting the potential of SSD-LLM in various computer vision and multimodal tasks.The paper introduces a novel framework called Subpopulation Structure Discovery with Large Language Models (SSD-LLM) to automatically uncover and analyze subpopulation structures within datasets. Subpopulation structures are hierarchical relations among subpopulations determined by specific criteria, which are crucial for understanding and addressing various downstream tasks such as dataset subpopulation organization, subpopulation shift, and slice discovery. SSD-LLM leverages the capabilities of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) to generate informative image captions and summarize subpopulation structures. The framework includes two key components: Criteria Initialization and Criteria Refinement, which use a sample-based approach to identify dimensions and attributes. The method is evaluated on multiple datasets, demonstrating significant improvements in accuracy and efficiency compared to existing methods. The paper also discusses the limitations and future directions, highlighting the potential of SSD-LLM in various computer vision and multimodal tasks.