MuDI is a framework for multi-subject personalization of text-to-image models that effectively decouples the identities of multiple subjects. Its key idea is to leverage subjects segmented by a foundation segmentation model (the Segment Anything Model, SAM) at both training and inference time: the segmented subjects serve as data augmentation during training, teaching the model to distinguish between the different identities, and as an initialization for the generation process at inference. The paper also introduces a new metric for evaluating multi-subject personalization.

MuDI is evaluated on a new dataset of subjects prone to identity mixing, spanning diverse categories from animals to objects and scenes, and significantly outperforms existing methods in both qualitative and quantitative comparisons. It produces high-quality personalized images without identity mixing even for highly similar subjects, avoiding the mixing and artifacts observed in prior methods. In human evaluations, MuDI achieves twice the success rate of existing baselines at personalizing multiple subjects without identity mixing and is preferred by 70% of raters over the strongest baseline.

Because it relies only on data augmentation during training, the framework is model-agnostic and requires no model-specific techniques. Beyond this, MuDI can control the relative size between personalized subjects and supports modular customization, in which subjects are learned independently and then combined to generate multi-subject images. Iterative training further improves the quality of the generated images. Overall, MuDI offers a novel approach to multi-subject personalization that effectively decouples identities among multiple subjects; sketches of the segmentation-based augmentation and the inference-time initialization follow below.
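To make the training-time augmentation concrete, here is a minimal sketch of compositing segmented subjects into a single training image. It assumes each subject has already been cut out with SAM into an RGBA crop; the canvas size, scale range, white background, and placement strategy are illustrative assumptions, not details taken from the paper.

```python
import random
from PIL import Image

def compose_segmented_subjects(subject_crops, canvas_size=(1024, 1024)):
    """Paste randomly scaled and positioned segmented subjects onto one canvas.

    subject_crops: list of RGBA PIL Images, one per personalized subject,
    produced by masking the original training photos with SAM masks.
    Returns an RGB training image in which every subject appears separately,
    so the model sees the identities side by side rather than mixed.
    """
    canvas = Image.new("RGB", canvas_size, (255, 255, 255))  # plain background (assumed)
    for crop in subject_crops:
        # Random scale varies the subjects' relative sizes during training.
        scale = random.uniform(0.4, 0.8)
        w, h = crop.size
        resized = crop.resize((max(int(w * scale), 1), max(int(h * scale), 1)))
        # Random position; a real implementation might also limit overlap.
        max_x = max(canvas_size[0] - resized.width, 1)
        max_y = max(canvas_size[1] - resized.height, 1)
        pos = (random.randrange(max_x), random.randrange(max_y))
        canvas.paste(resized, pos, mask=resized)  # alpha channel acts as the mask
    return canvas
```

Training on such composites, paired with prompts that name each subject, is what lets the model keep the identities apart.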
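The same segmented-subject composite can also seed the generation process at inference. The blending rule below (Gaussian noise plus a scaled latent of the composite) is an assumption for illustration, not the paper's exact formula; `vae` stands for a Stable Diffusion-style autoencoder such as the `AutoencoderKL` in the diffusers library, and `strength` is a hypothetical parameter.

```python
import torch

@torch.no_grad()
def init_latents_from_composite(vae, composite_rgb, strength=0.1):
    """Return starting latents biased toward the segmented-subject layout.

    composite_rgb: float tensor of shape (1, 3, H, W) in [-1, 1], holding the
    SAM composite of the personalized subjects in their intended positions.
    strength: how strongly the layout guides the initial noise (assumed).
    """
    latent = vae.encode(composite_rgb).latent_dist.sample()
    latent = latent * vae.config.scaling_factor  # standard SD latent scaling
    noise = torch.randn_like(latent)
    # Mostly noise, slightly pulled toward the composite's layout, so the
    # sampler starts from a state in which the subjects are already separated.
    return noise + strength * latent
```

Seeding generation this way places each subject in its own region from the first denoising step, which is also what enables the relative-size control described above.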