This paper introduces the conditional version of Generative Adversarial Nets (GANs), which can generate data conditioned on additional information such as class labels or data from other modalities. The authors demonstrate that the model can generate MNIST digits conditioned on class labels and illustrate its potential for multimodal learning, particularly automated image tagging. They show that the model can generate descriptive tags that are not part of the training labels. Experimental results on both unimodal and multimodal datasets highlight the model's ability to produce realistic samples and meaningful tags. The authors also discuss future directions, including more sophisticated models and joint training schemes for language models.
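The core conditioning mechanism the summary describes can be sketched as follows. This is a minimal, hypothetical NumPy illustration (not the paper's exact architecture, and with random, untrained weights): the class label is one-hot encoded and concatenated with the noise vector before being fed through the generator.

```python
import numpy as np

rng = np.random.default_rng(0)

NOISE_DIM, NUM_CLASSES, HIDDEN, IMG_DIM = 100, 10, 128, 28 * 28

# Hypothetical generator weights; a real conditional GAN would learn
# these adversarially against a discriminator that also sees the label.
W1 = rng.standard_normal((NOISE_DIM + NUM_CLASSES, HIDDEN)) * 0.02
W2 = rng.standard_normal((HIDDEN, IMG_DIM)) * 0.02

def generate(z: np.ndarray, label: int) -> np.ndarray:
    """Map (noise vector, class label) to a flattened 28x28 'image'."""
    y = np.zeros(NUM_CLASSES)
    y[label] = 1.0                                     # one-hot class condition
    h = np.maximum(np.concatenate([z, y]) @ W1, 0.0)   # ReLU hidden layer
    return np.tanh(h @ W2)                             # pixels in [-1, 1]

z = rng.standard_normal(NOISE_DIM)
sample = generate(z, label=7)   # sample conditioned on the class "7"
print(sample.shape)             # (784,)
```

The key design point is that conditioning requires no change to the adversarial objective: both generator and discriminator simply receive the label as an extra input, which is what lets the same framework extend to other modalities such as image features paired with tags.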