LAION-400M: Open Dataset of CLIP-Filtered 400 Million Image-Text Pairs

3 Nov 2021 | Christoph Schuhmann, Richard Vencu, Romain Beaumont, Robert Kaczmarczyk, Clayton Mullis, Aarush Katta, Theo Coombes, Jenia Jitsev, Aran Komatsuzaki
The paper introduces LAION-400M, an open dataset of 400 million image-text pairs together with their CLIP embeddings and kNN indices. The dataset addresses the lack of publicly available large-scale datasets for training multi-modal language-vision models such as CLIP and DALL-E. It was created by distributed processing of the Common Crawl dataset, followed by post-processing to filter out unsuitable pairs. The release includes a web demo for image-text search and a library, img2dataset, for efficient data crawling and processing. The authors demonstrate successful training of a DALL-E model on a subset of the dataset, showcasing its potential for research and development in multi-modal language-vision models. The release of LAION-400M opens up opportunities for broader community participation in training and researching these models.
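The core filtering step can be sketched as follows: each candidate pair is kept only if the cosine similarity between its CLIP image embedding and CLIP text embedding exceeds a threshold (the paper uses CLIP-based filtering; the 0.3 threshold and the toy embeddings below are illustrative stand-ins, not the authors' exact pipeline).

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two batches of embeddings."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)

def filter_pairs(img_emb: np.ndarray, txt_emb: np.ndarray,
                 threshold: float = 0.3) -> np.ndarray:
    """Boolean mask: True for pairs whose image-text similarity
    clears the threshold (assumed value, for illustration)."""
    return cosine_similarity(img_emb, txt_emb) >= threshold

# Toy 512-dim vectors standing in for real CLIP embeddings.
rng = np.random.default_rng(0)
img = rng.normal(size=(4, 512))
txt = rng.normal(size=(4, 512))
txt[0] = img[0]  # make pair 0 a well-aligned image-text pair

keep = filter_pairs(img, txt)
print(keep.tolist())  # pair 0 is kept; unrelated random pairs are dropped
```

In the real pipeline the embeddings would come from a pretrained CLIP model rather than random vectors; the thresholding logic itself is this simple, which is what makes the filtering scalable across billions of Common Crawl candidates.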