The paper introduces *Emilia*, a large-scale, multilingual speech generation dataset derived from in-the-wild speech data, and *Emilia-Pipe*, an open-source preprocessing pipeline that transforms raw speech data into high-quality training data. Emilia comprises over 101k hours of speech in six languages (English, Chinese, German, French, Japanese, and Korean) and covers diverse speaking styles. Emilia-Pipe consists of six steps: standardization, source separation, speaker diarization, fine-grained segmentation, automatic speech recognition (ASR), and filtering. It processes 2.50 hours of raw speech data per minute, which makes it practical for scaling up data collection. Experimental results show that models trained on Emilia generate high-quality, spontaneous, and human-like speech, outperforming models trained on existing datasets. The paper also evaluates Emilia in text-to-speech (TTS) applications and demonstrates its potential for multilingual TTS.
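
To put the stated throughput in perspective, the following is a minimal back-of-the-envelope sketch in Python. It takes only the 2.50 hours-per-minute figure and the six step names from the summary above. The helper names (`real_time_factor`, `processing_days`), the `workers` parameter, and the assumption that throughput scales linearly across workers are illustrative and not taken from the paper.

```python
# Figures taken from the summary: six pipeline steps and a throughput of
# 2.50 hours of raw speech per minute of processing.
PIPELINE_STEPS = [
    "standardization",
    "source separation",
    "speaker diarization",
    "fine-grained segmentation",
    "automatic speech recognition",
    "filtering",
]
HOURS_PER_MINUTE = 2.50


def real_time_factor(hours_per_minute: float = HOURS_PER_MINUTE) -> float:
    """Hours of audio processed per hour of wall-clock time."""
    return hours_per_minute * 60.0


def processing_days(
    raw_hours: float,
    hours_per_minute: float = HOURS_PER_MINUTE,
    workers: int = 1,
) -> float:
    """Estimate wall-clock days needed to process `raw_hours` of audio.

    Assumption (not from the paper): throughput scales linearly with the
    number of identical workers.
    """
    if raw_hours < 0 or hours_per_minute <= 0 or workers < 1:
        raise ValueError("raw_hours must be >= 0, throughput > 0, workers >= 1")
    minutes = raw_hours / (hours_per_minute * workers)
    return minutes / (60 * 24)


if __name__ == "__main__":
    print(f"{len(PIPELINE_STEPS)} steps, {real_time_factor():.0f}x real time")
    print(f"101k hours on one worker: {processing_days(101_000):.1f} days")
```

Note that 101k hours is the size of the final dataset; the raw input is larger because the filtering step discards part of it, so the single-worker estimate should be read as a lower bound.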