INDICVOICES is a large-scale multilingual speech dataset containing 7348 hours of audio from 16237 speakers across 145 Indian districts and 22 languages. It comprises read (9%), extempore (74%), and conversational (17%) speech, of which 1639 hours have already been transcribed. The project aims to build an inclusive and representative dataset that captures the linguistic, cultural, and demographic diversity of India: content spans read speech, voice commands, and extempore discussions; recordings are both wide- and narrow-band; and data recorded in noisy environments is included to reflect real-world usage.

The dataset is accompanied by an open-source blueprint for data collection, including standardized protocols, quality control mechanisms, transcription guidelines, and a repository of engaging questions and prompts: 2.5K questions, 46.6K prompts, and 1.1K to 4.1K role-play scenarios across 21 domains and 28 topics of interest.

Data collection relied on a countrywide network of agencies, local universities, NGOs, and social-sector professionals to ensure diverse representation across demographics, locations, and domains. The process ran in four stages: preparation, on-field data collection, quality control, and transcription. Quality control involved verifying metadata, ensuring diversity criteria were met, and checking the audio files for errors. Transcription involved audio segmentation, detailed transcription guidelines, and a maker-checker-superchecker workflow to ensure quality.

The dataset is used to build IndicASR, the first ASR model to support all 22 languages listed in the 8th Schedule of the Indian Constitution. The data, tools, guidelines, and models developed as part of this work will be made publicly available.
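The maker-checker-superchecker workflow can be pictured as a small state machine: a maker produces a transcript, a checker reviews it, and a superchecker gives final sign-off. The stage names follow the text above; everything else here (the function, the rejection handling) is an illustrative assumption, not the actual IndicVoices tooling.

```python
from enum import Enum, auto

class Stage(Enum):
    """Stages in a hypothetical maker-checker-superchecker pipeline."""
    MAKER = auto()         # transcriber produces a first-pass transcript
    CHECKER = auto()       # reviewer verifies the transcript
    SUPERCHECKER = auto()  # senior reviewer gives final sign-off
    ACCEPTED = auto()
    REJECTED = auto()      # sent back for rework (handling is assumed)

def advance(stage: Stage, approved: bool) -> Stage:
    """Move a transcript one step through the review chain."""
    if stage is Stage.MAKER:
        return Stage.CHECKER  # maker always submits for checking
    if stage is Stage.CHECKER:
        return Stage.SUPERCHECKER if approved else Stage.REJECTED
    if stage is Stage.SUPERCHECKER:
        return Stage.ACCEPTED if approved else Stage.REJECTED
    return stage  # terminal states do not move

# A transcript approved at both review stages ends up ACCEPTED.
s = Stage.MAKER
for ok in (True, True, True):
    s = advance(s, approved=ok)
```

The value of the two-tier review is that errors missed by the first reviewer are caught by a more experienced second one before a segment enters the released dataset.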
The dataset is designed to serve as a comprehensive starter kit for data collection efforts in other multilingual regions of the world.
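For a concrete sense of scale, the reported composition percentages translate into approximate hour counts as follows (the percentages are rounded in the source, so these are estimates, not official figures):

```python
TOTAL_HOURS = 7348  # total audio in INDICVOICES

# Speech-type shares as reported in the summary
shares = {"read": 0.09, "extempore": 0.74, "conversational": 0.17}

# Approximate hours per speech type, derived from the rounded shares
hours = {kind: round(TOTAL_HOURS * frac, 1) for kind, frac in shares.items()}

print(hours)  # → {'read': 661.3, 'extempore': 5437.5, 'conversational': 1249.2}
```

Extempore speech thus dominates the collection by a wide margin, with roughly 5.4K hours against about 1.2K hours of conversational and 0.7K hours of read speech.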