INDICGENBENCH is a multilingual benchmark for evaluating the generation capabilities of large language models (LLMs) on Indic languages, covering 29 languages across 13 scripts and 4 language families. It consists of human-curated data and comprises five tasks: cross-lingual summarization (CROSSSUM-IN), machine translation (FLORES-IN), multilingual question answering (XQUAD-IN), and cross-lingual question answering (XORQA-IN-XX and XORQA-IN-EN). The benchmark provides evaluation sets for many under-represented Indic languages spanning varying levels of resourcefulness, along with training data for efficient adaptation of LLMs. Evaluations of a range of models, including GPT-3.5, GPT-4, PaLM-2, mT5, Gemma, BLOOM, and LLaMA, show that the largest PaLM-2 models perform best on most tasks, that a significant performance gap remains between Indic languages and English, and that performance varies substantially with a language's resourcefulness, with higher-resource languages scoring better. These results highlight the need for further research to improve multilingual language models. INDICGENBENCH is available at www.github.com/google-research-datasets/indic-gen-bench.
The benchmark is available for use and provides a valuable resource for researchers in the field of natural language processing.
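As a minimal sketch of how one might consume such a benchmark, the snippet below reads a task's evaluation split as JSON Lines. The field names ("question", "answer", "lang") and the JSONL layout are assumptions for illustration, not the repository's documented schema; a tiny sample file is written first so the example is self-contained.

```python
import json
from pathlib import Path

def load_examples(path):
    """Read one JSON example per line from a JSONL evaluation file."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

# Build a tiny sample file (hypothetical schema) so the sketch runs as-is.
sample = [
    {"question": "sample question 1", "answer": "sample answer 1", "lang": "hi"},
    {"question": "sample question 2", "answer": "sample answer 2", "lang": "bn"},
]
path = Path("xquad_in_sample.jsonl")
# ensure_ascii=False preserves Indic-script text verbatim in the file.
path.write_text(
    "\n".join(json.dumps(ex, ensure_ascii=False) for ex in sample),
    encoding="utf-8",
)

examples = load_examples(path)
print(len(examples), examples[0]["lang"])  # → 2 hi
```

In practice the per-language splits would be grouped by the "lang" field, so that metrics can be reported separately for higher- and lower-resource languages.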