INSTRUCTSCENE is a generative framework for 3D indoor scene synthesis driven by natural language instructions. It integrates a semantic graph prior with a layout decoder to improve the controllability and fidelity of generated scenes. The semantic graph prior jointly learns scene appearance and layout distributions, enabling a range of downstream tasks in a zero-shot manner. To facilitate benchmarking, a high-quality dataset of scene-instruction pairs is curated with the help of large language and multimodal models.

Generation proceeds in two stages: the semantic graph prior, learned through feature quantization and discrete diffusion, samples a semantic graph aligned with the instruction, and the layout decoder then produces precise object layouts conditioned on that graph. Extensive experiments on three room types show that the method outperforms existing approaches in generation controllability and fidelity, and ablation studies confirm the effectiveness of its key design components. Beyond instruction-conditioned synthesis, INSTRUCTSCENE supports zero-shot applications including stylization, re-arrangement, completion, and unconditional generation, providing a user-friendly interface for practical settings such as interior design and immersive metaverse experiences. The work also discusses limitations and directions for future research.
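To make the two-stage pipeline concrete, below is a minimal, illustrative sketch, not the authors' implementation. It reduces the semantic graph to per-object category tokens (graph edges and appearance features are omitted), replaces the full discrete diffusion reverse process with a simplified mask-and-unmask sampler, and uses a random vector as a stand-in for the instruction text embedding. All module and parameter names (`GraphDenoiser`, `LayoutDecoder`, `generate_scene`, `NUM_CATEGORIES`, etc.) are hypothetical.

```python
# Illustrative two-stage sketch in the spirit of INSTRUCTSCENE:
# Stage 1 samples discrete object categories with a toy absorbing-state
# (mask-and-replace) discrete diffusion sampler; Stage 2 decodes continuous
# layout attributes (e.g. position, size, rotation) from the sampled graph.
import torch
import torch.nn as nn

NUM_CATEGORIES = 16   # assumed object-category vocabulary size
MAX_OBJECTS = 8       # assumed maximum number of objects per scene
LAYOUT_DIM = 8        # e.g. 3 (position) + 3 (size) + 2 (sin/cos rotation)


class GraphDenoiser(nn.Module):
    """Predicts clean category logits from noisy categories + instruction."""
    def __init__(self, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(NUM_CATEGORIES + 1, hidden)  # +1 for [MASK]
        self.cond = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, NUM_CATEGORIES)

    def forward(self, noisy_cats, instruction_emb):
        h = self.embed(noisy_cats) + self.cond(instruction_emb).unsqueeze(1)
        return self.out(h)  # (batch, MAX_OBJECTS, NUM_CATEGORIES)


class LayoutDecoder(nn.Module):
    """Maps sampled object categories to per-object layout attributes."""
    def __init__(self, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(NUM_CATEGORIES, hidden)
        self.head = nn.Linear(hidden, LAYOUT_DIM)

    def forward(self, cats):
        return self.head(self.embed(cats))  # (batch, MAX_OBJECTS, LAYOUT_DIM)


@torch.no_grad()
def generate_scene(denoiser, decoder, instruction_emb, steps=4):
    """Simplified discrete-diffusion sampling followed by layout decoding."""
    mask_id = NUM_CATEGORIES
    cats = torch.full((1, MAX_OBJECTS), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = denoiser(cats, instruction_emb)
        sampled = torch.distributions.Categorical(logits=logits).sample()
        # Unmask a growing fraction of positions as denoising proceeds.
        unmask = torch.rand(1, MAX_OBJECTS) < (step + 1) / steps
        cats = torch.where((cats == mask_id) & unmask, sampled, cats)
    # Any position still masked takes the final prediction.
    cats = torch.where(cats == mask_id, sampled, cats)
    layout = decoder(cats)
    return cats, layout


if __name__ == "__main__":
    torch.manual_seed(0)
    denoiser, decoder = GraphDenoiser(), LayoutDecoder()
    instruction_emb = torch.randn(1, 64)  # stand-in for a text-encoder output
    cats, layout = generate_scene(denoiser, decoder, instruction_emb)
    print("categories:", cats.tolist())
    print("layout shape:", tuple(layout.shape))
```

Keeping the graph variables discrete is what allows the same prior to be reused for stylization, re-arrangement, and completion: partially known scenes can be encoded as fixed tokens while the sampler fills in the rest, without any task-specific retraining.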