The paper introduces TRUMANS, a large-scale motion-captured dataset for Human-Scene Interaction (HSI) modeling, together with a method for generating HSI sequences of arbitrary length. TRUMANS contains over 15 hours of human interactions across 100 indoor scenes, capturing whole-body human motions and part-level object dynamics with high fidelity. The dataset is built by digitally replicating physical environments as accurate virtual models and applying extensive augmentations to both humans and objects while preserving interaction fidelity.

Building on TRUMANS, the authors develop a diffusion-based autoregressive model that generates HSI sequences of any length, conditioned on both scene context and intended actions. Evaluated on various 3D scene datasets, the model shows strong zero-shot generalizability and produces motions that closely resemble the original motion-captured sequences; the dataset is also benchmarked on human pose and contact estimation tasks, demonstrating its versatility and establishing it as a valuable asset for future research. The paper presents a detailed analysis of the dataset and method, including experimental results and comparisons with existing approaches, showing that the method generates arbitrary-length HSI sequences in real time and outperforms existing baselines in quality and zero-shot generalizability. The paper concludes that TRUMANS is a high-quality resource for HSI research, addressing scalability, data quality, and advanced motion-synthesis challenges in HSI modeling.
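To make the autoregressive, arbitrary-length generation idea concrete, below is a minimal sketch in PyTorch: a toy denoiser is applied segment by segment, with each segment conditioned on a scene feature and an action embedding and blended with the previous segment for continuity. All names (`DenoiserStub`, `sample_segment`, `generate_long_sequence`), dimensions, and the simplified noise schedule are illustrative assumptions; this is not the TRUMANS architecture or training procedure.

```python
# Conceptual sketch only: autoregressive, arbitrary-length motion generation
# with a diffusion-style denoiser conditioned on scene context and action.
# Module names, feature sizes, and the noise schedule are placeholders,
# NOT the method described in the paper.
import torch
import torch.nn as nn

class DenoiserStub(nn.Module):
    """Predicts a clean motion segment from a noised one plus conditions."""
    def __init__(self, pose_dim=66, scene_dim=128, action_dim=16, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(pose_dim + scene_dim + action_dim + 1, hidden),
            nn.ReLU(),
            nn.Linear(hidden, pose_dim),
        )

    def forward(self, x_t, t, scene_feat, action_feat):
        # x_t: (T, pose_dim) noised segment; t: scalar noise level in (0, 1]
        T = x_t.shape[0]
        cond = torch.cat([scene_feat, action_feat], dim=-1).expand(T, -1)
        t_embed = torch.full((T, 1), float(t))
        return self.net(torch.cat([x_t, cond, t_embed], dim=-1))

@torch.no_grad()
def sample_segment(denoiser, scene_feat, action_feat, seg_len=30, steps=50, pose_dim=66):
    """Very simplified denoising loop: iteratively refine noise into a motion segment."""
    x = torch.randn(seg_len, pose_dim)
    for i in reversed(range(steps)):
        t = (i + 1) / steps
        x0_pred = denoiser(x, t, scene_feat, action_feat)
        # Re-noise toward the next (lower) noise level; a proper variance schedule is omitted.
        x = x0_pred + (i / steps) * torch.randn_like(x0_pred)
    return x

@torch.no_grad()
def generate_long_sequence(denoiser, scene_feat, action_feats, seg_len=30, overlap=5):
    """Autoregressive rollout: generate one segment per action label and
    cross-fade each new segment with the previous one so the sequence stays continuous."""
    segments = []
    for action_feat in action_feats:
        seg = sample_segment(denoiser, scene_feat, action_feat, seg_len)
        if segments:
            w = torch.linspace(0, 1, overlap).unsqueeze(-1)
            seg[:overlap] = (1 - w) * segments[-1][-overlap:] + w * seg[:overlap]
        segments.append(seg)
    return torch.cat(segments, dim=0)

if __name__ == "__main__":
    denoiser = DenoiserStub()
    scene = torch.randn(128)                        # stand-in for an encoded local scene
    actions = [torch.randn(16) for _ in range(4)]   # stand-in action embeddings, one per segment
    motion = generate_long_sequence(denoiser, scene, actions)
    print(motion.shape)                             # (4 * 30, 66): arbitrary length via more segments
```

The design point the sketch illustrates is that sequence length is unbounded: total duration grows simply by rolling out more conditioned segments, with only local blending needed to keep transitions smooth.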