This paper introduces the Large Motion Model (LMM), a unified multi-modal motion generation model that handles multiple motion generation tasks within a single model and achieves competitive performance across nine widely used benchmarks. LMM is built on a transformer-based diffusion backbone and incorporates ArtAttention, an attention mechanism that enables precise and robust control over individual body parts. The model is trained on MotionVerse, a comprehensive motion generation dataset spanning 10 tasks and 16 datasets with a total of 320k sequences and 100 million frames, all unified into a common motion representation so that the model can learn from diverse data sources and generalize across motion generation tasks. LMM further employs a pre-training strategy based on variable frame rates and varied masking schemes to better exploit knowledge from this heterogeneous training data. Experiments show that LMM attains state-of-the-art results across a range of tasks, demonstrating strong generalization, and because it can process multiple input modalities simultaneously, it can also accomplish tasks unseen during training.
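To make the body-part control behind ArtAttention concrete, the following PyTorch sketch shows one way attention can be restricted so that each body part's motion tokens attend only to tokens of the same part and to the conditioning signal. The part list, tensor layout, and masking scheme are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

# Assumed body-part split; the paper's actual partitioning may differ.
BODY_PARTS = ["head", "torso", "left_arm", "right_arm", "left_leg", "right_leg"]

class PartAwareAttention(nn.Module):
    """Attention block in which each motion token attends only to tokens of the
    same body part plus the condition tokens (e.g. text or music features)."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor, part_ids: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x:        (B, N, D) motion tokens, one per frame and body part
        # part_ids: (N,)      body-part index of each motion token
        # cond:     (B, C, D) condition tokens
        B, N, D = x.shape
        C = cond.shape[1]
        tokens = torch.cat([x, cond], dim=1)                # (B, N + C, D)

        # Motion tokens may see same-part tokens and all condition tokens;
        # condition tokens may see everything. In attn_mask, True = blocked.
        allowed = torch.ones(N + C, N + C, dtype=torch.bool)
        allowed[:N, :N] = part_ids[:, None] == part_ids[None, :]
        out, _ = self.attn(tokens, tokens, tokens, attn_mask=~allowed)
        return out[:, :N]                                   # keep motion tokens only

# Toy usage: 8 frames x 6 parts, 64-dim tokens, 4 condition tokens.
T, P, D = 8, len(BODY_PARTS), 64
part_ids = torch.arange(P).repeat(T)
y = PartAwareAttention(dim=D)(torch.randn(2, T * P, D), part_ids, torch.randn(2, 4, D))
```

Restricting the attention pattern this way is what allows a conditioning signal to steer one body part without perturbing the others.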
The paper also discusses the core challenges of multi-modal, multi-task motion generation: motion data come in non-uniform formats, tasks with different objectives are evaluated with different metrics, and action knowledge is difficult to transfer across tasks. LMM addresses these challenges through its unified motion representation, the ArtAttention mechanism, and the pre-training strategy over extensive motion datasets. The model is evaluated on tasks including text-to-motion, music-to-dance, and motion prediction, showing strong performance throughout, and ablation studies provide insights into training and scaling up large motion models for future research.
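The pre-training strategy with variable frame rates and masking described earlier can be pictured as an augmentation step applied to each training clip. The sketch below is a minimal, hypothetical version of such a step; the subsampling factors, mask ratios, and tensor layout are assumptions rather than the paper's actual settings.

```python
import torch

def augment_clip(motion: torch.Tensor,
                 stride_choices=(1, 2, 4),     # assumed frame-rate subsampling factors
                 frame_mask_ratio: float = 0.3,
                 part_mask_prob: float = 0.2):
    # motion: (T, P, D) clip with T frames, P body parts, D features per part.
    # Returns the resampled, partially masked clip and a boolean mask marking
    # the positions the model is asked to reconstruct (True = masked).

    # 1) Variable frame rate: keep every k-th frame for a randomly chosen k.
    k = stride_choices[torch.randint(len(stride_choices), (1,)).item()]
    clip = motion[::k]                                  # (T', P, D)
    num_frames, num_parts, _ = clip.shape

    # 2) Random temporal masking (frame in-filling objective).
    frame_mask = torch.rand(num_frames) < frame_mask_ratio

    # 3) Random spatial masking (body-part in-filling objective).
    part_mask = torch.rand(num_parts) < part_mask_prob

    mask = frame_mask[:, None] | part_mask[None, :]     # (T', P)
    masked_clip = clip.clone()
    masked_clip[mask] = 0.0                             # zero out masked positions
    return masked_clip, mask

# Toy usage: a 120-frame clip with 6 body parts and 64 features per part.
masked_clip, mask = augment_clip(torch.randn(120, 6, 64))
```

Training the diffusion model to recover the masked frames and parts under varying effective frame rates is one plausible way such a strategy lets a single model absorb datasets captured at different rates and with different body coverage.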