17 Jul 2024 | Lei Zhong, Yiming Xie, Varun Jampani, Deqing Sun, and Huaizu Jiang
**SMooDi: Stylized Motion Diffusion Model**
**Authors:** Lei Zhong, Yiming Xie, Varun Jampani, Deqing Sun, Huaizu Jiang
**Institution:** Northeastern University, Stability AI, Google Research
**Abstract:**
This paper introduces SMooDi, a novel Stylized Motion Diffusion model that generates stylized motion driven by content texts and style motion sequences. Unlike existing methods, which either generate motion with diverse content or transfer style between sequences, SMooDi can rapidly generate diverse styles across a broad range of content. The model tailors a pre-trained text-to-motion model for stylization, incorporating a style guidance module that ensures the generated motion closely matches the reference style, and a lightweight style adaptor that directs the motion toward the desired style while maintaining realism. Experiments on various applications demonstrate that SMooDi outperforms existing methods in stylized motion generation.
**Keywords:**
Motion synthesis, Diffusion model, Stylized motion
**Introduction:**
The paper addresses the problem of generating stylized motion from content texts and style motion sequences. Traditional methods are labor-intensive and time-consuming, while recent advances in human motion generation using diffusion models have shown impressive results. However, most efforts focus on content generation, and integrating style conditions remains under-explored. SMooDi combines these two lines of research, using a pre-trained motion latent diffusion model (MLD) to generate diverse content and a style modulation module to incorporate style conditions.
**Style Modulation Module:**
The main novelty of SMooDi is the style modulation module, which includes a style adaptor and a style guidance module. The style adaptor predicts residual features conditioned on the style reference motion sequence, ensuring realism while incorporating the style condition. The style guidance module combines classifier-free and classifier-based guidance to control the stylized motion generation, ensuring both content preservation and style reflection.
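The combination of the two guidance signals described above can be sketched in a few lines. This is a minimal, hypothetical illustration of mixing classifier-free guidance (contrasting conditional and unconditional noise predictions) with classifier-based guidance (a gradient from a style classifier), not SMooDi's exact formulation; the function name and the guidance weights `w_cfg` and `w_cls` are assumptions for illustration.

```python
import numpy as np

def combined_guidance(eps_cond, eps_uncond, style_grad, w_cfg=7.5, w_cls=1.0):
    """Mix classifier-free and classifier-based guidance for one denoising step.

    eps_cond / eps_uncond: noise predictions with and without conditioning.
    style_grad: gradient of a style classifier's log-probability w.r.t. the
    noisy latent, pointing toward latents the classifier judges more stylized.
    """
    # Classifier-free term: amplify the difference between the conditional
    # and unconditional predictions by the guidance weight w_cfg.
    eps = eps_uncond + w_cfg * (eps_cond - eps_uncond)
    # Classifier-based term: nudge the predicted noise against the style
    # gradient so sampling drifts toward the target style.
    return eps - w_cls * style_grad
```

In practice the two weights trade off content preservation (dominated by the classifier-free term) against style reflection (dominated by the classifier gradient).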
**Experiments:**
Experiments on the HumanML3D and 100STYLE datasets demonstrate that SMooDi outperforms baseline models in generating stylized motion driven by content text, excelling in both content preservation and style reflection. SMooDi also achieves comparable performance to state-of-the-art methods in motion style transfer.
**Contributions:**
1. SMooDi is the first approach to adapt a pre-trained text-to-motion model for diverse stylized motion generation.
2. The proposed style modulation module enables stylized motion generation with style reflection, content preservation, and realism.
3. SMooDi sets a new state-of-the-art in stylized motion generation driven by content text and achieves comparable performance in motion style transfer.
**Related Work:**
The paper discusses related work in human motion generation, motion style transfer, and stylized motion diffusion models, highlighting the limitations of existing methods and the contributions of SMooDi.
**Conclusion:**
SMooDi is a novel approach that leverages a pre-trained motion diffusion model to generate stylized motion driven by content texts and style motion sequences, with a lightweight style adaptor and a style guidance module that together balance realism, content preservation, and style reflection.