MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

1S-Lab, Nanyang Technological University
2SenseTime, China
*equal contribution    corresponding author

Abstract

Human motion modeling is important for many modern graphics applications, which typically require professional skills. In order to remove the skill barriers for laymen, recent motion generation methods can directly generate human motions conditioned on natural languages. However, it remains challenging to achieve diverse and fine-grained motion generation with various text inputs. To address this problem, we propose MotionDiffuse, the first diffusion model-based text-driven motion generation framework, which demonstrates several desired properties over existing methods. 1) Probabilistic Mapping. Instead of a deterministic language-motion mapping, MotionDiffuse generates motions through a series of denoising steps in which variations are injected. 2) Realistic Synthesis. MotionDiffuse excels at modeling complicated data distributions and generating vivid motion sequences. 3) Multi-Level Manipulation. MotionDiffuse responds to fine-grained instructions on body parts and supports arbitrary-length motion synthesis with time-varied text prompts. Our experiments show that MotionDiffuse outperforms existing SoTA methods by convincing margins on text-driven motion generation and action-conditioned motion generation. A qualitative analysis further demonstrates MotionDiffuse's controllability for comprehensive motion generation.
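To make the probabilistic mapping concrete, below is a minimal, illustrative sketch of a DDPM-style reverse (denoising) process for a text-conditioned motion sequence. The denoiser signature, tensor shapes, and noise schedule are placeholder assumptions for exposition, not the actual MotionDiffuse implementation.

import torch

# Illustrative sketch only: a generic DDPM reverse process for a motion
# sequence conditioned on a text embedding. All hyperparameters below
# (steps, schedule, dimensions) are assumptions, not the paper's values.

T = 1000                                   # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)      # linear noise schedule (assumed)
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_motion(denoiser, text_emb, num_frames=60, pose_dim=263):
    """Start from Gaussian noise and iteratively denoise into a motion sequence.

    `denoiser(x_t, t, text_emb)` is assumed to predict the noise added at step t;
    `pose_dim=263` is the HumanML3D pose representation, used here only as an example.
    """
    x = torch.randn(1, num_frames, pose_dim)            # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = denoiser(x, torch.tensor([t]), text_emb)  # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alphas_bar[t])
        mean = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            # Fresh noise is injected at every step, which is what makes the
            # language-to-motion mapping probabilistic rather than deterministic.
            x = mean + torch.sqrt(betas[t]) * torch.randn_like(x)
        else:
            x = mean
    return x  # denoised motion, shape (1, num_frames, pose_dim)

Because each reverse step samples new Gaussian noise, repeated calls with the same text embedding yield different, diverse motions for the same prompt.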

Pipeline

Text-driven Motion Generation

Quantitative Results

Qualitative Results

Action-conditioned Motion Generation

Quantitative Results

BibTeX

@article{zhang2022motiondiffuse,
      title   =   {MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model}, 
      author  =   {Zhang, Mingyuan and
                   Cai, Zhongang and
                   Pan, Liang and
                   Hong, Fangzhou and
                   Guo, Xinying and
                   Yang, Lei and
                   Liu, Ziwei},
      year    =   {2022},
      journal =   {arXiv preprint arXiv:2208.15001},
}

Acknowledgement

This work is supported by NTU NAP and MOE AcRF Tier 2 (T2EP20221-0033), and is conducted under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAF-ICP) Funding Initiative, with cash and in-kind contributions from the industry partner(s).

This project page is based on the template of the Nerfies project page.