
Publications  [Google Scholar]

* indicates equal contribution, ✉ indicates corresponding / co-corresponding author


Large Motion Model for Unified Multi-Modal Motion Generation

Mingyuan Zhang*, Daisheng Jin*, Chenyang Gu*, Fangzhou Hong, Zhongang Cai, Jingfang Huang, Chongzhi Zhang, Xinying Guo, Lei Yang, Ying He, Ziwei Liu
arXiv, 2024
[Paper]  [Project Page]  [Video] [Code]

The Large Motion Model (LMM) unifies various motion generation tasks into a scalable, generalist model, demonstrating broad applicability and strong generalization across diverse tasks.



Digital Life Project: Autonomous 3D Characters with Social Intelligence

Zhongang Cai*, Jianping Jiang*, Zhongfei Qing*, Xinying Guo*, Mingyuan Zhang*, Zhengyu Lin, Haiyi Mei, Chen Wei, Ruisi Wang, Wanqi Yin, Xiangyu Fan, Han Du, Liang Pan, Peng Gao, Zhitao Yang, Yang Gao, Jiaqi Li, Tianxiang Ren, Yukun Wei, Xiaogang Wang, Chen Change Loy, Lei Yang✉, Ziwei Liu
Conference on Computer Vision and Pattern Recognition (CVPR), 2024
[Paper]  [Project Page]  [Code]

Digital Life Project is a framework that uses language as the universal medium to build autonomous 3D characters that can engage in social interactions and express themselves through articulated body motions, thereby simulating life in a digital environment.



FineMoGen: Fine-Grained Spatio-Temporal Motion Generation and Editing

Mingyuan Zhang, Huirong Li, Zhongang Cai, Jiawei Ren, Lei Yang, Ziwei Liu
Neural Information Processing Systems (NeurIPS), 2023
[Paper]  [Project Page]  [Code]

FineMoGen is a diffusion-based motion generation and editing framework that synthesizes fine-grained motions whose spatio-temporal composition follows the user's instructions.



InsActor: Instruction-driven Physics-based Characters

Jiawei Ren*, Mingyuan Zhang*, Cunjun Yu*, Xiao Ma, Liang Pan, Ziwei Liu
Neural Information Processing Systems (NeurIPS), 2023
[Paper]  [Project Page]  [Code]

InsActor is a principled generative framework that leverages recent advancements in diffusion-based human motion models to produce instruction-driven animations of physics-based characters.



SMPLer-X: Scaling Up Expressive Human Pose and Shape Estimation

Zhongang Cai*, Wanqi Yin*, Ailing Zeng*, Chen Wei, Qingping Sun, Yanjun Wang, Hui En Pang, Haiyi Mei, Mingyuan Zhang, Lei Zhang, Chen Change Loy, Lei Yang, Ziwei Liu
Neural Information Processing Systems (NeurIPS Datasets and Benchmarks Track), 2023
[Paper]  [Project Page]  [Code]

SMPLer-X is the first generalist foundation model for expressive human pose and shape estimation (EHPS). Built on large-scale data and a large model, SMPLer-X shows strong performance across diverse test benchmarks and transfers well even to unseen environments.



PointHPS: Cascaded 3D Human Pose and Shape Estimation from Point Clouds

Zhongang Cai*, Liang Pan*, Chen Wei, Wanqi Yin, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang, Ziwei Liu
arXiv, 2023
[Paper]  [Project Page]  [Video] [Code]

PointHPS iteratively refines point features through a cascaded architecture to achieve more accurate 3D human pose and shape estimation (HPS) from point clouds.



ReMoDiffuse: Retrieval-Augmented Motion Diffusion Model

Mingyuan Zhang, Xinying Guo, Liang Pan, Zhongang Cai, Fangzhou Hong, Huirong Li, Lei Yang, Ziwei Liu
International Conference on Computer Vision (ICCV), 2023
[Paper]  [Project Page]  [Video] [Code] [Colab Demo] [Hugging Face Demo]

ReMoDiffuse is a retrieval-augmented 3D human motion diffusion model. Benefiting from the extra knowledge in the retrieved samples, ReMoDiffuse achieves high-fidelity generation for the given prompts.



BiBench: Benchmarking and Analyzing Network Binarization

Haotong Qin*, Mingyuan Zhang*, Yifu Ding, Aoyu Li, Zhongang Cai, Ziwei Liu, Fisher Yu, Xianglong Liu
International Conference on Machine Learning (ICML), 2023
[Paper] 

A rigorously designed benchmark with in-depth analysis for network binarization. It first carefully scrutinizes the requirements of binarization in actual production and defines evaluation tracks and metrics for a comprehensive and fair investigation.



MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

Mingyuan Zhang*, Zhongang Cai*, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, Ziwei Liu
Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
[Paper]  [Project Page]  [Video] [Code] [Colab Demo] [Hugging Face Demo]

The first text-driven motion generation pipeline based on diffusion models with probabilistic mapping, realistic synthesis and multi-level manipulation ability.



HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling

Zhongang Cai*, Daxuan Ren*, Ailing Zeng*, Zhengyu Lin*, Tao Yu*, Xiangyu Fan, Yang Gao, Yifan Yu, Liang Pan, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang✉, Ziwei Liu
European Conference on Computer Vision (ECCV), 2022 (Oral Presentation)
[Paper]  [Project Page]  [Video]

A large-scale multi-modal (color images, point clouds, keypoints, SMPL parameters, and textured meshes) 4D human dataset with 1000 human subjects, 400k sequences, and 60M frames.



AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

Fangzhou Hong*, Mingyuan Zhang*, Liang Pan, Zhongang Cai, Lei Yang, Ziwei Liu
ACM Transactions on Graphics (SIGGRAPH), 2022
[Paper]  [Project Page]  [Video] [Code] [Colab Demo]

AvatarCLIP is the first zero-shot text-driven pipeline that enables lay users to generate and animate 3D avatars from natural language descriptions.



Balanced MSE for Imbalanced Visual Regression

Jiawei Ren, Mingyuan Zhang, Cunjun Yu, Ziwei Liu
Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral Presentation)
[Paper]  [Project Page]  [Talk] [Code] [Hugging Face Demo]

A statistically principled loss function that addresses the train/test mismatch in imbalanced regression; it coincides with the supervised contrastive loss.
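The link to contrastive learning can be illustrated with a batch-based form of the loss. The sketch below is a minimal illustration under stated assumptions, not the reference implementation: it assumes 1-D regression, a fixed Gaussian noise variance, and that the other targets in a mini-batch stand in for the training label distribution. The names `bmc_loss` and `noise_var` are hypothetical choices made here.

```python
import math


def bmc_loss(preds, targets, noise_var=1.0):
    """Batch-based Balanced-MSE-style loss (illustrative sketch only).

    Assumption: each prediction is scored against every target in the
    batch with a Gaussian log-likelihood, and the loss is the
    cross-entropy of picking its own target, i.e. a contrastive-style
    objective over the batch.
    """
    n = len(preds)
    total = 0.0
    for i in range(n):
        # Gaussian log-likelihood (up to a constant) of pred i under each target j.
        logits = [-(preds[i] - targets[j]) ** 2 / (2.0 * noise_var) for j in range(n)]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability assigned to the matching target.
        total += log_norm - logits[i]
    # Average over the batch and rescale by 2 * noise_var (hypothetical scaling choice).
    return (total / n) * (2.0 * noise_var)
```

Compared with plain MSE, the log-sum-exp term penalizes a prediction that lands close to other targets in the batch, so frequent label values no longer dominate the fit.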



Delving Deep into the Generalization of Vision Transformers under Distribution Shifts

Chongzhi Zhang*, Mingyuan Zhang*, Shanghang Zhang*, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Shuai Yi, Xianglong Liu, Ziwei Liu
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[Paper]  [Code]

A systematic comparison of the generalization ability of CNNs and ViTs. Three representative generalization-enhancement techniques are applied to ViTs to further explore their inner properties.



Playing for 3D Human Recovery

Zhongang Cai*, Mingyuan Zhang*, Jiawei Ren*, Chen Wei, Daxuan Ren, Zhengyu Lin, Haiyu Zhao, Lei Yang, Chen Change Loy, Ziwei Liu
arXiv, 2021
[Paper]  [Code]

A large-scale synthetic human dataset collected using the GTA-5 game engine, providing a stable performance boost to both frame-based and video-based human mesh recovery (HMR).



BiBERT: Accurate Fully Binarized BERT

Haotong Qin*, Yifu Ding*, Mingyuan Zhang*, Qinghua Yan, Aishan Liu, Qingqing Dang, Ziwei Liu, Xianglong Liu
International Conference on Learning Representations (ICLR), 2022
[Paper]  [Code]

BiBERT is the first fully binarized BERT. It introduces an efficient Bi-Attention structure and a DMD scheme, which together yield 59.2x and 31.2x savings in FLOPs and model size, respectively.



REFINE: Prediction Fusion Network for Panoptic Segmentation

Jiawei Ren*, Cunjun Yu*, Zhongang Cai*, Mingyuan Zhang, Chongsong Chen, Haiyu Zhao, Shuai Yi, Hongsheng Li
Association for the Advancement of Artificial Intelligence (AAAI), 2021
[Paper]  [Project Page]

REFINE achieves high-quality panoptic segmentation by improving both cross-task and within-task prediction fusion.



CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing

Daxuan Ren, Jianmin Zheng✉, Jianfei Cai, Jiatong Li, Haiyong Jiang, Zhongang Cai, Junzhe Zhang, Liang Pan, Mingyuan Zhang, Haiyu Zhao, Shuai Yi
International Conference on Computer Vision (ICCV), 2021
[Paper]  [Project Page] [Code]

CSG-Stump learns shapes from point clouds and discovers the underlying constituent modeling primitives and operations.



Towards Overcoming False Positives in Visual Relationship Detection

Daisheng Jin*, Xiao Ma*, Chongzhi Zhang, Yizhuo Zhou, Jiashu Tao, Zhoujun Li, Mingyuan Zhang✉
British Machine Vision Conference (BMVC), 2021
[Paper] 

SABRA explores the imbalanced distribution in Human-Object Interaction detection. It further proposes a new pipeline to equip the model with sufficient spatial information.



BiPointNet: Binary Neural Network for Point Clouds

Haotong Qin*, Zhongang Cai*, Mingyuan Zhang*, Yifu Ding, Haiyu Zhao, Shuai Yi, Xianglong Liu✉, Hao Su
International Conference on Learning Representations (ICLR), 2021
[Paper]  [Code]

BiPointNet is the first fully binarized network for point cloud learning. It achieves a 14.7x speedup and an 18.9x storage saving on real-world resource-constrained devices.



Efficient Attention: Attention with Linear Complexities

Zhuoran Shen*, Mingyuan Zhang*, Haiyu Zhao, Shuai Yi, Hongsheng Li
Winter Conference on Applications of Computer Vision (WACV), 2021
[Paper]  [Code]

Efficient Attention reduces the memory and computational complexities of the attention mechanism from quadratic to linear. It demonstrates significant improvements in performance-cost trade-offs on a variety of tasks, including object detection, instance segmentation, stereo depth estimation, and temporal action localization.
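The quadratic-to-linear reduction rests on the order of matrix products: rather than forming an n×n attention map, keys and values are first aggregated into a small d_k×d_v context matrix that every position then reads from. The sketch below is a minimal pure-Python illustration, assuming a softmax-normalized variant (softmax over each query's features and over each key feature's positions); the function names are hypothetical and this is not the released code.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    """Plain matrix product of list-of-rows matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def normalize(q, k):
    # Each query row is normalized over its d_k features.
    q_norm = [softmax(row) for row in q]
    # Each key feature (column) is normalized over the n positions.
    k_norm = transpose([softmax(col) for col in transpose(k)])
    return q_norm, k_norm


def efficient_attention(q, k, v):
    """q, k: n x d_k; v: n x d_v. No n x n matrix is ever built."""
    q_norm, k_norm = normalize(q, k)
    context = matmul(transpose(k_norm), v)  # d_k x d_v global context
    return matmul(q_norm, context)          # n x d_v output


def quadratic_order(q, k, v):
    """Same normalization, multiplied in the n x n order for comparison."""
    q_norm, k_norm = normalize(q, k)
    attn = matmul(q_norm, transpose(k_norm))  # n x n map
    return matmul(attn, v)
```

Because matrix multiplication is associative, both functions return the same values up to floating-point error; the first costs O(n·d_k·d_v), while the second grows with n² because of the intermediate n×n map.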



Graph Attention Based Proposal 3D ConvNets for Action Detection

Jun Li, Xianglong Liu✉, Zhuofan Zong, Wanru Zhao, Mingyuan Zhang, Jingkuan Song
Association for the Advancement of Artificial Intelligence (AAAI), 2020
[Paper] 

AGCN-P-3DCNNs fuses intra and inter attention to simultaneously model long-range intra dependencies and inter dependencies. It also contains a simple and effective framewise classifier, which enhances the feature representation capability of the backbone model.



Updated: 2024-4-2