Publications

* indicates equal contribution, ✉ indicates corresponding / co-corresponding author

MotionDiffuse: Text-Driven Human Motion Generation with Diffusion Model

Mingyuan Zhang*, Zhongang Cai*, Liang Pan, Fangzhou Hong, Xinying Guo, Lei Yang, Ziwei Liu
arXiv, 2022
[Paper]  [Project Page]  [Video] [Code] [Colab Demo] [Hugging Face Demo]

The first text-driven motion generation pipeline based on diffusion models with probabilistic mapping, realistic synthesis and multi-level manipulation ability.

HuMMan: Multi-Modal 4D Human Dataset for Versatile Sensing and Modeling

Zhongang Cai*, Daxuan Ren*, Ailing Zeng*, Zhengyu Lin*, Tao Yu*, Xiangyu Fan, Yang Gao, Yifan Yu, Liang Pan, Fangzhou Hong, Mingyuan Zhang, Chen Change Loy, Lei Yang✉, Ziwei Liu
European Conference on Computer Vision (ECCV), 2022 (Oral Presentation)
[Paper]  [Project Page]  [Video]

A large-scale multi-modal (color images, point clouds, keypoints, SMPL parameters, and textured meshes) 4D human dataset with 1000 human subjects, 400k sequences and 60M frames.

AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars

Fangzhou Hong*, Mingyuan Zhang*, Liang Pan, Zhongang Cai, Lei Yang, Ziwei Liu
ACM Transactions on Graphics (SIGGRAPH), 2022
[Paper]  [Project Page]  [Video] [Code] [Colab Demo]

AvatarCLIP is the first zero-shot text-driven pipeline that lets lay users generate and animate 3D avatars from natural language descriptions.

Balanced MSE for Imbalanced Visual Regression

Jiawei Ren, Mingyuan Zhang, Cunjun Yu, Ziwei Liu
Conference on Computer Vision and Pattern Recognition (CVPR), 2022 (Oral Presentation)
[Paper]  [Project Page]  [Talk] [Code] [Hugging Face Demo]

A statistically principled loss function that addresses the train/test mismatch in imbalanced regression and coincides with the supervised contrastive loss.

Delving Deep into the Generalization of Vision Transformers under Distribution Shifts

Chongzhi Zhang*, Mingyuan Zhang*, Shanghang Zhang*, Daisheng Jin, Qiang Zhou, Zhongang Cai, Haiyu Zhao, Shuai Yi, Xianglong Liu, Ziwei Liu
Conference on Computer Vision and Pattern Recognition (CVPR), 2022
[Paper]  [Code]

A systematic comparison of the generalization ability of CNNs and ViTs. Three representative generalization-enhancement techniques are applied to ViTs to further explore their inner properties.

Playing for 3D Human Recovery

Zhongang Cai*, Mingyuan Zhang*, Jiawei Ren*, Chen Wei, Daxuan Ren, Zhengyu Lin, Haiyu Zhao, Lei Yang, Chen Change Loy, Ziwei Liu
arXiv, 2021
[Paper]  [Code]

A large-scale synthetic human dataset collected using the GTA-5 game engine, providing a stable performance boost to both frame-based and video-based HMR.

BiBERT: Accurate Fully Binarized BERT

Haotong Qin*, Yifu Ding*, Mingyuan Zhang*, Qinghua Yan, Aishan Liu, Qingqing Dang, Ziwei Liu, Xianglong Liu
International Conference on Learning Representations (ICLR), 2022
[Paper]  [Code]

BiBERT is the first fully binarized BERT. It introduces an efficient Bi-Attention structure and a DMD scheme, which yield impressive 59.2x and 31.2x savings in FLOPs and model size, respectively.

REFINE: Prediction Fusion Network for Panoptic Segmentation

Jiawei Ren*, Cunjun Yu*, Zhongang Cai*, Mingyuan Zhang, Chongsong Chen, Haiyu Zhao, Shuai Yi, Hongsheng Li
Association for the Advancement of Artificial Intelligence (AAAI), 2021
[Paper]  [Project Page]

REFINE achieves high-quality panoptic segmentation by improving both cross-task and within-task prediction fusion.

CSG-Stump: A Learning Friendly CSG-Like Representation for Interpretable Shape Parsing

Daxuan Ren, Jianmin Zheng✉, Jianfei Cai, Jiatong Li, Haiyong Jiang, Zhongang Cai, Junzhe Zhang, Liang Pan, Mingyuan Zhang, Haiyu Zhao, Shuai Yi
International Conference on Computer Vision (ICCV), 2021
[Paper]  [Project Page] [Code]

CSG-Stump learns shapes from point clouds and discovers the underlying constituent modeling primitives and operations.

Towards Overcoming False Positives in Visual Relationship Detection

Daisheng Jin*, Xiao Ma*, Chongzhi Zhang, Yizhuo Zhou, Jiashu Tao, Zhoujun Li, Mingyuan Zhang✉
British Machine Vision Conference (BMVC), 2021

SABRA explores the imbalanced distribution in Human-Object Interaction detection. It further proposes a new pipeline to equip the model with sufficient spatial information.

BiPointNet: Binary Neural Network for Point Clouds

Haotong Qin*, Zhongang Cai*, Mingyuan Zhang*, Yifu Ding, Haiyu Zhao, Shuai Yi, Xianglong Liu✉, Hao Su
International Conference on Learning Representations (ICLR), 2021
[Paper]  [Code]

BiPointNet is the first fully binarized network for point cloud learning. BiPointNet gives an impressive 14.7x speedup and 18.9x storage saving on real-world resource-constrained devices.

Efficient Attention: Attention with Linear Complexities

Zhuoran Shen*, Mingyuan Zhang*, Haiyu Zhao, Shuai Yi, Hongsheng Li
Winter Conference on Applications of Computer Vision (WACV), 2021
[Paper]  [Code]

Efficient Attention reduces the memory and computational complexities of the attention mechanism from quadratic to linear. It demonstrates significant improvements in performance-cost trade-offs on a variety of tasks including object detection, instance segmentation, stereo depth estimation, and temporal action localization.

Graph Attention Based Proposal 3D ConvNets for Action Detection

Jun Li, Xianglong Liu✉, Zhuofan Zong, Wanru Zhao, Mingyuan Zhang, Jingkuan Song
Association for the Advancement of Artificial Intelligence (AAAI), 2020

AGCN-P-3DCNNs fuses intra and inter attention to model intra long-range dependencies and inter dependencies simultaneously. It also contains a simple and effective framewise classifier, which enhances the feature representation capabilities of the backbone model.

