This repo records state-of-the-art (SOTA) works on human motion video generation.
-
[2023-04] Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos | Paper | Code
AAAI'24
-
[2023-05] ControlVideo: Adding Conditional Control for One Shot Text-to-Video Editing | Paper | Code
arXiv'23
-
[2023-07] DISCO: Disentangled Control for Realistic Human Dance Generation | Paper | Code
CVPR'24
-
[2023-11] MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion | Paper | Code
ICML'24
-
[2023-11] MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model | Paper | Code
CVPR'24
-
[2023-11] Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation | Paper | Code
CVPR'24
-
[2023-11] HumanGaussian: Text-Driven 3D Human Generation with Gaussian Splatting | Code
CVPR'24
-
[2023-12] GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians | Paper | Code
CVPR'24
-
[2024-05] VividPose: Advancing Stable Video Diffusion for Realistic Human Image Animation | Paper
-
[2024-07] MimicMotion: High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance | Paper | Code
-
[2024-09] RealisDance: Equip Controllable Character Animation with Realistic Hands | Paper | Code
-
[2024-MM] MusePose: a Pose-Driven Image-to-Video Framework for Virtual Human Generation | Code
-
[2025-02] HumanDiT: Pose-Guided Diffusion Transformer for Long-form Human Motion Video Generation | Paper
-
[2025-02] AnyCharV: Bootstrap Controllable Character Video Generation with Fine-to-Coarse Guidance | Paper | Code
-
[2025-04] UniAnimate: Taming Unified Video Diffusion Models for Consistent Human Image Animation | Code
Sci. China Inf. Sci.'25
-
[2025-05] MTVCrafter: 4D Motion Tokenization for Open-World Human Image Animation | Paper | Code
arXiv'25
-
[2024-02] EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions | Paper | Code
-
[2024-11] EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation | Paper | Code
CVPR'25
-
[2025-04] FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis | Paper | Code
-
[2014] Garment Replacement in Monocular Video Sequences | Paper
TOG'14
-
[2019] FW-GAN: Flow-Navigated Warping GAN for Video Virtual Try-On | Paper
ICCV'19
-
[2019] Unsupervised Image-to-Video Clothing Transfer | Paper
ICCV'19
-
[2021] ShineOn: Illuminating Design Choices for Practical Video-based Virtual Clothing Try-on | Code
WACVW'21
-
[2021] MV-TON: Memory-based Video Virtual Try-on Network | Paper
MM'21
-
[2022] ClothFormer: Taming Video Virtual Try-on in All Module | Paper | Code
CVPR'22
-
[2024-05] ViViD: Video Virtual Try-on using Diffusion Models | Paper | Code
-
[2025-01] 1-2-1: Renaissance of Single-Network Paradigm for Virtual Try-On | Paper | Code
-
[2025-02] CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation | Paper | Code
-
[2024-05] EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture | Paper | Code
-
[2024-08] CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer | Paper | Code
-
[2024-12] HunyuanVideo: A Systematic Framework For Large Video Generation Model | Paper | Code
-
[2024-12] Open-Sora Plan: Open-Source Large Video Generation Model | Paper | Code
-
[2025-02] Step-Video-T2V Technical Report: The Practice, Challenges, and Future of Video Foundation Model | Paper | Code
-
[2025-05] HunyuanVideo-Avatar: High-Fidelity Audio-Driven Human Animation for Multiple Characters | Paper | Code