
Diffusion Model-Based Video Editing: A Survey


Wenhao Sun, Rong-Cheng Tu, Jingyi Liao, Dacheng Tao
Nanyang Technological University

> A collection of papers on diffusion model-based video editing, a.k.a. video-to-video (V2V) translation, together with code for the V2VBench video editing benchmark.

[Video: teaser.mp4]

📌 Table of Contents

- Introduction
- Network and Training Paradigm
  - Temporal Adaption
  - Structure Conditioning
  - Training Modification
- Attention Feature Injection
  - Inversion-Based Feature Injection
  - Motion-Based Feature Injection
- Diffusion Latents Manipulation
  - Latent Initialization
  - Latent Transition
- Canonical Representation
- Novel Conditioning
  - Point-Based Editing
  - Pose-Guided Human Action Editing
- V2VBench
- Citation

Introduction


Figure: Overview of diffusion-based video editing model components.

The diffusion process defines a Markov chain that progressively adds random noise to data and learns to reverse this process to generate desired data samples from noise. Deep neural networks facilitate the transitions between latent states.
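For readers new to diffusion models, the sketch below illustrates the forward (noising) side of this Markov chain and the quantity a denoising network is trained to predict. It is a minimal NumPy sketch under standard DDPM assumptions; the schedule and names (`T`, `betas`, `q_sample`) are illustrative and not taken from this repository's code.

```python
# A minimal sketch of the DDPM-style forward (noising) process described
# above. The schedule values and names are illustrative assumptions.
import numpy as np

T = 1000                             # number of diffusion steps
betas = np.linspace(1e-4, 0.02, T)   # linear variance schedule beta_t
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)      # abar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, rng=None):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x0, (1 - abar_t) * I)."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * noise
    return xt, noise

# Example: noise one 8-frame, 64x64 RGB "video" to step t = 500.
video = np.zeros((8, 64, 64, 3))
xt, eps = q_sample(video, t=500)
# A denoising network eps_theta(x_t, t) is trained to predict `eps`;
# running the learned reverse chain from pure noise generates new samples.
```

The editing methods surveyed below differ mainly in how they condition or constrain this learned reverse process so that it follows the source video.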

Network and Training Paradigm

Temporal Adaption

| Method | Paper | Project | Publication | Year |
| --- | --- | --- | --- | --- |
| Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation | arXiv | Website, GitHub | ICCV | Dec 2022 |
| Towards Consistent Video Editing with Text-to-Image Diffusion Models | arXiv | | NeurIPS | May 2023 |
| SimDA: Simple Diffusion Adapter for Efficient Video Generation | arXiv | Website, GitHub | Preprint | Aug 2023 |
| VidToMe: Video Token Merging for Zero-Shot Video Editing | arXiv | Website, GitHub | Preprint | Dec 2023 |
| Fairy: Fast Parallelized Instruction-Guided Video-to-Video Synthesis | arXiv | Website | Preprint | Dec 2023 |

(back to top)

Structure Conditioning

| Method | Paper | Project | Publication | Year |
| --- | --- | --- | --- | --- |
| Structure and Content-Guided Video Synthesis with Diffusion Models | arXiv | Website | Preprint | Feb 2023 |
| VideoComposer: Compositional Video Synthesis with Motion Controllability | arXiv | Website, GitHub | NeurIPS | Jun 2023 |
| VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet | arXiv | GitHub | Preprint | Jul 2023 |
| MagicEdit: High-Fidelity and Temporally Coherent Video Editing | arXiv | Website, GitHub | Preprint | Aug 2023 |
| CCEdit: Creative and Controllable Video Editing via Diffusion Models | arXiv | Website, GitHub | Preprint | Sep 2023 |
| Ground-A-Video: Zero-shot Grounded Video Editing using Text-to-image Diffusion Models | arXiv | Website, GitHub | ICLR | Oct 2023 |
| LAMP: Learn A Motion Pattern for Few-Shot-Based Video Generation | arXiv | Website, GitHub | Preprint | Oct 2023 |
| Motion-Conditioned Image Animation for Video Editing | arXiv | Website, GitHub | Preprint | Nov 2023 |
| FlowVid: Taming Imperfect Optical Flows for Consistent Video-to-Video Synthesis | arXiv | Website, GitHub | Preprint | Dec 2023 |
| EVA: Zero-shot Accurate Attributes and Multi-Object Video Editing | arXiv | Website, GitHub | Preprint | Mar 2024 |

(back to top)

Training Modification

| Method | Paper | Project | Publication | Year |
| --- | --- | --- | --- | --- |
| Dreamix: Video Diffusion Models are General Video Editors | arXiv | Website | Preprint | Feb 2023 |
| InstructVid2Vid: Controllable Video Editing with Natural Language Instructions | arXiv | | Preprint | May 2023 |
| MotionDirector: Motion Customization of Text-to-Video Diffusion Models | arXiv | Website, GitHub | Preprint | Oct 2023 |
| VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models | arXiv | Website, GitHub | Preprint | Nov 2023 |
| Consistent Video-to-Video Transfer Using Synthetic Dataset | arXiv | GitHub | ICLR | Nov 2023 |
| VMC: Video Motion Customization using Temporal Attention Adaption for Text-to-Video Diffusion Models | arXiv | Website, GitHub | CVPR | Dec 2023 |
| SAVE: Protagonist Diversification with Structure Agnostic Video Editing | arXiv | Website, GitHub | Preprint | Dec 2023 |
| VASE: Object-Centric Appearance and Shape Manipulation of Real Videos | arXiv | Website, GitHub | Preprint | Jan 2024 |

(back to top)

Attention Feature Injection

Inversion-Based Feature Injection

| Method | Paper | Project | Publication | Year |
| --- | --- | --- | --- | --- |
| Video-P2P: Video Editing with Cross-attention Control | arXiv | Website, GitHub | CVPR | Mar 2023 |
| Edit-A-Video: Single Video Editing with Object-Aware Consistency | arXiv | Website | Preprint | Mar 2023 |
| FateZero: Fusing Attentions for Zero-shot Text-based Video Editing | arXiv | Website, GitHub | ICCV | Mar 2023 |
| Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models | arXiv | GitHub | Preprint | Mar 2023 |
| Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts | arXiv | Website, GitHub | Preprint | May 2023 |
| UniEdit: A Unified Tuning-Free Framework for Video Motion and Appearance Editing | arXiv | Website, GitHub | Preprint | Feb 2024 |
| AnyV2V: A Tuning-Free Framework For Any Video-to-Video Editing Tasks | arXiv | Website, GitHub | Preprint | Mar 2024 |

(back to top)

Motion-Based Feature Injection

| Method | Paper | Project | Publication | Year |
| --- | --- | --- | --- | --- |
| TokenFlow: Consistent Diffusion Features for Consistent Video Editing | arXiv | Website, GitHub | ICLR | Jul 2023 |
| FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing | arXiv | Website, GitHub | ICLR | Oct 2023 |
| FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation | arXiv | Website, GitHub | CVPR | Mar 2024 |

(back to top)

Diffusion Latents Manipulation

Latent Initialization

| Method | Paper | Project | Publication | Year |
| --- | --- | --- | --- | --- |
| Text2Video-Zero: Text-to-Image Diffusion Models are Zero-Shot Video Generators | arXiv | Website, GitHub | ICCV | Mar 2023 |
| Control-A-Video: Controllable Text-to-Video Generation with Diffusion Models | arXiv | Website, GitHub | Preprint | May 2023 |
| Video ControlNet: Towards Temporally Consistent Synthetic-to-Real Video Translation Using Conditional Image Diffusion Models | arXiv | | Preprint | May 2023 |
| A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing | arXiv | Website, GitHub | CVPR | Dec 2023 |

(back to top)

Latent Transition

| Method | Paper | Project | Publication | Year |
| --- | --- | --- | --- | --- |
| Pix2Video: Video Editing using Image Diffusion | arXiv | Website, GitHub | ICCV | Mar 2023 |
| ControlVideo: Training-free Controllable Text-to-Video Generation | arXiv | Website, GitHub | ICLR | May 2023 |
| Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation | arXiv | Website, GitHub | SIGGRAPH | Jun 2023 |
| DiffSynth: Latent In-Iteration Deflickering for Realistic Video Synthesis | arXiv | Website, GitHub | Preprint | Aug 2023 |
| RAVE: Randomized Noise Shuffling for Fast and Consistent Video Editing with Diffusion Models | arXiv | Website, GitHub | CVPR | Dec 2023 |
| GenVideo: One-shot target-image and shape aware video editing using T2I diffusion models | arXiv | | CVPR | Apr 2024 |
| MotionClone: Training-Free Motion Cloning for Controllable Video Generation | arXiv | Website, GitHub | Preprint | Jun 2024 |

(back to top)

Canonical Representation

| Method | Paper | Project | Publication | Year |
| --- | --- | --- | --- | --- |
| Shape-aware Text-driven Layered Video Editing | Open Access | Website, GitHub | CVPR | Jan 2023 |
| VidEdit: Zero-Shot and Spatially Aware Text-Driven Video Editing | arXiv | Website | TMLR | Jun 2023 |
| CoDeF: Content Deformation Fields for Temporally Consistent Video Processing | arXiv | Website, GitHub | CVPR | Aug 2023 |
| StableVideo: Text-driven Consistency-aware Diffusion Video Editing | arXiv | GitHub | ICCV | Aug 2023 |
| DiffusionAtlas: High-Fidelity Consistent Diffusion Video Editing | arXiv | Website | Preprint | Dec 2023 |

(back to top)

Novel Conditioning

Point-Based Editing

| Method | Paper | Project | Publication | Year |
| --- | --- | --- | --- | --- |
| VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence | arXiv | Website, GitHub | CVPR | Dec 2023 |
| DragVideo: Interactive Drag-style Video Editing | arXiv | GitHub | Preprint | Dec 2023 |
| Drag-A-Video: Non-rigid Video Editing with Point-based Interaction | arXiv | | Preprint | Dec 2023 |
| MotionCtrl: A Unified and Flexible Motion Controller for Video Generation | arXiv | Website, GitHub | Preprint | Dec 2023 |

(back to top)

Pose-Guided Human Action Editing

| Method | Paper | Project | Publication | Year |
| --- | --- | --- | --- | --- |
| Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos | arXiv | Website, GitHub | AAAI | Apr 2023 |
| DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion | arXiv | Website, GitHub | ICCV | Apr 2023 |
| DisCo: Disentangled Control for Realistic Human Dance Generation | arXiv | Website, GitHub | CVPR | Jun 2023 |
| MagicPose: Realistic Human Poses and Facial Expressions Retargeting with Identity-aware Diffusion | arXiv | Website, GitHub | ICML | Nov 2023 |
| MagicAnimate: Temporally Consistent Human Image Animation using Diffusion Model | arXiv | Website, GitHub | Preprint | Nov 2023 |
| Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation | arXiv | Website, Official GitHub, Community Implementation | Preprint | Nov 2023 |
| Zero-shot High-fidelity and Pose-controllable Character Animation | arXiv | | Preprint | Apr 2024 |

(back to top)

📈 V2VBench

Leaderboard

V2VBench is a comprehensive benchmark designed to evaluate video editing methods. It provides a standardized set of source videos, editing prompts, and evaluation metrics; for detailed information, please refer to the accompanying paper.

🍻 Citation

If you find this repository helpful, please consider citing our paper:

```bibtex
@article{sun2024v2vsurvey,
    author  = {Wenhao Sun and Rong-Cheng Tu and Jingyi Liao and Dacheng Tao},
    title   = {Diffusion Model-Based Video Editing: A Survey},
    journal = {CoRR},
    volume  = {abs/2407.07111},
    year    = {2024}
}
```
