Siao Tang¹ · Xinyin Ma¹ · Gongfan Fang¹ · Xingyi Yang² · Xinchao Wang¹
¹National University of Singapore, ²The Hong Kong Polytechnic University
Q-ARVD is the first quantization framework tailored to autoregressive video diffusion models. It introduces a final-quality-guided frame-weighting mechanism to handle imbalanced frame-wise quantization sensitivity, and an outlier-aware adaptive dual-scale strategy to address heterogeneous outlier patterns.
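This README names the dual-scale strategy but does not give its formula. The sketch below is a generic, hypothetical illustration of why a second scale helps when a few outliers dominate the range. It is not Q-ARVD's actual rule. It compares a single symmetric 4-bit scale with a variant that gives inliers and outliers separate scales. The outlier threshold (`outlier_factor` times the mean magnitude) and all function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def fake_quant(x: float, scale: float, n_bits: int) -> float:
    """Symmetric uniform quantize-dequantize (Python's round is round-half-to-even)."""
    q_max = 2 ** (n_bits - 1) - 1
    q = max(-q_max - 1, min(q_max, round(x / scale)))
    return q * scale


def single_scale_quant(values: Sequence[float], n_bits: int = 4) -> List[float]:
    """Baseline: one scale set by the largest magnitude, so outliers stretch the grid."""
    q_max = 2 ** (n_bits - 1) - 1
    scale = max(abs(v) for v in values) / q_max or 1.0
    return [fake_quant(v, scale, n_bits) for v in values]


def dual_scale_quant(
    values: Sequence[float], n_bits: int = 4, outlier_factor: float = 3.0
) -> Tuple[List[float], float, float]:
    """Hypothetical dual-scale variant: separate scales for inliers and outliers.

    A value counts as an outlier when |v| > outlier_factor * mean(|v|).
    Returns (dequantized values, inlier scale, outlier scale).
    """
    q_max = 2 ** (n_bits - 1) - 1
    mags = [abs(v) for v in values]
    threshold = outlier_factor * sum(mags) / len(mags)
    inlier_max = max((m for m in mags if m <= threshold), default=0.0)
    outlier_max = max((m for m in mags if m > threshold), default=0.0)
    fine = inlier_max / q_max or 1.0      # fine grid sized to the bulk of values
    coarse = outlier_max / q_max or fine  # coarse grid sized to the outliers
    out = [fake_quant(v, coarse if abs(v) > threshold else fine, n_bits) for v in values]
    return out, fine, coarse


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Small activations plus one large outlier.
    x = [0.01 * i for i in range(-50, 50)] + [25.0]
    dual, fine, coarse = dual_scale_quant(x)
    print(f"single-scale MSE: {mse(x, single_scale_quant(x)):.6f}")
    print(f"dual-scale MSE:   {mse(x, dual):.6f} (scales {fine:.4f} / {coarse:.4f})")
```

With one scale, the outlier forces a step of about 3.57, so every small value rounds to zero. The second scale restores resolution for the bulk of the values. This sketch ignores the cost of recording which elements use which scale.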
Create a conda environment and install dependencies:
conda create -n q_arvd python=3.10 -y
conda activate q_arvd
# Set up the Self-Forcing environment
## PyTorch (CUDA 12.1 build)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
## FlashAttention (prebuilt wheel for CUDA 12, torch 2.5, Python 3.10)
wget https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
pip install flash_attn-2.7.4.post1+cu12torch2.5cxx11abiFALSE-cp310-cp310-linux_x86_64.whl
## Remaining dependencies
pip install -r requirements.txt
## VBench
pip install vbench
pip install detectron2@git+https://github.com/facebookresearch/detectron2.git # requires a PyTorch build with CUDA <= 12.1
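After installation, a quick sanity check can catch version mismatches before the longer steps run. This minimal sketch assumes the pinned versions in the commands above are the intended ones. `check_env` and `installed_version` are hypothetical helpers and are not part of the repository.

```python
import sys
from importlib import metadata
from typing import List, Optional

# Versions pinned by the install commands above (the wheel name encodes cp310 / torch 2.5 / CUDA 12).
EXPECTED_PYTHON = (3, 10)
EXPECTED_DISTS = {
    "torch": "2.5.1",
    "torchvision": "0.20.1",
    "torchaudio": "2.5.1",
    "flash_attn": "2.7.4.post1",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_env() -> List[str]:
    """Collect human-readable mismatches against the pinned setup."""
    problems = []
    if sys.version_info[:2] != EXPECTED_PYTHON:
        problems.append(f"python {sys.version_info[0]}.{sys.version_info[1]} (expected 3.10)")
    for dist, want in EXPECTED_DISTS.items():
        got = installed_version(dist)
        if got is None:
            problems.append(f"{dist} missing (expected {want})")
        # Ignore local version labels such as "+cu121" when comparing.
        elif got.split("+")[0] != want:
            problems.append(f"{dist} {got} (expected {want})")
    try:
        import torch  # only inspected if it is installed
    except ImportError:
        return problems
    if torch.version.cuda is None or not torch.cuda.is_available():
        problems.append("torch has no usable CUDA device (CPU-only build or missing GPU/driver)")
    return problems


if __name__ == "__main__":
    issues = check_env()
    print("environment looks consistent" if not issues else "\n".join(issues))
```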
Download the Wan2.1-T2V-1.3B base model and the Self-Forcing DMD checkpoint:
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
hf download gdhe17/Self-Forcing checkpoints/self_forcing_dmd.pt --local-dir .
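If the `hf` CLI is unavailable, or you want to confirm the files landed where the scripts expect them, the following sketch may help. It mirrors the two `hf download` commands above using `huggingface_hub`'s `snapshot_download` and `hf_hub_download`. It also checks that the target paths exist. The helper names are hypothetical, and the expected paths are inferred from the `--local-dir` arguments above.

```python
from pathlib import Path
from typing import List

# Paths produced by the two `hf download` commands above.
EXPECTED_PATHS = [
    Path("wan_models/Wan2.1-T2V-1.3B"),
    Path("checkpoints/self_forcing_dmd.pt"),
]


def missing_assets(root: Path = Path(".")) -> List[str]:
    """Return expected assets that are absent, counting an empty model directory as missing."""
    missing = []
    for rel in EXPECTED_PATHS:
        p = root / rel
        if not p.exists() or (p.is_dir() and not any(p.iterdir())):
            missing.append(str(rel))
    return missing


def download_assets(root: Path = Path(".")) -> None:
    """Python equivalent of the two `hf download` commands (requires huggingface_hub)."""
    from huggingface_hub import hf_hub_download, snapshot_download

    snapshot_download(
        repo_id="Wan-AI/Wan2.1-T2V-1.3B",
        local_dir=str(root / "wan_models/Wan2.1-T2V-1.3B"),
    )
    # Keeps the repo-relative subfolder, i.e. saves to <root>/checkpoints/self_forcing_dmd.pt.
    hf_hub_download(
        repo_id="gdhe17/Self-Forcing",
        filename="checkpoints/self_forcing_dmd.pt",
        local_dir=str(root),
    )


if __name__ == "__main__":
    gaps = missing_assets()
    print("all assets present" if not gaps else f"missing: {gaps}")
```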
Step 1: compute the chunk-wise sensitivity:
bash scripts/get_chunkwise_sensitivity.sh
When the script finishes, it reports a chunk-wise sensitivity vector such as [0.5462, 0.1668, 0.1189, 0.0789, 0.0534, 0.0263, 0.0096], which is used in the quantization process (Step 2).
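The example vector places more than half of the sensitivity on the first chunk. Later chunks are generated autoregressively from earlier ones. The sketch below is purely an assumption about how such a vector could be used, not the repository's calibration code. It renormalizes the vector and uses it to weight a per-chunk reconstruction loss between full-precision and quantized outputs. `chunk_weights` and `weighted_chunk_loss` are hypothetical names.

```python
from typing import List, Sequence

# Example chunk-wise sensitivity from the text above; real values come from
# scripts/get_chunkwise_sensitivity.sh.
SENSITIVITY = [0.5462, 0.1668, 0.1189, 0.0789, 0.0534, 0.0263, 0.0096]


def chunk_weights(sensitivity: Sequence[float]) -> List[float]:
    """Renormalize sensitivities so the weights sum to 1 (up to float error)."""
    total = sum(sensitivity)
    if total <= 0:
        raise ValueError("sensitivities must have a positive sum")
    return [s / total for s in sensitivity]


def weighted_chunk_loss(
    fp_chunks: Sequence[Sequence[float]],
    q_chunks: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Sum over chunks k of weight_k * MSE(full-precision chunk k, quantized chunk k)."""
    if not (len(fp_chunks) == len(q_chunks) == len(weights)):
        raise ValueError("need one weight per chunk")
    loss = 0.0
    for fp, q, w in zip(fp_chunks, q_chunks, weights):
        chunk_mse = sum((a - b) ** 2 for a, b in zip(fp, q)) / len(fp)
        loss += w * chunk_mse
    return loss


if __name__ == "__main__":
    w = chunk_weights(SENSITIVITY)
    fp = [[0.0, 0.0]] * len(w)
    q = [[0.1, -0.1]] + [[0.0, 0.0]] * (len(w) - 1)  # error only in the first chunk
    print(f"weights: {[round(x, 4) for x in w]}")
    print(f"loss: {weighted_chunk_loss(fp, q, w):.6f}")
```

Under this weighting, an error confined to the first chunk costs about 57 times more than the same error in the last chunk (0.5462 / 0.0096).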
Step 2: quantize the model:
bash scripts/train_quantization.sh
Run inference with both the full-precision and the quantized model:
# 1. Generate reference samples with the bfloat16 model
bash scripts/infer_fp_model.sh
# 2. Generate samples with the quantized model
bash scripts/infer_quant_model.sh
Evaluate the quantized model's outputs against the bfloat16 references:
bash scripts/eval_quant_model.sh
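The exact metrics reported by `eval_quant_model.sh` are defined in that script. The codebase installs VBench and builds on common_metrics_on_video_quality. As a self-contained illustration of one common full-reference fidelity metric, the sketch below computes per-frame PSNR between a reference video and a quantized video. It is not the repository's evaluation code. Videos are assumed to be lists of frames flattened to pixel values in [0, 255]. Looking at the per-frame values is one way to see whether quantization error grows across autoregressively generated frames.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one frame, flattened to pixel values in [0, max_val]


def psnr(ref: Frame, test: Frame, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    if len(ref) != len(test) or not ref:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / mse)


def per_frame_psnr(
    ref_video: Sequence[Frame], test_video: Sequence[Frame], max_val: float = 255.0
) -> List[float]:
    """PSNR for each frame pair, in temporal order."""
    if len(ref_video) != len(test_video):
        raise ValueError("videos must have the same number of frames")
    return [psnr(r, t, max_val) for r, t in zip(ref_video, test_video)]


if __name__ == "__main__":
    ref = [[10.0, 20.0, 30.0]] * 3
    quant = [[10.0, 20.0, 30.0], [11.0, 19.0, 30.0], [14.0, 16.0, 33.0]]
    print([round(p, 2) if math.isfinite(p) else p for p in per_frame_psnr(ref, quant)])
```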
This codebase builds on open-source implementations including Self-Forcing, PTQ4ViT, BRECQ, and common_metrics_on_video_quality.
If you find this codebase useful for your research, please cite our paper:
TODO
