pip3 install torch==2.9.0 # >= 2.7.1
pip3 install -U "cache-dit[all]" # >= 1.0.9
pip3 install git+https://github.com/huggingface/diffusers.git # latest main

We have released a Hybrid Acceleration example (📚qwen_image_fast.py) for Qwen-Image in this repo, with a ~4.8x🎉 speedup. It combines Hybrid Cache Acceleration, Context Parallelism, FP8 Weight Only quantization, and Torch Compile. Feel free to give it a try. For example:

# Baseline (NVIDIA L20 48GiB, ~120s w/ Model CPU Offload)
python3 qwen_image_fast.py --height 1024 --width 1024
# + (DBCache + TaylorSeer)
# + Context Parallelism (Ulysses)
# + FP8 Weight Only (CPU offload no longer required)
# + Torch Compile (NVIDIA L20x2, ~25s, ~4.8x speedup)
torchrun --nproc_per_node=2 qwen_image_fast.py \
--height 1024 --width 1024 \
--parallel-type ulysses --quantize \
--cache --Fn 1 --rdt 0.12 --mcc 2 --taylorseer \
--compile

| 🤖Baseline w/o Acceleration | 🎉w/ Hybrid Acceleration |
|---|---|
| ~120s, 60+ GiB per GPU | ~25s, ~4.8x speedup, 36 GiB per GPU |
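As a quick sanity check on the headline number, the speedup can be recomputed from the two approximate latencies quoted above. This is only a minimal sketch: the `speedup` helper is a hypothetical name introduced here for illustration and is not part of `qwen_image_fast.py` or cache-dit.

```python
def speedup(baseline_s: float, accelerated_s: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if baseline_s <= 0 or accelerated_s <= 0:
        raise ValueError("latencies must be positive")
    return baseline_s / accelerated_s


# Approximate end-to-end latencies reported in this README.
baseline_s = 120.0     # 1x NVIDIA L20, with model CPU offload
accelerated_s = 25.0   # 2x NVIDIA L20, hybrid acceleration

print(f"speedup: ~{speedup(baseline_s, accelerated_s):.1f}x")
```

Note that the two runs use different GPU counts (one L20 for the baseline, two for the accelerated run), so the figure reflects the combined effect of caching, parallelism, quantization, and compilation, not a per-GPU gain.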
This repo is based on cache-dit and diffusers. Many thanks to these awesome open-source projects.

