xlite-dev/qwen-image-fast

⚡️Qwen-Image-Fast

🔥Qwen-Image 4.8x🎉 speedup with Hybrid Acceleration for low VRAM GPUs (< 48GiB)

⚙️Installation

pip3 install torch==2.9.0 # >= 2.7.1
pip3 install -U "cache-dit[all]" # >= 1.0.9
pip3 install git+https://github.com/huggingface/diffusers.git # latest main
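
Before launching a long run, it can help to confirm that the installed packages meet the minimum versions noted in the comments above. The following is a minimal sketch rather than part of this repo: the helper names (`parse_version`, `meets_minimum`, `check_environment`) are hypothetical, and the parser only compares the leading numeric release components, so a pre-release such as `1.0.9.dev3` would be treated like `1.0.9`.

```python
import re
from importlib import metadata

# Minimum versions taken from the install comments above.
MINIMUMS = {"torch": (2, 7, 1), "cache-dit": (1, 0, 9)}


def parse_version(text):
    """Return the leading numeric release parts, e.g. '2.9.0+cu128' -> (2, 9, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed, minimum):
    """True if the installed version string is at least the minimum tuple."""
    return parse_version(installed) >= minimum


def check_environment():
    """Map each package to (installed version or None, whether it is new enough)."""
    report = {}
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_environment().items():
        status = "ok" if ok else "needs attention"
        print(f"{name}: {installed or 'not installed'} ({status})")
```

diffusers is left out of the check because it is installed from the latest main branch, so there is no fixed minimum version to compare against.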

📚Examples

We have released a Hybrid Acceleration example (📚qwen_image_fast.py) in this repo that gives a 4.8x🎉 speedup for Qwen-Image. It combines Hybrid Cache Acceleration, Context Parallelism, FP8 Weight Only quantization, and Torch Compile. Feel free to try it. For example:

# Baseline (NVIDIA L20 48GiB, ~120s w/ Model CPU Offload)
python3 qwen_image_fast.py --height 1024 --width 1024

# + (DBCache + TaylorSeer) 
# + Context Parallelism (Ulysses)
# + FP8 Weight Only (offload no longer required)
# + Torch Compile (NVIDIA L20x2, ~25s, ~4.8x speedup)
torchrun --nproc_per_node=2 qwen_image_fast.py \
         --height 1024 --width 1024 \
         --parallel-type ulysses --quantize \
         --cache --Fn 1 --rdt 0.12 --mcc 2 --taylorseer \
         --compile
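
To see how much each stage contributes, it can be convenient to build the command line programmatically and enable one stage at a time. The sketch below only assembles an argument list from the flags shown in the commands above; `build_command` and its parameter names are hypothetical, and it is an assumption that qwen_image_fast.py accepts every subset of these flags.

```python
import shlex


def build_command(nproc=1, height=1024, width=1024, parallel_type=None,
                  quantize=False, cache=False, fn=1, rdt=0.12, mcc=2,
                  taylorseer=False, compile_model=False,
                  script="qwen_image_fast.py"):
    """Assemble an argv list using only the flags from the commands above."""
    if nproc > 1:
        # Multi-GPU runs go through torchrun, as in the accelerated example.
        argv = ["torchrun", f"--nproc_per_node={nproc}", script]
    else:
        argv = ["python3", script]
    argv += ["--height", str(height), "--width", str(width)]
    if parallel_type:
        argv += ["--parallel-type", parallel_type]
    if quantize:
        argv.append("--quantize")
    if cache:
        # Cache options are passed together, mirroring the example command.
        argv += ["--cache", "--Fn", str(fn), "--rdt", str(rdt), "--mcc", str(mcc)]
    if taylorseer:
        argv.append("--taylorseer")
    if compile_model:
        argv.append("--compile")
    return argv


if __name__ == "__main__":
    full = build_command(nproc=2, parallel_type="ulysses", quantize=True,
                         cache=True, taylorseer=True, compile_model=True)
    print(shlex.join(full))
```

The returned list can be passed to `subprocess.run(argv, check=True)` from the directory that contains the script.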

| 🤖Baseline w/o Acceleration | 🎉w/ Hybrid Acceleration |
| --- | --- |
| ~120s, 60+ GiB per GPU | ~25s, ~4.8x speedup, 36 GiB per GPU |
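
The headline numbers can be checked with a few lines of arithmetic. This sketch uses only the approximate figures reported above (about 120s on one L20 versus about 25s on two, and 60+ GiB versus 36 GiB per GPU). The variable names are illustrative, and because "60+" is a lower bound, the memory saving it computes is a lower bound as well.

```python
# Approximate figures reported above.
baseline_seconds = 120.0      # one NVIDIA L20, Model CPU Offload
accelerated_seconds = 25.0    # two NVIDIA L20s, hybrid acceleration
baseline_gpus = 1
accelerated_gpus = 2
baseline_mem_gib = 60.0       # reported as "60+", so treat as a lower bound
accelerated_mem_gib = 36.0

# Wall-clock speedup: how much sooner the image is ready.
wall_clock_speedup = baseline_seconds / accelerated_seconds

# Speedup per GPU-second, which accounts for the second GPU.
gpu_second_speedup = (baseline_seconds * baseline_gpus) / (
    accelerated_seconds * accelerated_gpus
)

# Fractional reduction in per-GPU memory (at least this much).
memory_reduction = 1.0 - accelerated_mem_gib / baseline_mem_gib

if __name__ == "__main__":
    print(f"wall-clock speedup: {wall_clock_speedup:.1f}x")
    print(f"speedup per GPU-second: {gpu_second_speedup:.1f}x")
    print(f"per-GPU memory reduction: at least {memory_reduction:.0%}")
```

The 4.8x figure is a wall-clock speedup. Since the accelerated run uses two GPUs, the gain per GPU-second is about half of that.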

©️Acknowledgements

This repo is based on cache-dit and diffusers. Many thanks to these awesome open-source projects.
