v1.2.1 USP, 2D/3D Parallel
🎉 The v1.2.1 release is ready. Major updates include Ring Attention w/ batched P2P, USP (Hybrid Ring and Ulysses), Hybrid 2D and 3D Parallelism (💥USP + TP), and reduced VAE-P communication overhead.
# Hybrid 2D/3D Parallelism in Cache-DiT is fully compatible w/ torch.compile,
# Cache Acceleration, Text Encoder Parallelism, VAE Parallelism and more.
torchrun --nproc_per_node=8 -m cache_dit.generate flux2 --config parallel_2d.yaml --compile
torchrun --nproc_per_node=8 -m cache_dit.generate flux2 --config parallel_3d.yaml --compile
torchrun --nproc_per_node=8 -m cache_dit.generate --parallel ulysses_tp --cache --compile
What's Changed
- [chore] Align torch generator with example by @BBuf in #723
- Fix generator bug in cache-dit by @BBuf in #724
- examples: allow custom generator device by @DefTruth in #726
- examples: allow custom warmup-steps by @DefTruth in #727
- docs: add latest news by @DefTruth in #728
- docs: fix docs format by @DefTruth in #729
- fix selected metrics print by @66RING in #730
- docs: add flux examples to tp docs by @DefTruth in #731
- fix ltx-2 i2v example by @DefTruth in #734
- Update README.md by @DefTruth in #735
- chore: allow use default steps for scm by @DefTruth in #736
- [chore] support gpu generator in server by @BBuf in #737
- docs: update download badge by @DefTruth in #738
- Refine profiler and serving docs by @BBuf in #739
- example image-path support url by @BBuf in #742
- fix UAA broken while using joint attn by @DefTruth in #743
- compile: avoid graph break for UAA by @DefTruth in #744
- refactor configs yml in examples by @DefTruth in #745
- relax npu attention import by @DefTruth in #747
- feat: add set_attn_backend api by @DefTruth in #748
- docs: update quick start by @DefTruth in #749
- fix ring attn w/ native backend in torch 2.10 by @DefTruth in #750
- feat: NPU FA support attention mask by @zhangtao0408 in #751
- feat: add cache-dit-generate cli tool by @DefTruth in #752
- docs: update ascend npu examples by @DefTruth in #753
- feat: support ring attn p2p comm by @DefTruth in #754
- feat: support USP -> Ulysses + Ring by @DefTruth in #755
- fix npu import error w/o triton by @DefTruth in #756
- chore: use batched isend/irecv for vae-p by @DefTruth in #757
- feat: tile batched p2p comm for vae-p by @DefTruth in #758
- docs: update example installation by @DefTruth in #760
- reduce comm overhead for vae-p by @DefTruth in #762
- [chore] Fix FLUX2 Ulysses Anything NCCL Hang by @BBuf in #761
- [2/N] reduce comm overhead for vae-p by @DefTruth in #763
- misc: fix sglang diffusion docs link by @DefTruth in #764
- feat: support hybrid CP/SP + TP by @DefTruth in #765
- chore: use _cp_rank for cp_config & fix docs by @DefTruth in #766
- fix hybrid parallel docs by @DefTruth in #768
- chore: fix api docs typo by @DefTruth in #769
- fix api docs typo by @DefTruth in #770
- feat: support latest z-image in examples by @DefTruth in #771
- chore: add show case for parallel vae by @DefTruth in #772
- chore: update docs by @DefTruth in #773
- [2/N] update docs part-2 by @DefTruth in #774
- docs: add ComfyUI-CacheDiT link by @DefTruth in #775
- feat: load config support hybrid parallel by @DefTruth in #777
New Contributors
- @66RING made their first contribution in #730
- @zhangtao0408 made their first contribution in #751
Full Changelog: v1.2.0...v1.2.1