Releases: togethercomputer/xorl-wheels
TransformerEngine 2.11.0
TransformerEngine 2.11.0+c188b533
Built with:
- Python 3.12
- PyTorch 2.10.0
- CUDA 13.0
- CUDA architectures: sm_75, sm_80, sm_89, sm_90, sm_100, sm_120
TransformerEngine 2.10.0
TransformerEngine 2.10.0+769ed778
Built with:
- Python 3.12
- PyTorch 2.10.0
- CUDA 12.9
- CUDA architectures: sm_70, sm_80, sm_89, sm_90, sm_100, sm_120
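The two TransformerEngine builds differ in their oldest target: 2.11.0 lists sm_75 where 2.10.0 lists sm_70. The following is a minimal sketch of checking which build covers a given GPU. It assumes the usual CUDA binary-compatibility rule (code built for sm_XY runs on devices with the same major version and an equal or higher minor version) and ignores any embedded PTX. The names `TE_BUILD_ARCHS`, `parse_arch`, `covers`, and `builds_for` are hypothetical and are not part of any wheel.

```python
# Architecture lists copied from the two release notes above.
TE_BUILD_ARCHS = {
    "2.11.0": ["sm_75", "sm_80", "sm_89", "sm_90", "sm_100", "sm_120"],
    "2.10.0": ["sm_70", "sm_80", "sm_89", "sm_90", "sm_100", "sm_120"],
}


def parse_arch(arch):
    """Split 'sm_90' into (9, 0) and 'sm_120' into (12, 0)."""
    digits = arch.split("_", 1)[1]
    # The last digit is the minor version; everything before it is the major.
    return int(digits[:-1]), int(digits[-1])


def covers(archs, capability):
    """True if some listed arch has the same major and a minor <= the device's."""
    major, minor = capability
    return any(m == major and n <= minor for m, n in map(parse_arch, archs))


def builds_for(capability):
    """Return the TransformerEngine versions whose arch list covers the device."""
    return [v for v, archs in TE_BUILD_ARCHS.items() if covers(archs, capability)]
```

On a machine with PyTorch installed, the capability tuple can be read with `torch.cuda.get_device_capability()`. Under the stated assumption, only the 2.10.0 build covers an sm_70 device.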
TileLang 0.1.10 + #2303 (CUDA 13.1, PyTorch 2.10)
TileLang built from stock upstream tile-ai/tilelang at commit a8d93798 (post-v0.1.10), which includes PR #2303 ([CUDA] Support preferred copy instruction lowering, d0937562) — i.e. the T.copy(..., prefer_instruction="tma") API. The released PyPI tilelang==0.1.10 predates #2303 and lacks it.
- Source: tile-ai/tilelang@a8d93798 (unmodified upstream)
- Build: CUDA 13.1, for PyTorch 2.10 (cu129 runtime); cp38-abi3 → installs on CPython ≥3.10
- Why hosted: consumed by xorl's FlashQLA GDN backend (XORL_GDN_BACKEND=flashqla). FlashQLA needs both the tl_gemm builtin (in stock ≥0.1.10) and prefer_instruction="tma" (#2303). The fast gemm_v1 path is re-added in-repo by xorl's tilelang_gemm_v1 shim (no tilelang source fork); this wheel is plain upstream.
Replace with PyPI tilelang>=0.1.11 once a release carrying #2303 ships.
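The replacement rule above amounts to a version comparison. The following is a minimal standard-library sketch. The helper names are hypothetical, the 0.1.11 threshold is taken from the note above and is not a confirmed release, and the parser reads only the leading digits of each dot-separated segment (so a pre-release such as `0.1.11rc1` is treated like `0.1.11`).

```python
import re

# Threshold from the note above; assumes a release carrying #2303 ships as 0.1.11 or later.
MIN_PYPI_VERSION = (0, 1, 11)


def release_tuple(version):
    """Turn '0.1.10' or '0.1.10+local' into a tuple of ints, e.g. (0, 1, 10)."""
    public = version.split("+", 1)[0]  # drop any local label after '+'
    parts = []
    for segment in public.split("."):
        match = re.match(r"\d+", segment)
        if match is None:
            break  # stop at the first segment with no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def pypi_release_suffices(version):
    """True if the given tilelang version meets the >=0.1.11 threshold."""
    return release_tuple(version) >= MIN_PYPI_VERSION
```

The installed version string can be obtained with `importlib.metadata.version("tilelang")` and passed to `pypi_release_suffices`.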
Mamba SSM 2.3.1 + Causal Conv1d 1.6.1 (CUDA 12.9, PyTorch 2.10)
Pre-compiled wheels for mamba-ssm 2.3.1 and causal-conv1d 1.6.1.
- Python 3.12, Linux x86_64
- Built against PyTorch 2.10.0+cu129, CUDA 12.9
- Source: https://github.com/state-spaces/mamba
FlashAttention 3.0.0b1
FlashAttention-3 v3.0.0b1
- Requires CUDA 12.3+
- Python 3.9+ (stable ABI)
- Hopper (SM90) optimized
DeepGEMM 2.3.0
DeepGEMM wheel for CUDA 13, PyTorch 2.10, Python 3.12
DeepEP 1.2.1 (CUDA 13.0)
DeepEP 1.2.1+567632d
Built with:
- Python 3.12
- PyTorch 2.11.0
- CUDA 13.0
DeepEP 1.2.1
Pre-built DeepEP wheel (commit 567632d, PyTorch 2.9, CUDA 12, Python 3.12)
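Several builds on this page carry a commit hash after a `+` in the version (for example `1.2.1+567632d` or `2.11.0+c188b533`), which matches the local version label form defined by PEP 440. The following is a minimal sketch of separating the two parts; the function name `split_local_version` is hypothetical.

```python
def split_local_version(version):
    """Split 'X.Y.Z+label' into ('X.Y.Z', 'label'); label is None if absent."""
    public, _, local = version.partition("+")
    return public, local or None
```

For `"1.2.1+567632d"` this returns the public version `"1.2.1"` and the label `"567632d"`, which makes it easy to tell which upstream commit an installed wheel was built from.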