TileLang built from stock upstream tile-ai/tilelang at commit a8d93798 (post-v0.1.10), which includes PR #2303 ([CUDA] Support preferred copy instruction lowering, d0937562) — i.e. the T.copy(..., prefer_instruction="tma") API. The released PyPI tilelang==0.1.10 predates #2303 and lacks it.
- Source: tile-ai/tilelang@a8d93798 (unmodified upstream)
- Build: CUDA 13.1, for PyTorch 2.10 (cu129 runtime); cp38-abi3 → installs on CPython ≥3.10
- Why hosted: consumed by xorl's FlashQLA GDN backend (XORL_GDN_BACKEND=flashqla). FlashQLA needs both the tl_gemm builtin (in stock ≥0.1.10) and prefer_instruction="tma" (#2303). The fast gemm_v1 path is re-added in-repo by xorl's tilelang_gemm_v1 shim (no tilelang source fork); this wheel is plain upstream.

Replace with PyPI tilelang>=0.1.11 once a release carrying #2303 ships.
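
The replacement condition above can be checked mechanically. The following is a minimal sketch, not part of xorl or tilelang: the helper names (`release_tuple`, `pypi_release_suffices`, `installed_tilelang_version`) are hypothetical, and the 0.1.11 threshold assumes that release is the first to carry #2303, which should be confirmed against the actual release notes.

```python
import re
from importlib import metadata

# Assumption taken from the note above: the first PyPI release carrying
# PR #2303 is expected to be 0.1.11. Adjust if it ships in a different version.
MIN_PYPI_VERSION = (0, 1, 11)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release segment of a version string.

    Local or pre-release suffixes are ignored, so '0.1.10+cu129' and
    '0.1.10' both map to (0, 1, 10).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def pypi_release_suffices(version: str) -> bool:
    """True if this version meets the assumed minimum release."""
    return release_tuple(version) >= MIN_PYPI_VERSION


def installed_tilelang_version():
    """Return the installed tilelang version string, or None if not installed."""
    try:
        return metadata.version("tilelang")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_tilelang_version()
    if found is None:
        print("tilelang is not installed")
    elif pypi_release_suffices(found):
        print(f"tilelang {found}: meets the assumed >=0.1.11 requirement")
    else:
        print(f"tilelang {found}: older than 0.1.11 by version number")
```

A version number below 0.1.11 does not by itself mean the `prefer_instruction` API is missing: this wheel is built from a post-v0.1.10 commit and may still report a 0.1.10-based version, so the check is only useful for deciding when a PyPI release can take over.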