v1.4.2 — first unified release. Consolidates the 5 mirror binary repos (kspaceFirstOrder-CUDA-linux, -windows, kspaceFirstOrder-OMP-linux, -windows, k-wave-omp-darwin) into a single source tree and a single release tag (closes the consolidation work tracked in #13).
Highlights
- CUDA arch coverage expanded from 9 to 16. Now covers every shipping NVIDIA GPU from Turing (sm_75) through every Blackwell variant: B200/GB200, B300/GB300, Jetson Thor, RTX 50xx, RTX PRO 6000 Blackwell, GB10/DGX Spark (see #25). Maxwell/Pascal/Volta are not supported — CUDA Toolkit 13.0 removed them.
- Windows CUDA binary regression fixed. v1.4.1 silently shipped only sm_75 SASS (3.4 MB) due to a project(LANGUAGES CXX CUDA) ordering bug that caused the multi-arch default to be discarded. v1.4.2 ships the full 16-arch fat binary (14.8 MB + 284 MB cufft64_12.dll runtime). Verified via cuobjdump --list-elf in CI.
- Single release, all platforms. No more per-mirror version drift (v1.3.1 / v1.3.0 / v0.3.0rc3 split is gone). Each release tag → one set of platform-suffixed assets, all built from one source SHA.
- CI diagnostics. Build logs now print the final CMAKE_CUDA_ARCHITECTURES after filtering and run cuobjdump --list-elf on the built binary, so each release records exactly which archs shipped.
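To check a downloaded binary the same way CI does, the output of cuobjdump --list-elf can be scanned for architecture tags. The following is a minimal sketch, not the project's CI script: it assumes each embedded cubin is listed on its own line with an sm_XX tag in its name, and the function names, sample file names, and expected-arch list are hypothetical.

```python
import re

# Architecture tags such as sm_75, sm_90a or sm_120 embedded in cubin names.
_ARCH_RE = re.compile(r"sm_(\d+[a-z]?)")


def shipped_archs(list_elf_output):
    """Return the sorted, de-duplicated sm_XX tags found in the listing.

    Assumption: each embedded cubin appears on its own line and carries its
    architecture in the file name, e.g. "ELF file 1: app.1.sm_75.cubin".
    """
    archs = set()
    for line in list_elf_output.splitlines():
        match = _ARCH_RE.search(line)
        if match:
            archs.add("sm_" + match.group(1))
    # Sort by the numeric part first so sm_100 comes after sm_90a.
    return sorted(archs, key=lambda a: (int(re.match(r"sm_(\d+)", a).group(1)), a))


def missing_archs(list_elf_output, expected):
    """Return the expected architectures that are absent from the listing."""
    return sorted(set(expected) - set(shipped_archs(list_elf_output)))


# Hypothetical listing resembling the single-arch v1.4.1 Windows regression.
SINGLE_ARCH = "ELF file    1: kspace.1.sm_75.cubin\n"

# Hypothetical multi-arch listing (illustrative subset, not the real arch list).
MULTI_ARCH = (
    "ELF file    1: kspace.1.sm_75.cubin\n"
    "ELF file    2: kspace.2.sm_80.cubin\n"
    "ELF file    3: kspace.3.sm_90a.cubin\n"
)

if __name__ == "__main__":
    print(shipped_archs(MULTI_ARCH))
    print(missing_archs(SINGLE_ARCH, ["sm_75", "sm_80", "sm_90a"]))
```

In practice the listing would come from running cuobjdump --list-elf on the binary and capturing its standard output; a non-empty result from missing_archs would flag a regression like the one in v1.4.1.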
Downstream
waltsims/k-wave-python#756 pins to this release and will cut as v0.6.3.
Migration
- k-wave-python: bump BINARY_VERSION to v1.4.2, collapse URL_DICT to the single unified base. The PR above does this.
- Maxwell/Pascal/Volta users (GTX 9xx/10xx, V100, P100, Jetson Nano/TX1/TX2/Xavier): use backend="python" (NumPy/CuPy works on every CUDA-capable GPU) or build the C++ backend from source against CUDA Toolkit 12.x.
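The GPU cutoff above can be expressed as a compute-capability check. This is a rough sketch only: the helper names are hypothetical and not part of k-wave-python, and it assumes the prebuilt CUDA binary needs at least compute capability 7.5 (Turing, sm_75), as stated in the highlights.

```python
# Lowest compute capability covered by the prebuilt binary (Turing, sm_75),
# taken from the release notes. Everything below this needs a fallback.
MIN_PREBUILT_CC = (7, 5)


def prebuilt_binary_supported(major, minor):
    """True if a GPU with compute capability major.minor is at or above sm_75."""
    return (major, minor) >= MIN_PREBUILT_CC


def migration_hint(major, minor):
    """Return a short, human-readable suggestion for the given GPU."""
    if prebuilt_binary_supported(major, minor):
        return "use the prebuilt v1.4.2 CUDA binary"
    return 'use backend="python" or build from source against CUDA Toolkit 12.x'


if __name__ == "__main__":
    # Compute capabilities: GTX 1080 = 6.1, V100 = 7.0, RTX 2080 = 7.5.
    for name, cc in [("GTX 1080", (6, 1)), ("V100", (7, 0)), ("RTX 2080", (7, 5))]:
        print(name, "->", migration_hint(*cc))
```

The compute capability itself can be read from nvidia-smi or from whichever GPU library is already in use; the sketch deliberately leaves that lookup out.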
What follows
Post-v1.4.2 cleanup tracked in #26: archive the 5 mirror repos (after k-wave-python v0.6.3 bakes for ~2-4 weeks), refactor the repos/ path into kspace-cuda/ + kspace-openmp/, and, on a longer horizon, move to nanobind + wheels.
What's Changed
- Consolidate 5 mirror submodules into unified via git subtree (unified#13) by @waltsims in #18
- Delete stale Windows workflow from subtree-add (closes #20) by @waltsims in #21
- Consolidate plans: 11 docs to arxiv/, single ROADMAP.md going forward by @waltsims in #22
- Rewrite release-on-tag.yml: publish one unified release (unified#13) by @waltsims in #19
- README: drop stale submodule step, add repo layout overview by @waltsims in #23
- Fix Windows CUDA DLL packaging by @faridyagubbayli in #24
- Expand CUDA arch coverage to Blackwell variants + fix Windows arch list bug by @waltsims in #25
- release-on-tag: demote DLL byte-conflict from error to warning by @waltsims in #28
- release-on-tag: set GH_REPO so publish job doesn't need a git checkout by @waltsims in #29
Full Changelog: v1.4.1...v1.4.2