
v1.4.2



github-actions released this 21 Jun 06:46 (commit 4c26013)

v1.4.2 is the first unified release. It consolidates the 5 mirror binary repos (kspaceFirstOrder-CUDA-linux, kspaceFirstOrder-CUDA-windows, kspaceFirstOrder-OMP-linux, kspaceFirstOrder-OMP-windows, k-wave-omp-darwin) into a single source tree with a single release tag, closing out the consolidation work tracked in #13.

Highlights

  • CUDA arch coverage expanded from 9 to 16 architectures. The CUDA binary now covers every shipping NVIDIA GPU from Turing (sm_75) through all Blackwell variants: B200/GB200, B300/GB300, Jetson Thor, RTX 50xx, RTX PRO 6000 Blackwell, and GB10/DGX Spark (see #25). Maxwell, Pascal, and Volta are not supported because CUDA Toolkit 13.0 removed support for them.
  • Windows CUDA binary regression fixed. v1.4.1 silently shipped only sm_75 SASS (3.4 MB) because a project(LANGUAGES CXX CUDA) ordering bug caused the multi-arch default to be discarded. v1.4.2 ships the full 16-arch fat binary (14.8 MB, plus the 284 MB cufft64_12.dll runtime), verified in CI with cuobjdump --list-elf.
  • Single release, all platforms. No more per-mirror version drift: the v1.3.1 / v1.3.0 / v0.3.0rc3 split is gone. Each release tag now produces one set of platform-suffixed assets, all built from one source SHA.
  • CI diagnostics. CI now prints the final CMAKE_CUDA_ARCHITECTURES after filtering and runs cuobjdump --list-elf on the built binary, so each release's build log records exactly which archs shipped.
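The cuobjdump --list-elf check in the highlights above can also be turned into a gate, so that a missing architecture fails the build instead of shipping silently the way v1.4.1 did. The Python sketch below is hypothetical, not the project's CI code: it assumes cuobjdump --list-elf prints one line per embedded cubin with an sm_XX target in the file name, and the expected set is a small illustrative subset rather than the actual 16-arch list.

```python
import re

# Assumption: cuobjdump --list-elf prints one line per embedded cubin, e.g.
#   ELF file    1: kspaceFirstOrder-CUDA.1.sm_75.cubin
# The regex captures the SM target, including an optional letter suffix such as "a".
SM_PATTERN = re.compile(r"sm_(\d+[a-z]?)\b")


def shipped_archs(list_elf_output: str) -> set[str]:
    """Return the set of sm_XX targets named in cuobjdump --list-elf output."""
    return {f"sm_{m}" for m in SM_PATTERN.findall(list_elf_output)}


def missing_archs(list_elf_output: str, expected: set[str]) -> set[str]:
    """Return the expected targets that are absent from the built binary."""
    return expected - shipped_archs(list_elf_output)


# Hypothetical output resembling the v1.4.1 Windows regression (sm_75 only).
sample = "ELF file    1: kspaceFirstOrder-CUDA.1.sm_75.cubin\n"
# Hypothetical subset of the expected targets, for illustration only.
expected = {"sm_75", "sm_86", "sm_90"}
print(sorted(missing_archs(sample, expected)))  # prints ['sm_86', 'sm_90']
```

In a CI step, a non-empty result from missing_archs could be converted into a non-zero exit code (for example with sys.exit(1)) before the release job publishes any assets.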

Downstream

waltsims/k-wave-python#756 pins k-wave-python to this release and will ship as k-wave-python v0.6.3.

Migration

  • k-wave-python: bump BINARY_VERSION to v1.4.2 and collapse URL_DICT to the single unified base URL. waltsims/k-wave-python#756 already does this.
  • Maxwell/Pascal/Volta users (GTX 9xx/10xx, V100, P100, Jetson Nano/TX1/TX2/Xavier): use backend="python" (the NumPy/CuPy backend works on every CUDA-capable GPU) or build the C++ backend from source against CUDA Toolkit 12.x.
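Both migration steps can be sketched in Python. Everything here is hypothetical and is not k-wave-python's actual code: BINARY_VERSION and the single-base idea come from the notes above, but OWNER/REPO, the asset name, and the helper functions are placeholders. The compute-capability check encodes the rule stated above: the v1.4.2 CUDA binary starts at Turing (sm_75), so older GPUs should fall back to backend="python". The capability string could come from, for example, nvidia-smi --query-gpu=compute_cap --format=csv,noheader on recent drivers.

```python
# Hypothetical constants after the migration: one version, one base URL.
BINARY_VERSION = "v1.4.2"
# Placeholder owner/repo; the unified repo's real path is not spelled out here.
URL_BASE = f"https://github.com/OWNER/REPO/releases/download/{BINARY_VERSION}/"


def binary_url(asset_name: str) -> str:
    """Every platform-suffixed asset hangs off the same release base URL."""
    return URL_BASE + asset_name


def parse_compute_capability(text: str) -> tuple[int, int]:
    """Parse a compute capability string such as '8.6' into (8, 6)."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def needs_python_backend(compute_capability: str) -> bool:
    """True for pre-Turing GPUs (below sm_75), which the v1.4.2 CUDA binary does not cover."""
    # Tuple comparison orders by major version first, then minor.
    return parse_compute_capability(compute_capability) < (7, 5)


# Pascal (GTX 10xx, 6.1) and Volta (V100, 7.0) need the fallback;
# Turing (7.5) and newer, e.g. RTX 50xx (12.0), can use the prebuilt binary.
for cc in ("6.1", "7.0", "7.5", "12.0"):
    print(cc, needs_python_backend(cc))
```

Comparing (major, minor) tuples rather than floats avoids misordering capabilities such as 12.0 against 7.5.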

What follows

Post-v1.4.2 cleanup is tracked in #26: archiving the 5 mirror repos (after k-wave-python v0.6.3 has baked for roughly 2 to 4 weeks), the repos/kspace-cuda/ + repos/kspace-openmp/ path refactor, and, on a longer horizon, nanobind bindings and wheels.


What's Changed

  • Consolidate 5 mirror submodules into unified via git subtree (unified#13) by @waltsims in #18
  • Delete stale Windows workflow from subtree-add (closes #20) by @waltsims in #21
  • Consolidate plans: 11 docs to arxiv/, single ROADMAP.md going forward by @waltsims in #22
  • Rewrite release-on-tag.yml: publish one unified release (unified#13) by @waltsims in #19
  • README: drop stale submodule step, add repo layout overview by @waltsims in #23
  • Fix Windows CUDA DLL packaging by @faridyagubbayli in #24
  • Expand CUDA arch coverage to Blackwell variants + fix Windows arch list bug by @waltsims in #25
  • release-on-tag: demote DLL byte-conflict from error to warning by @waltsims in #28
  • release-on-tag: set GH_REPO so publish job doesn't need a git checkout by @waltsims in #29

Full Changelog: v1.4.1...v1.4.2