
Bump BINARY_VERSION to v1.4.1 (sm_120 / Blackwell) #739

Merged
waltsims merged 6 commits into master from release-v1.4.0-binaries
May 18, 2026

Conversation

@waltsims
Owner

@waltsims waltsims commented May 17, 2026

Summary

  • Bumps BINARY_VERSION to v1.4.1 — the rebuilt static-linked Linux binaries + fast-math-fixed darwin binary published from kspacefirstorder-unified#16.
  • Bumps __version__ to 0.6.2.
  • Keeps WINDOWS_OMP_VERSION="v1.3.0" pin — the new v1.4.1 OMP-windows release ships its own DLL bundle but the consumer flip is a discrete v0.6.3 one-liner once we've validated the bundle in production.

Picked up via this version bump

Validation evidence

  • Colab T4 (Ubuntu 22.04, glibc 2.35, CUDA 13.0): CUDA backend OK shape: (302,), max abs: 0.067452
  • Local Linux (this dev box): full Python → C++/OMP → Python sim, same numerical result
  • M1 Mac: clean venv install, otool -L confirms libhdf5.320.dylib linkage
  • aconesac RTX 5060: --version + 3D simulation, CUDA code arch: 12.0

Outstanding

  • Awaiting the release-on-tag.yml publish of v1.4.1 to the mirror repos before the final CI run on this PR; until then, the urlretrieve call triggered by import kwave returns 404 for the not-yet-published assets.
  • v1.4.0 releases on all 4 mirrors marked REVOKED (see v1.4.1), flipped to prerelease, with explanation in the release notes.

🤖 Generated with Claude Code

Greptile Summary

This PR bumps BINARY_VERSION to v1.4.1 (adding NVIDIA Blackwell sm_120 support and a macOS HDF5 ABI refresh) and increments __version__ to 0.6.2. Windows binaries are intentionally held at v1.3.0 via new WINDOWS_OMP_VERSION/WINDOWS_CUDA_VERSION pins while runtime DLL bundling for the new compiler stack is validated.

  • Intel Mac guard: _darwin_unsupported is set when platform.machine() != "arm64" on macOS; a RuntimeWarning is emitted and the darwin OMP download URL list is replaced with an empty list, so Intel Mac users no longer silently download a binary that fails with an exec format error at invocation.
  • Linux CUDA URL — previously hard-coded to v1.3.1, now tracks BINARY_VERSION so both Linux CUDA and OMP move together.
  • Roadmap updated to renumber milestones and add a v0.6.5 entry for universal2 (Intel + Apple Silicon) darwin binary coverage.
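The version routing summarized above can be sketched in a few lines. This is a minimal sketch, assuming a hypothetical helper named release_version; only the three constants and their values come from this PR, and the real kwave/__init__.py may structure the logic differently.

```python
# Constants and values as described in this PR.
BINARY_VERSION = "v1.4.1"
WINDOWS_OMP_VERSION = "v1.3.0"
WINDOWS_CUDA_VERSION = "v1.3.0"


def release_version(platform_name: str, backend: str) -> str:
    """Return the release tag to download for a platform/backend pair.

    `release_version` is a hypothetical helper used for illustration only.
    """
    if platform_name == "windows":
        # Windows stays pinned until the DLL bundles are validated.
        return WINDOWS_OMP_VERSION if backend == "omp" else WINDOWS_CUDA_VERSION
    if platform_name in ("linux", "darwin"):
        # Linux OMP, Linux CUDA, and darwin OMP all track one constant.
        return BINARY_VERSION
    raise NotImplementedError(f"unsupported platform: {platform_name}")
```

With this shape, flipping the Windows pin later is a one-line change to a constant rather than an edit to each URL.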

Confidence Score: 5/5

Safe to merge — both changed files are narrow and the logic is correct.

Both files are narrow: __init__.py touches only constants and the darwin download guard, and release-strategy.md is documentation. The Windows pin separation avoids regressing existing users, the Linux CUDA URL correctly moves from a stale v1.3.1 hardcode to BINARY_VERSION, and the arm64 check prevents a silent crash on Intel Macs. No logic paths are deleted or restructured.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| kwave/__init__.py | Bumps version to 0.6.2 and BINARY_VERSION to v1.4.1; pins Windows to v1.3.0 via separate WINDOWS_OMP_VERSION/WINDOWS_CUDA_VERSION constants; adds Intel Mac (x86_64 darwin) RuntimeWarning and skips OMP download on that arch; fixes Linux CUDA URL which was previously hard-coded to v1.3.1. |
| plans/release-strategy.md | Roadmap updated: v0.6.2 is now the binary refresh, previous tier-2 features milestone shifted to v0.6.3, axisymmetric to v0.6.4, and a new v0.6.5 entry added for universal2 darwin coverage. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[import kwave] --> B{PLATFORM}
    B -- linux --> C[BINARY_VERSION v1.4.1]
    B -- darwin --> D{platform.machine arm64?}
    B -- windows --> E[WINDOWS_OMP v1.3.0 / WINDOWS_CUDA v1.3.0]
    B -- other --> FAIL[raise NotImplementedError]
    D -- yes --> G[darwin omp URL set to v1.4.1 release]
    D -- no --> H[Emit RuntimeWarning / darwin omp URL is empty list]
    C --> I[Linux OMP + CUDA URLs both use v1.4.1]
    E --> J[Windows OMP exe + DLLs from v1.3.0 / CUDA exe from v1.3.0]
    I --> K{binaries_present?}
    G --> K
    H --> K2[binaries_present returns True / no download]
    J --> K
    K -- yes --> DONE[import complete]
    K -- no --> L[install_binaries / download + chmod + write metadata]
    L --> DONE
    K2 --> DONE
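The tail of the flowchart (presence check, then download, chmod, and metadata write) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names mirror the flowchart labels, but the signatures, the injectable fetch callable, and the metadata.json file name are hypothetical and not taken from the PR.

```python
import json
import stat
from pathlib import Path
from typing import Callable, Dict, Iterable


def binaries_present(bin_dir: Path, names: Iterable[str]) -> bool:
    """True when every expected file exists; an empty list is trivially True."""
    return all((bin_dir / name).is_file() for name in names)


def install_binaries(
    bin_dir: Path,
    urls: Dict[str, str],
    version: str,
    fetch: Callable[[str, Path], None],
) -> None:
    """Download each file, mark it executable, then record what was installed."""
    bin_dir.mkdir(parents=True, exist_ok=True)
    for name, url in urls.items():
        target = bin_dir / name
        fetch(url, target)
        mode = target.stat().st_mode
        target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    # Hypothetical metadata layout, for illustration only.
    metadata = {"version": version, "files": sorted(urls)}
    (bin_dir / "metadata.json").write_text(json.dumps(metadata))


def ensure_binaries(
    bin_dir: Path,
    urls: Dict[str, str],
    version: str,
    fetch: Callable[[str, Path], None],
) -> bool:
    """Return True if a download happened, False if nothing was needed."""
    if binaries_present(bin_dir, urls):
        return False
    install_binaries(bin_dir, urls, version, fetch)
    return True
```

In this sketch the empty-URL case falls out of all() returning True for an empty iterable, which matches the "no download" branch in the flowchart. Passing urllib.request.urlretrieve as fetch would perform real downloads; a stub keeps the logic testable offline.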

Reviews (7): Last reviewed commit: "Update release strategy with v0.6.2 retr..."

Consolidates the per-platform pins (v1.3.0, v1.3.1, v0.3.0rc3) into a
single BINARY_VERSION used by all five mirror URLs. Picks up:

- CUDA: sm_75;80;86;87;89;90;90a;100;120 (closes #656, #622 once
  verified on Blackwell hardware)
- macOS OMP: linked against libhdf5.320 — current Homebrew ABI
  (likely closes #661 pending current-Homebrew smoke test)

Also slots v0.6.2 (this release) into plans/release-strategy.md and
bumps the downstream version slots, plus adds a v0.6.5 entry for the
Intel Mac universal2 follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The v1.4.0 k-wave-omp-darwin binary is Mach-O arm64 only. Without a
guard, Intel Mac users would silently download a binary they can't
execute and hit "exec format error" at runtime.

On Intel Mac: emit a RuntimeWarning at import explaining the constraint,
and skip the darwin/omp URL so we don't waste bandwidth downloading a
useless binary. backend="python" continues to work.

Universal2 (arm64+x86_64) coverage is tracked for v0.6.5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
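The guard described in the commit message above might look roughly like this. It is a minimal sketch: darwin_omp_urls is a hypothetical name, the warning text is illustrative, and the actual guard in kwave/__init__.py may differ in detail.

```python
import warnings
from typing import List


def darwin_omp_urls(machine: str, url: str) -> List[str]:
    """Return the darwin OMP download list for the given CPU architecture.

    `darwin_omp_urls` is a hypothetical helper; `machine` is expected to be
    the value of platform.machine() on macOS.
    """
    if machine != "arm64":
        # The published darwin binary is Mach-O arm64 only, so skip the
        # download instead of fetching a file that cannot be executed.
        warnings.warn(
            "The k-wave OMP darwin binary is arm64-only; it will not be "
            'downloaded on this machine. backend="python" still works.',
            RuntimeWarning,
            stacklevel=2,
        )
        return []
    return [url]
```

A caller on macOS would pass platform.machine() as the first argument; returning an empty list rather than raising keeps import kwave usable for the pure-Python backend.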
@waltsims
Owner Author

Addressed both blockers found in CI:

  1. Greptile P1 (silent Intel Mac breakage): Added a platform.machine() guard in kwave/__init__.py (6306d87). On darwin x86_64: emit a RuntimeWarning at import explaining the constraint, and skip the darwin/omp URL so we don't download a useless binary. backend="python" still works. Universal2 follow-up tracked at v0.6.5.

  2. Windows test 404s (HTTPError on DLL downloads): The v1.4.0 kspaceFirstOrder-OMP-windows release shipped only the .exe and was missing the 10 runtime DLLs (cufft64_10.dll, hdf5.dll, etc.) that WINDOWS_DLLS requires. Copied them over from v1.3.0 as an interim fix. The proper fix (package DLLs in the unified CI build, or switch to static linking) is tracked at kspacefirstorder-unified#14 ("Windows OMP build doesn't package required DLLs").

@codecov

codecov Bot commented May 17, 2026

Codecov Report

❌ Patch coverage is 83.33333% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.55%. Comparing base (2b1388d) to head (92fd493).
⚠️ Report is 7 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| kwave/__init__.py | 83.33% | 1 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #739      +/-   ##
==========================================
- Coverage   75.63%   75.55%   -0.08%     
==========================================
  Files          57       57              
  Lines        8180     8187       +7     
  Branches     1597     1598       +1     
==========================================
- Hits         6187     6186       -1     
- Misses       1373     1380       +7     
- Partials      620      621       +1     
| Flag | Coverage Δ |
| --- | --- |
| 3.10 | 75.52% <83.33%> (-0.08%) ⬇️ |
| 3.11 | 75.52% <83.33%> (-0.08%) ⬇️ |
| 3.12 | 75.52% <83.33%> (-0.08%) ⬇️ |
| 3.13 | 75.52% <83.33%> (-0.08%) ⬇️ |
| macos-latest | 75.46% <83.33%> (-0.01%) ⬇️ |
| ubuntu-latest | 75.46% <83.33%> (-0.06%) ⬇️ |
| windows-latest | 75.30% <83.33%> (-0.01%) ⬇️ |
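As a quick sanity check, the project coverage figures in the diff above can be recomputed from the Hits and Lines rows. This is only a sketch of the arithmetic; how Codecov rounds or truncates its displayed percentages is not stated in the report, so the comparison allows a 0.01 percentage point tolerance.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage of total lines."""
    return 100.0 * hits / lines


# Hits and Lines taken from the coverage diff table in this report.
base = coverage_pct(6187, 8180)  # master
head = coverage_pct(6186, 8187)  # this PR
delta = head - base

print(f"base={base:.4f}% head={head:.4f}% delta={delta:+.4f}")
```

The recomputed values agree with the reported 75.63%, 75.55%, and -0.08% to within that tolerance.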


The v1.4.0 OMP-windows build switched compiler/OpenMP/FFT stack (Intel
compiler + Intel OpenMP + ? → MSVC + VCOMP + FFTW) and needs runtime
DLLs that aren't packaged with the release:

  needed but not shipped: fftw3f.dll, VCOMP140.DLL, VCRUNTIME140_1.dll
  shipped but unneeded:   cufft64_10.dll, libiomp5md.dll, libmmd.dll,
                          svml_dispmd.dll

Windows OMP doesn't benefit from the v1.4.0 Blackwell changes anyway
(CUDA-only), so route it back to the working v1.3.0 binary until the
build is fixed in kspacefirstorder-unified#14. CUDA-windows, all linux
binaries, and darwin OMP continue to use v1.4.0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
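The mismatch listed in the commit message above is a two-way set difference between the DLLs the executable needs and the DLLs the bundle ships. A minimal sketch, assuming a hypothetical helper diff_dll_bundle; the example lists reuse only names from this thread, and treating hdf5.dll as common to both sets is an assumption made for illustration.

```python
from typing import Iterable, List, Tuple


def diff_dll_bundle(
    required: Iterable[str], shipped: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unneeded) DLL names.

    Names are compared case-insensitively because Windows file names are
    case-insensitive; the original spelling is preserved in the result.
    """
    req = {name.lower(): name for name in required}
    shp = {name.lower(): name for name in shipped}
    missing = [req[key] for key in sorted(req.keys() - shp.keys())]
    unneeded = [shp[key] for key in sorted(shp.keys() - req.keys())]
    return missing, unneeded


# Illustrative inputs; hdf5.dll appearing in both lists is an assumption.
required = ["hdf5.dll", "fftw3f.dll", "VCOMP140.DLL", "VCRUNTIME140_1.dll"]
shipped = [
    "hdf5.dll",
    "cufft64_10.dll",
    "libiomp5md.dll",
    "libmmd.dll",
    "svml_dispmd.dll",
]
missing, unneeded = diff_dll_bundle(required, shipped)
```

Running a check like this in the release pipeline would catch a missing runtime DLL before users hit a 404 or a load failure.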
@waltsims
Owner Author

@greptileai review

@aconesac
Contributor

Verified the v1.4.0 CUDA binary on an NVIDIA GeForce RTX 5060 Laptop GPU (Blackwell, sm_120, CUDA 13.0) on Linux.

Downloaded the binary directly from the kspaceFirstOrder-CUDA-linux v1.4.0 release and ran --version:

┌───────────────────────────────────────────────────────────────┐
│                  kspaceFirstOrder-CUDA v1.3                   │
├───────────────────────────────────────────────────────────────┤
│ Selected GPU device id:                                     0 │
│ GPU device name:           NVIDIA GeForce RTX 5060 Laptop GPU │
├───────────────────────────────────────────────────────────────┤
│ CUDA runtime:     13.0                                        │
│ CUDA driver:      13.0                                        │
│ CUDA code arch:   12.0                                        │
├───────────────────────────────────────────────────────────────┤
│ CUDA device id:   0                                           │
│ CUDA device name: NVIDIA GeForce RTX 5060 Laptop GPU          │
│ CUDA capability:  12.0                                        │
└───────────────────────────────────────────────────────────────┘

Also ran a full 3D simulation (64x64x64, heterogeneous medium) via kspaceFirstOrder3D:

┌───────────────────────────────────────────────────────────────┐
│                  kspaceFirstOrder-CUDA v1.3                   │
├───────────────────────────────────────────────────────────────┤
│ Reading simulation configuration:                        Done │
│ Selected GPU device id:                                     0 │
│ GPU device name:           NVIDIA GeForce RTX 5060 Laptop GPU │
├───────────────────────────────────────────────────────────────┤
│ Domain dimensions:                               64 x 64 x 64 │
│ Simulation time steps:                                    444 │
├───────────────────────────────────────────────────────────────┤
│ Total execution time:                                   3.14s │
└───────────────────────────────────────────────────────────────┘
✅ Simulation completed!
p_final shape: (64, 64, 64)

GPU detected correctly, CUDA code arch: 12.0 confirmed, full simulation runs without errors. ✅

Will also run the verification on the RTX 5070 Ti (the original hardware from #656) once available.

@waltsims waltsims changed the title from "Bump BINARY_VERSION to v1.4.0 (sm_120 / Blackwell)" to "Bump BINARY_VERSION to v1.4.1 (sm_120 / Blackwell)" May 17, 2026
waltsims added 2 commits May 18, 2026 00:07
Linux + macOS pull from the rebuilt v1.4.1 mirror releases:
- Linux binaries are now statically linked (CUDA + cufft + FFTW +
  libstdc++) and built on ubuntu-22.04 (glibc 2.35 floor). Restores
  the plug-and-play property the legacy Makefile build provided;
  permanent regression guard via check-linux-binary-deps.sh in unified.
- Darwin binary picks up the fast-math fix (k-wave-omp-darwin#4) and
  the libhdf5.320 ABI refresh (closes #661).

Windows stays pinned to v1.3.0 for both OMP and CUDA. v1.4.x windows
releases don't bundle their runtime DLLs (different stacks for both
flavors). The v1.4.x OMP DLL bundling is fixed in
kspacefirstorder-unified#14 (awaiting production validation); CUDA
DLL bundling is tracked separately in kspacefirstorder-unified#17.
Pin gets flipped in v0.6.3 once both windows flavors are validated.

Picks up via this bump:
- #656, #622 (Blackwell sm_120 on Linux + macOS)
- #661 (macOS HDF5 ABI)
- kspacefirstorder-unified#15 (Linux binary regression)

Closes #738

- v0.6.2 section rewritten as retrospective: v1.4.1 (not v1.4.0),
  Linux static linking, darwin fast-math fix, Windows pinned to v1.3.0
- v0.6.3 picks up the Windows pin flips (OMP validation + CUDA DLL
  packaging fix per unified#17) as carry-over items
- New "Binary distribution maintenance" section captures the
  version-less pipeline work: mirror consolidation (unified#13) +
  Windows static linking (v1.5 follow-up)
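The first commit message above mentions a regression guard, check-linux-binary-deps.sh, whose contents are not shown in this thread. The idea of such a guard can be sketched in Python by parsing ldd-style output and flagging libraries that are supposed to be statically linked. Everything here is an assumption for illustration: the function names, the forbidden-prefix list (derived from the libraries the commit says are now static), and the sample output.

```python
from typing import List

# Libraries the commit message says are statically linked; seeing any of
# them as a dynamic dependency would indicate a regression. The exact
# list checked by the real script is not known.
FORBIDDEN_PREFIXES = ("libcudart", "libcufft", "libfftw3", "libstdc++")


def dynamic_deps(ldd_output: str) -> List[str]:
    """Extract library base names from ldd-style output."""
    deps = []
    for raw in ldd_output.splitlines():
        line = raw.strip()
        if not line or line in ("statically linked", "not a dynamic executable"):
            continue
        first = line.split()[0]
        deps.append(first.rsplit("/", 1)[-1])  # drop any directory part
    return deps


def forbidden_deps(ldd_output: str) -> List[str]:
    """Return dynamic dependencies that should have been linked statically."""
    return [d for d in dynamic_deps(ldd_output) if d.startswith(FORBIDDEN_PREFIXES)]
```

A CI step could feed the output of ldd on the built binary into forbidden_deps and fail the build when the result is non-empty.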
@waltsims waltsims merged commit 0ba7876 into master May 18, 2026
31 checks passed
@waltsims waltsims deleted the release-v1.4.0-binaries branch May 18, 2026 00:56