diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index e310d53..38768d8 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -22,6 +22,9 @@ jobs: git config --global url."https://github.com/".insteadOf "git@github.com:" git submodule update --init --recursive + - name: Lint test registration + run: ./tools/check_test_registration.sh + - name: Build SoftFloat run: make -C SoftFloat/build/Linux-x86_64-GCC diff --git a/HANDOFF.md b/HANDOFF.md new file mode 100644 index 0000000..f80c386 --- /dev/null +++ b/HANDOFF.md @@ -0,0 +1,164 @@ +# SoftBLAS — Handoff to urbit/numerics agent + +_Scratch document — not committed. Written 2026-05-30 after a five-reviewer audit +(Gnome/Dwarf/Elf/Hobbit/Angel) of `urbit/SoftBLAS` and the follow-up fixes._ + +## TL;DR + +A correctness audit found three silently-wrong-answer bugs, a process-killing +error path, and an inert rounding-mode API. The first four are **fixed and merged +to `master`**; CI now runs on every PR. The rounding-mode work lives in **PR #1**, +which has been completed and reconciled with `master` but is **a breaking API +change** that requires a coordinated Lagoon update in Vere. + +Your job picks up at: (1) finish the Lagoon side of the `rndMode` change, (2) bump +the pinned SoftBLAS commit in Vere, (3) work the remaining audit backlog below. + +## What shipped this session + +| PR | State | Summary | +|----|-------|---------| +| **#8** | merged | `f128_abs` masked both float128 words (corrupted negative quads); `*gemm` indexed B with `ldc` instead of `ldb`; `*gemv` folded `incX/incY` into the matrix index. All three produced silently wrong results, none caught by the old tests. + regression tests. | +| **#9** | merged | `exit(-1)` on invalid args (bad transpose/layout; `swap` N==0) → replaced with `return`. `swap` N==0 is now a no-op per BLAS spec. (Vere is the only client; it validates and bails via `u3m_bail`, so a plain `return` suffices — no error-callback machinery.) | +| **#10** | merged | GitHub Actions CI: builds SoftFloat + SoftBLAS, runs the munit suite on every push/PR. | +| **#1** | open, **mergeable**, **CI green** | Rounding-mode redesign — see below. Branch `sigilante/rounding-mode`. Merged up to `master` so it no longer reintroduces the #8 bugs. 140/140 tests pass (locally and in CI). | + +`master` is green (139 tests). The bugs were invisible before because every prior +test used unit strides and `ldb == ldc`. + +## PR #1 (`sigilante/rounding-mode`) — what it is and what's left + +**Design:** rounding is now **per-call**. Every arithmetic routine gained a +trailing `const uint_fast8_t rndMode` parameter and calls `_set_rounding(rndMode)` +on entry (sets SoftFloat's `softfloat_roundingMode`). The old global +`softblas_roundingMode` + `softblas_state.c` are **removed**. This supersedes the +audit's "problem #5" (the old `set_rounding()` was never called, so the documented +rounding control silently did nothing). + +`_set_rounding(a)` accepts `'n' 'z' 'u' 'd' 'a'`; an **unknown mode is a no-op** +(leaves SoftFloat's current mode unchanged) rather than `exit(1)`. + +**Completed in this handoff's work (pushed to the branch):** +- Converted all remaining implementations to match the header's `rndMode` + prototypes; restored the `'a'` (round-away) mode; made unknown-mode a no-op. +- Updated every test call site; added `test_saxpy_rounding`, which proves the mode + actually changes results (`1.0 + 2⁻²⁴` → `1.0` under `'n'`, next-float-up under + `'u'`). Would fail if `_set_rounding` were inert. +- Merged `origin/master` in: #8's fixes + `rndMode` now coexist (verified: + `sgemm` has `ldb` + `rndMode`; `sgemv` has `A[i*lda+j]` + `rndMode`; `f128_abs` + fixed). 140/140 pass. + +**⚠️ This is a breaking API change.** Every routine signature gained a trailing +arg. All downstream callers must pass a mode. + +**Deliberately NOT given `rndMode`** (they do no rounding / header omits it): +`isamax/idamax/ihamax/iqamax` (index-of-max = abs + compare), and `sdsdot/hsdot` +(their prototypes don't take it). Leave these alone. + +**Converted but NOT built** (kept header-consistent for later): the complex +(`caxpy/ccopy/cdotc/csrot/cscal/scasum/scnrm2`) and rotation +(`*rot/*rotg/*rotm`) routines now carry `rndMode`, but they are **not** in the +Makefile and several still have pre-existing compile bugs (see backlog). `*rotmg` +were left unconverted (multi-line signatures; also unbuilt). + +## Immediate action items (Vere / Lagoon side) + +1. **Lagoon `rndMode` update — REQUIRED, do in tandem with merging PR #1.** + `vere/pkg/noun/jets/i/lagoon.c` has **~32 BLAS call sites** (`saxpy/sdot/sscal/ + sgemm` and h/d/q variants) that call the *old* signatures. Each needs the + trailing `rndMode` argument. Lagoon already computes the mode — see + `_set_rounding(c3_w a)` at ~`lagoon.c:40`, which maps `%n/%z/%u/%d/%a`. Thread + that same char (the jet's rounding-mode argument) into each BLAS call. The + moment PR #1 merges, Vere won't compile until this is done. + - Note: `i*amax` calls take **no** `rndMode` — don't add it there. + - Lagoon currently sets `softfloat_roundingMode` directly (and the now-removed + `softblas_roundingMode`); after this change, drop the + `softblas_roundingMode = ...` writes (that global no longer exists) and rely + on passing `rndMode` per call. + +2. **Bump the pinned SoftBLAS commit in Vere.** + `vere/ext/softblas/build.zig.zon:10` pins `cbffb33f…` (an old commit that has + *all three* #8 bugs). After merges land, bump `.url` + `.hash` to a `master` + commit that includes #8/#9/#10 (and PR #1 once merged). Also check + `vere/ext/softblas/build.zig`'s source list still matches SoftBLAS's + `BLAS_SRCS` (no `softblas_state.c` anymore). + +3. **Merge PR #1** once Lagoon is ready. It's mergeable; "UNSTABLE" only means CI + isn't wired on that branch yet (CI reached `master` via #10 after PR #1 forked; + the merge commit already pulled `master` in, so a re-merge of latest `master` + would attach CI). + +## Remaining SoftBLAS backlog (from the audit, not yet done) + +A fuller list was kept in `WORKLIST.md` (was stashed under +`stash@{0}` on the `ci/github-actions` branch: `git stash show -p stash@{0}`). +Highlights, roughly by severity: + +**Correctness / determinism** +- `qaxpy`: `nan_unify_q(QY)` only canonicalizes `QY[0]`; NaNs in `QY[1..]` escape → + breaks determinism for vectors length > 1. Unify per written element. +- `*nrm2`: loop bound `(N-1)*incX` can integer-overflow → early exit / infinite loop. +- `FNEG` macro (`1 << 31` / `1 << 63`): shifts a signed int into the sign bit = UB. + Affects complex conjugation. Use a width-correct unsigned literal. +- `*swap`/`csrot`: declare `incX` as `uint64_t` then test `if (incX < 0)` (always + false) → negative strides become huge positives → OOB. (May differ post-PR#1.) +- `iamax`: 0-based for N>1 but hardcodes `return 1` for N==1 — inconsistent; document + the convention (Fortran BLAS is 1-based; this is effectively 0-based). +- `f128M_abs`: now masks the right word, but is still a structurally broken + `(float128_t*){…}` compound literal (currently unused). + +**Resource / robustness** +- `qasum` leaks 16 bytes/call (heap accumulator never freed; use a stack temp). +- `svec/dvec/hvec/qvec` don't null-check `malloc`. + +**Broken / orphaned code presented as "complete"** +- `rot/rotg/rotm/rotmg` (all precisions) are declared in the header but **not built** + → link errors if a caller uses them. Either build+fix or document as unbuilt. +- `sdsdot.c`, `hsdot.c`, `scnrm2.c` don't compile (undefined idents / complex type + errors). `qrot` prototype uses `float16_t*` (copy-paste). `*rotmg` compute + constants via integer multiply of hex bit-patterns (meaningless). `srot/drot/hrot` + compare SoftFloat structs to int literals with `!=` (UB) and omit + `#include "softblas.h"`. + +**Test-harness bugs (in `tests/test_all.c`)** +- `*swap_stride` slots run `*swap_two` (the stride variant is never exercised). +- `idamax/ihamax/iqamax` stride+13542 slots point at the **`isamax`** functions — + the d/h/q index-max tests don't test their own routine. + +**Hygiene / scope** +- `v[0]=lo, v[1]=hi` float128 layout invariant should be documented in the header + (currently an `// XXX` note in `qdot.c`). +- Consider an X-macro for the 6 genuinely-identical Level-1 routines + (axpy/copy/dot/scal/swap/asum) across `h/s/d`; keep `q` separate (pointer API). +- Move internal helpers (`*vec`, `nan_unify_*`) to an internal header. +- `benchmarking/` is dead (benchmarking was removed from planned work, commit + f4908d3) — candidate for deletion. Unused complex 16/128-bit (`i`/`v`) type stubs. +- Naming sovereignty is limited by downstream clients — coordinate renames. + +## Build & test + +- **CI / Linux (canonical):** submodules (SoftFloat is an SSH submodule — rewrite to + HTTPS), `make -C SoftFloat/build/Linux-x86_64-GCC`, then `make tests && ./test_all`. + See `.github/workflows/ci.yml`. +- **macOS gotcha:** the Makefile hardcodes `gcc`, `-LSoftFloat/build/Linux-x86_64-GCC`, + and `-l:softfloat.a` (GNU-ld only) — it does **not** build out-of-the-box on + Apple `ld`. During this work I built locally with `clang`, linking the checked-in + (arm64) `softfloat.a` directly and compiling the `BLAS_SRCS` list by hand. A + Darwin build path is an open backlog item. + +## Landmines / things that bit me + +- The whole reason the bugs survived: tests only used unit strides and `ldb==ldc`. + Any new routine work should add non-unit-stride / `ldb≠ldc` coverage. +- PR #1's source merged cleanly with #8 because the changes were on different lines + (signature/`_set_rounding` at top vs. index fixes in the loop body) — but always + re-verify `f128_abs`, `*gemm` `ldb`, and `*gemv` indexing after any future merge. +- When adding `rndMode` to test calls, watch for calls with trailing comments + (`sswap(0,…); // …`) — a naive "ends with `);`" filter misses them. +- Merging `master` into PR #1 silently re-took master's **Makefile**, which lists + `softblas_state.c` (deleted here) and the WIP complex routines (`scasum`/`ccopy` + don't compile) — this broke CI. Resolved by restoring the branch's `BLAS_SRCS` + (40 REAL routines) and re-deleting `softblas_state.c`. On any future `master` + merge, re-check the Makefile didn't resurrect either. The branch's complex/rot + files carry `rndMode` but are intentionally unbuilt and still WIP-broken + (e.g. `ccopy` declares `ix/iy` but uses `iX/iY`) — fix when completing complex. diff --git a/README.md b/README.md index 4ca66ac..ed1f7ad 100644 --- a/README.md +++ b/README.md @@ -12,10 +12,11 @@ Following SoftFloat 3e and requiring a 64-bit OS, all quantities are passed by v - 1.0.0 (commit `7d05697aea5363dcf5f877a9c8b464e9c352d3d4`). Basic suite of `REAL`-valued operations suitable for use with half, single, double, and quadruple precision floating-point numbers. - 1.1.0 (commit `f94acbcfd26cebd8e135ad9e8c7caa156fcc4ac9`). Errors changed to `exit(-1)` instead of `return`. +- Unreleased. Invalid-argument errors `return` (no-op) again instead of `exit(-1)`, so the library never terminates the host process. The rounding mode is now a per-call argument (see below) rather than a global. ### Planned work -- [ ] Add rounding-mode propagation to function signatures (in progress). +- [x] Add rounding-mode propagation to function signatures. - [ ] Complete complex-valued functions (in progress). - [ ] Use a linter. - [ ] Add (kelvin) versioning, at least on interface. @@ -31,13 +32,13 @@ Following SoftFloat 3e and requiring a 64-bit OS, all quantities are passed by v | 64 | `d` | `float64_t` | 64 | `z` | `complex64_t` | | 128 | `q` | `float128_t` | 128 | `v` | `complex128_t` | -The rounding mode should be set via the global variable `softblas_roundingMode` (a `char` `typedef`), defined in `softblas.h`. Valid values are: +Every routine takes a trailing rounding-mode argument `rndMode` (a `uint_fast8_t` holding one of the `char` codes below). It is applied via `_set_rounding` at routine entry, before any arithmetic. An unrecognized code leaves SoftFloat's current mode unchanged. Valid values are: -- `'n'` - round to nearest (even) +- `'n'` - round to nearest, ties to even - `'z'` - round towards zero -- `'u'` - round up -- `'d'` - round down -- `'a'` - round away from zero +- `'u'` - round up (towards +∞) +- `'d'` - round down (towards −∞) +- `'a'` - round to nearest, ties away from zero Per Wikipedia: diff --git a/include/softblas.h b/include/softblas.h index f5e610d..3b98f35 100644 --- a/include/softblas.h +++ b/include/softblas.h @@ -168,11 +168,16 @@ typedef struct { #define c64_div(a, b) (complex64_t){ f64_div( f64_add( f64_mul(a.real, b.real), f64_mul(a.imag, b.imag) ), f64_add( f64_mul(b.real, b.real), f64_mul(b.imag, b.imag) ) ), f64_div( f64_sub( f64_mul(a.imag, b.real), f64_mul(a.real, b.imag) ), f64_add( f64_mul(b.real, b.real), f64_mul(b.imag, b.imag) ) ) } #define c128_div(a, b) (complex128_t){ f128_div( f128_add( f128_mul(a.real, b.real), f128_mul(a.imag, b.imag) ), f128_add( f128_mul(b.real, b.real), f128_mul(b.imag, b.imag) ) ), f128_div( f128_sub( f128_mul(a.imag, b.real), f128_mul(a.real, b.imag) ), f128_add( f128_mul(b.real, b.real), f128_mul(b.imag, b.imag) ) ) } -#define c16_conj(a) (complex16_t){ a.real, FNEG(a.imag) } -#define c32_conj(a) (complex32_t){ a.real, FNEG(a.imag) } -#define c64_conj(a) (complex64_t){ a.real, FNEG(a.imag) } -#define c128_conj(a) (complex128_t){ a.real, FNEG(a.imag) } -#define FNEG(x) ( x.v ^ (1 << (sizeof(x.v) * 8 - 1)) ) +// FNEG flips the sign bit of a scalar-valued SoftFloat type (float16/32/64). +// The mask must be unsigned and at least as wide as x.v: a plain `1 << 31` +// is undefined (signed int overflow) and `1 << 63` does not fit in an int. +#define FNEG(x) ( (x).v ^ ((uint64_t)1 << (sizeof((x).v) * 8 - 1)) ) +#define c16_conj(a) (complex16_t){ (a).real, FNEG((a).imag) } +#define c32_conj(a) (complex32_t){ (a).real, FNEG((a).imag) } +#define c64_conj(a) (complex64_t){ (a).real, FNEG((a).imag) } +// float128_t holds its sign bit in the high word v[1]; v is an array, so it +// cannot use FNEG (which xors the whole scalar). Flip v[1]'s sign bit only. +#define c128_conj(a) (complex128_t){ (a).real, (float128_t){ (a).imag.v[0], (a).imag.v[1] ^ ((uint64_t)1 << 63) } } // SOFTBLAS PROTOTYPES @@ -188,7 +193,7 @@ void srotg(float32_t *a, float32_t *b, float32_t *c, float32_t *s, const uint_fa void srotm(const uint64_t N, float32_t *X, const uint64_t incX, float32_t *Y, const uint64_t incY, const float32_t *P, const uint_fast8_t rndMode); void srotmg(float32_t *D1, float32_t *D2, float32_t *X1, const float32_t y1, float32_t *P, const uint_fast8_t rndMode); void sscal(uint64_t N, float32_t SA, float32_t *SX, uint64_t incX, const uint_fast8_t rndMode); -void sswap(uint64_t N, float32_t *SX, uint64_t incX, float32_t *SY, uint64_t incY, const uint_fast8_t rndMode); +void sswap(uint64_t N, float32_t *SX, int64_t incX, float32_t *SY, int64_t incY, const uint_fast8_t rndMode); uint64_t isamax(uint64_t N, const float32_t *SX, uint64_t incX); // Double-precision @@ -202,7 +207,7 @@ void drotg(float64_t *a, float64_t *b, float64_t *c, float64_t *s, const uint_fa void drotm(const uint64_t N, float64_t *X, const uint64_t incX, float64_t *Y, const uint64_t incY, const float64_t *P, const uint_fast8_t rndMode); void drotmg(float64_t *D1, float64_t *D2, float64_t *X1, const float64_t y1, float64_t *P, const uint_fast8_t rndMode); void dscal(uint64_t N, float64_t DA, float64_t *DX, uint64_t incX, const uint_fast8_t rndMode); -void dswap(uint64_t N, float64_t *DX, uint64_t incX, float64_t *DY, uint64_t incY, const uint_fast8_t rndMode); +void dswap(uint64_t N, float64_t *DX, int64_t incX, float64_t *DY, int64_t incY, const uint_fast8_t rndMode); uint64_t idamax(uint64_t N, const float64_t *DX, uint64_t incX); // Half-precision @@ -216,7 +221,7 @@ void hrotg(float16_t *a, float16_t *b, float16_t *c, float16_t *s, const uint_fa void hrotm(const uint64_t N, float16_t *X, const uint64_t incX, float16_t *Y, const uint64_t incY, const float16_t *P, const uint_fast8_t rndMode); void hrotmg(float16_t *D1, float16_t *D2, float16_t *X1, const float16_t y1, float16_t *P, const uint_fast8_t rndMode); void hscal(uint64_t N, float16_t HA, float16_t *HX, uint64_t incX, const uint_fast8_t rndMode); -void hswap(uint64_t N, float16_t *HX, uint64_t incX, float16_t *HY, uint64_t incY, const uint_fast8_t rndMode); +void hswap(uint64_t N, float16_t *HX, int64_t incX, float16_t *HY, int64_t incY, const uint_fast8_t rndMode); uint64_t ihamax(uint64_t N, const float16_t *HX, uint64_t incX); // Quad-precision @@ -230,7 +235,7 @@ void qrotg(float128_t *a, float128_t *b, float128_t *c, float128_t *s, const uin void qrotm(const uint64_t N, float128_t *X, const uint64_t incX, float128_t *Y, const uint64_t incY, const float128_t *P, const uint_fast8_t rndMode); void qrotmg(float128_t *D1, float128_t *D2, float128_t *X1, const float128_t y1, float128_t *P, const uint_fast8_t rndMode); void qscal(uint64_t N, float128_t QA, float128_t *QX, uint64_t incX, const uint_fast8_t rndMode); -void qswap(uint64_t N, float128_t *QX, uint64_t incX, float128_t *QY, uint64_t incY, const uint_fast8_t rndMode); +void qswap(uint64_t N, float128_t *QX, int64_t incX, float128_t *QY, int64_t incY, const uint_fast8_t rndMode); uint64_t iqamax(uint64_t N, const float128_t *QX, uint64_t incX); // Complex single-precision @@ -239,7 +244,7 @@ void caxpy(uint64_t N, complex32_t CA, complex32_t *CX, int64_t incX, complex32_ void ccopy(uint64_t N, const complex32_t *CX, int64_t incX, complex32_t *CY, int64_t incY, const uint_fast8_t rndMode); complex32_t cdotc(uint64_t N, const complex32_t *CX, int64_t incX, const complex32_t *CY, int64_t incY, const uint_fast8_t rndMode); float32_t scnrm2(uint64_t N, const complex32_t *CX, uint64_t incX, const uint_fast8_t rndMode); -void csrot(const uint64_t N, complex32_t *CX, const uint64_t incX, complex32_t *CY, const uint64_t incY, const complex32_t c, const complex32_t s, const uint_fast8_t rndMode); +void csrot(const uint64_t N, complex32_t *CX, const int64_t incX, complex32_t *CY, const int64_t incY, const complex32_t c, const complex32_t s, const uint_fast8_t rndMode); void cscal(uint64_t N, complex32_t CA, complex32_t *CX, uint64_t incX, const uint_fast8_t rndMode); void cswap(uint64_t N, complex32_t *CX, uint64_t incX, complex32_t *CY, uint64_t incY, const uint_fast8_t rndMode); diff --git a/src/blas/level1/csrot.c b/src/blas/level1/csrot.c index b230a37..58d3f04 100644 --- a/src/blas/level1/csrot.c +++ b/src/blas/level1/csrot.c @@ -1,6 +1,6 @@ #include "softblas.h" -void csrot(const uint64_t N, complex32_t *CX, const uint64_t incX, complex32_t *CY, const uint64_t incY, const complex32_t c, const complex32_t s, const uint_fast8_t rndMode) { +void csrot(const uint64_t N, complex32_t *CX, const int64_t incX, complex32_t *CY, const int64_t incY, const complex32_t c, const complex32_t s, const uint_fast8_t rndMode) { _set_rounding(rndMode); complex32_t temp; int64_t ix = 0; diff --git a/src/blas/level1/dnrm2.c b/src/blas/level1/dnrm2.c index b33da78..0faa352 100644 --- a/src/blas/level1/dnrm2.c +++ b/src/blas/level1/dnrm2.c @@ -14,7 +14,7 @@ float64_t dnrm2(uint64_t N, const float64_t *X, uint64_t incX, const uint_fast8_ float64_t ssq = { SB_REAL64_ONE }; float64_t absXI; - for (uint64_t ix = 0; ix < 1 + (N - 1) * incX; ix += incX) { + for (uint64_t k = 0, ix = 0; k < N; k++, ix += incX) { if (f64_ne(X[ix], (float64_t){ SB_REAL64_ZERO })) { absXI = f64_abs(X[ix]); if (f64_lt(scale, absXI)) { diff --git a/src/blas/level1/dswap.c b/src/blas/level1/dswap.c index 4b7fba7..89f942b 100644 --- a/src/blas/level1/dswap.c +++ b/src/blas/level1/dswap.c @@ -1,13 +1,13 @@ #include "softblas.h" -void dswap(uint64_t N, float64_t *DX, uint64_t incX, float64_t *DY, uint64_t incY, const uint_fast8_t rndMode) { +void dswap(uint64_t N, float64_t *DX, int64_t incX, float64_t *DY, int64_t incY, const uint_fast8_t rndMode) { _set_rounding(rndMode); float64_t dtemp; if (N == 0) return; - uint64_t iX = 0; - uint64_t iY = 0; + int64_t iX = 0; + int64_t iY = 0; if (incX < 0) iX = (-N + 1) * incX; if (incY < 0) iY = (-N + 1) * incY; for (uint64_t i = 0; i < N; i++) { diff --git a/src/blas/level1/hnrm2.c b/src/blas/level1/hnrm2.c index 2c64121..e4290bf 100644 --- a/src/blas/level1/hnrm2.c +++ b/src/blas/level1/hnrm2.c @@ -14,7 +14,7 @@ float16_t hnrm2(uint64_t N, const float16_t *X, uint64_t incX, const uint_fast8_ float16_t ssq = { SB_REAL16_ONE }; float16_t absXI; - for (uint64_t ix = 0; ix < 1 + (N - 1) * incX; ix += incX) { + for (uint64_t k = 0, ix = 0; k < N; k++, ix += incX) { if (f16_ne(X[ix], (float16_t){ SB_REAL16_ZERO })) { absXI = f16_abs(X[ix]); if (f16_lt(scale, absXI)) { diff --git a/src/blas/level1/hswap.c b/src/blas/level1/hswap.c index 025ce71..baae923 100644 --- a/src/blas/level1/hswap.c +++ b/src/blas/level1/hswap.c @@ -1,13 +1,13 @@ #include "softblas.h" -void hswap(uint64_t N, float16_t *HX, uint64_t incX, float16_t *HY, uint64_t incY, const uint_fast8_t rndMode) { +void hswap(uint64_t N, float16_t *HX, int64_t incX, float16_t *HY, int64_t incY, const uint_fast8_t rndMode) { _set_rounding(rndMode); float16_t htemp; if (N == 0) return; - uint64_t iX = 0; - uint64_t iY = 0; + int64_t iX = 0; + int64_t iY = 0; if (incX < 0) iX = (-N + 1) * incX; if (incY < 0) iY = (-N + 1) * incY; for (uint64_t i = 0; i < N; i++) { diff --git a/src/blas/level1/idamax.c b/src/blas/level1/idamax.c index 5937e62..99378f9 100644 --- a/src/blas/level1/idamax.c +++ b/src/blas/level1/idamax.c @@ -2,7 +2,7 @@ uint64_t idamax(uint64_t N, const float64_t *DX, uint64_t incX) { if (N < 1 || incX <= 0) return 0; - if (N == 1) return 1; + if (N == 1) return 0; uint64_t idamax = 0; uint64_t ix = 0; diff --git a/src/blas/level1/ihamax.c b/src/blas/level1/ihamax.c index 453910f..43fda8a 100644 --- a/src/blas/level1/ihamax.c +++ b/src/blas/level1/ihamax.c @@ -2,7 +2,7 @@ uint64_t ihamax(uint64_t N, const float16_t *HX, uint64_t incX) { if (N < 1 || incX <= 0) return 0; - if (N == 1) return 1; + if (N == 1) return 0; uint64_t ihamax = 0; uint64_t ix = 0; diff --git a/src/blas/level1/iqamax.c b/src/blas/level1/iqamax.c index 307ed36..4bd84c0 100644 --- a/src/blas/level1/iqamax.c +++ b/src/blas/level1/iqamax.c @@ -2,7 +2,7 @@ uint64_t iqamax(uint64_t N, const float128_t *QX, uint64_t incX) { if (N < 1 || incX <= 0) return 0; - if (N == 1) return 1; + if (N == 1) return 0; uint64_t iqamax = 0; uint64_t ix = 0; diff --git a/src/blas/level1/isamax.c b/src/blas/level1/isamax.c index 70a3af5..316eee1 100644 --- a/src/blas/level1/isamax.c +++ b/src/blas/level1/isamax.c @@ -2,7 +2,7 @@ uint64_t isamax(uint64_t N, const float32_t *SX, uint64_t incX) { if (N < 1 || incX <= 0) return 0; - if (N == 1) return 1; + if (N == 1) return 0; uint64_t isamax = 0; uint64_t ix = 0; diff --git a/src/blas/level1/qaxpy.c b/src/blas/level1/qaxpy.c index 67f4d75..614f541 100644 --- a/src/blas/level1/qaxpy.c +++ b/src/blas/level1/qaxpy.c @@ -11,9 +11,9 @@ void qaxpy(uint64_t N, float128_t QA, float128_t *QX, int64_t incX, float128_t * f128M_mul(&QA, &(QX[iX]), &QT); f128M_add(&(QY[iY]), &QT, &QT); QY[iY] = QT; + nan_unify_q(&(QY[iY])); iX += incX; iY += incY; } - nan_unify_q(QY); } diff --git a/src/blas/level1/qnrm2.c b/src/blas/level1/qnrm2.c index b4a1139..2771f75 100644 --- a/src/blas/level1/qnrm2.c +++ b/src/blas/level1/qnrm2.c @@ -17,7 +17,7 @@ float128_t qnrm2(uint64_t N, const float128_t *X, uint64_t incX, const uint_fast const float128_t QZERO = { SB_REAL128L_ZERO, SB_REAL128U_ZERO }; const float128_t QONE = { SB_REAL128L_ONE, SB_REAL128U_ONE }; - for (uint64_t ix = 0; ix < 1 + (N - 1) * incX; ix += incX) { + for (uint64_t k = 0, ix = 0; k < N; k++, ix += incX) { if (f128M_ne(&(X[ix]), &QZERO)) { absXI = f128_abs(X[ix]); if (f128M_lt(&scale, &absXI)) { diff --git a/src/blas/level1/qswap.c b/src/blas/level1/qswap.c index 694f23b..882be21 100644 --- a/src/blas/level1/qswap.c +++ b/src/blas/level1/qswap.c @@ -1,13 +1,13 @@ #include "softblas.h" -void qswap(uint64_t N, float128_t *QX, uint64_t incX, float128_t *QY, uint64_t incY, const uint_fast8_t rndMode) { +void qswap(uint64_t N, float128_t *QX, int64_t incX, float128_t *QY, int64_t incY, const uint_fast8_t rndMode) { _set_rounding(rndMode); float128_t qtemp; if (N == 0) return; - uint64_t iX = 0; - uint64_t iY = 0; + int64_t iX = 0; + int64_t iY = 0; if (incX < 0) iX = (-N + 1) * incX; if (incY < 0) iY = (-N + 1) * incY; for (uint64_t i = 0; i < N; i++) { diff --git a/src/blas/level1/snrm2.c b/src/blas/level1/snrm2.c index 8731bb9..795ab47 100644 --- a/src/blas/level1/snrm2.c +++ b/src/blas/level1/snrm2.c @@ -14,7 +14,7 @@ float32_t snrm2(uint64_t N, const float32_t *X, uint64_t incX, const uint_fast8_ float32_t ssq = { SB_REAL32_ONE }; float32_t absXI; - for (uint64_t ix = 0; ix < 1 + (N - 1) * incX; ix += incX) { + for (uint64_t k = 0, ix = 0; k < N; k++, ix += incX) { if (f32_ne(X[ix], (float32_t){ SB_REAL32_ZERO })) { absXI = f32_abs(X[ix]); if (f32_lt(scale, absXI)) { diff --git a/src/blas/level1/sswap.c b/src/blas/level1/sswap.c index cee1e83..8c5eb5f 100644 --- a/src/blas/level1/sswap.c +++ b/src/blas/level1/sswap.c @@ -1,13 +1,13 @@ #include "softblas.h" -void sswap(uint64_t N, float32_t *SX, uint64_t incX, float32_t *SY, uint64_t incY, const uint_fast8_t rndMode) { +void sswap(uint64_t N, float32_t *SX, int64_t incX, float32_t *SY, int64_t incY, const uint_fast8_t rndMode) { _set_rounding(rndMode); float32_t stemp; if (N == 0) return; - uint64_t iX = 0; - uint64_t iY = 0; + int64_t iX = 0; + int64_t iY = 0; if (incX < 0) iX = (-N + 1) * incX; if (incY < 0) iY = (-N + 1) * incY; for (uint64_t i = 0; i < N; i++) { diff --git a/tests/blas/include/test.h b/tests/blas/include/test.h index cc8bccb..ca2999b 100644 --- a/tests/blas/include/test.h +++ b/tests/blas/include/test.h @@ -54,6 +54,8 @@ MunitResult test_sswap_stride(const MunitParameter params[], void* user_data_or_fixture); MunitResult test_sswap_zero(const MunitParameter params[], void* user_data_or_fixture); +MunitResult test_sswap_negstride(const MunitParameter params[], + void* user_data_or_fixture); MunitResult test_isamax_0(const MunitParameter params[], void* user_data_or_fixture); MunitResult test_isamax_12345(const MunitParameter params[], @@ -62,6 +64,8 @@ MunitResult test_isamax_stride(const MunitParameter params[], void* user_data_or_fixture); MunitResult test_isamax_13542(const MunitParameter params[], void* user_data_or_fixture); +MunitResult test_isamax_one(const MunitParameter params[], + void* user_data_or_fixture); MunitResult test_dasum_0(const MunitParameter params[], void* user_data_or_fixture); @@ -181,6 +185,8 @@ MunitResult test_qaxpy_stride(const MunitParameter params[], void* user_data_or_fixture); MunitResult test_qaxpy_neg_stride(const MunitParameter params[], void* user_data_or_fixture); +MunitResult test_qaxpy_nan_unify(const MunitParameter params[], + void* user_data_or_fixture); MunitResult test_qcopy_all(const MunitParameter params[], void* user_data_or_fixture); MunitResult test_qcopy_stride(const MunitParameter params[], diff --git a/tests/blas/level1/test_isamax.c b/tests/blas/level1/test_isamax.c index 3c8b898..ac9100d 100644 --- a/tests/blas/level1/test_isamax.c +++ b/tests/blas/level1/test_isamax.c @@ -56,3 +56,13 @@ MunitResult test_isamax_13542(const MunitParameter params[], return MUNIT_OK; } + +// N==1: the (only) maximum is at 0-based index 0, not 1. +MunitResult test_isamax_one(const MunitParameter params[], + void* user_data_or_fixture) { + float32_t* SX = svec((float[]){42.0f}, 1); + uint64_t I = isamax(1, SX, 1); + assert_int(I, ==, 0); + free(SX); + return MUNIT_OK; +} diff --git a/tests/blas/level1/test_qaxpy.c b/tests/blas/level1/test_qaxpy.c index ea7d1f4..f6269ec 100644 --- a/tests/blas/level1/test_qaxpy.c +++ b/tests/blas/level1/test_qaxpy.c @@ -165,3 +165,29 @@ MunitResult test_qaxpy_neg_stride(const MunitParameter params[], void *user_data } + +// qaxpy must canonicalize NaNs in EVERY written element, not just QY[0]. +// QX[1] is a non-canonical NaN whose (quieted) payload SoftFloat propagates; +// without a per-element nan_unify_q, QY[1] escapes non-canonical and breaks +// the determinism guarantee. +MunitResult test_qaxpy_nan_unify(const MunitParameter params[], + void* user_data_or_fixture) { + float128_t QA = {{ 0x0000000000000000, 0x3fff000000000000 }}; // 1.0 + float128_t* QX = qvec((float128_pair_t[]){ + {.hi = 0x3fff000000000000, .lo = 0x0000000000000000}, // 1.0 + {.hi = 0x7fff000000000001, .lo = 0x00000000deadbeef}}, // non-canonical NaN + 2); + float128_t* QY = qvec((float128_pair_t[]){ + {.hi = 0x0000000000000000, .lo = 0x0000000000000000}, + {.hi = 0x0000000000000000, .lo = 0x0000000000000000}}, + 2); + + qaxpy(2, QA, QX, 1, QY, 1, 'n'); + + // QY[1] = 1*NaN + 0 -> canonical QUADNAN with zeroed low word. + assert_ullong(QY[1].v[1], ==, QUADNAN); + assert_ullong(QY[1].v[0], ==, 0x0000000000000000); + + free(QX); free(QY); + return MUNIT_OK; +} diff --git a/tests/blas/level1/test_qswap.c b/tests/blas/level1/test_qswap.c index 693ac07..711f664 100644 --- a/tests/blas/level1/test_qswap.c +++ b/tests/blas/level1/test_qswap.c @@ -89,7 +89,7 @@ MunitResult test_qswap_stride(const MunitParameter params[], {.hi = 0xc001000000000000, .lo = 0x0000000000000000}, {.hi = 0x0000000000000000, .lo = 0x0000000000000000}, {.hi = 0x4001400000000000, .lo = 0x0000000000000000}}, - 5); + 9); for (uint64_t i = 0; i < 5; i++) { assert_ullong(QX[i].v[0], ==, RX[i].v[0]); diff --git a/tests/blas/level1/test_sswap.c b/tests/blas/level1/test_sswap.c index 875a73d..140483c 100644 --- a/tests/blas/level1/test_sswap.c +++ b/tests/blas/level1/test_sswap.c @@ -66,3 +66,25 @@ MunitResult test_sswap_zero(const MunitParameter params[], free(SX); free(SY); free(RX); free(RY); return MUNIT_OK; } + +// Negative stride. sswap(N, X, -1, Y, +1) walks X backward while Y walks +// forward: X=[10,20,30], Y=[1,2,3] -> X=[3,2,1], Y=[30,20,10]. Under the old +// uint64_t incX the `incX < 0` guard was dead and the negative stride wrapped +// to a huge positive stride (out-of-bounds). +MunitResult test_sswap_negstride(const MunitParameter params[], + void* user_data_or_fixture) { + float32_t* SX = svec((float[]){10.0f, 20.0f, 30.0f}, 3); + float32_t* SY = svec((float[]){1.0f, 2.0f, 3.0f}, 3); + + sswap(3, SX, -1, SY, 1, 'n'); + + float32_t* RX = svec((float[]){3.0f, 2.0f, 1.0f}, 3); + float32_t* RY = svec((float[]){30.0f, 20.0f, 10.0f}, 3); + for (uint64_t i = 0; i < 3; i++) { + assert_ulong(SX[i].v, ==, RX[i].v); + assert_ulong(SY[i].v, ==, RY[i].v); + } + + free(SX); free(SY); free(RX); free(RY); + return MUNIT_OK; +} diff --git a/tests/test_all.c b/tests/test_all.c index 0394a03..6bd385f 100644 --- a/tests/test_all.c +++ b/tests/test_all.c @@ -25,12 +25,14 @@ int main(int argc, char* argv[MUNIT_ARRAY_PARAM(argc + 1)]) { {"/test_sscal_12345", test_sscal_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_sscal_stride", test_sscal_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_sswap_two", test_sswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_sswap_stride", test_sswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_sswap_stride", test_sswap_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_sswap_zero", test_sswap_zero, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_sswap_negstride", test_sswap_negstride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_isamax_0", test_isamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_isamax_12345", test_isamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_isamax_stride", test_isamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_isamax_13542", test_isamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_isamax_one", test_isamax_one, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_dasum_0", test_dasum_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_dasum_12345", test_dasum_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, @@ -52,11 +54,11 @@ int main(int argc, char* argv[MUNIT_ARRAY_PARAM(argc + 1)]) { {"/test_dscal_12345", test_dscal_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_dscal_stride", test_dscal_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_dswap_two", test_dswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_dswap_stride", test_dswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_idamax_0", test_isamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_idamax_12345", test_isamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_idamax_stride", test_isamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_idamax_13542", test_isamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_dswap_stride", test_dswap_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_idamax_0", test_idamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_idamax_12345", test_idamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_idamax_stride", test_idamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_idamax_13542", test_idamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_hasum_0", test_hasum_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_hasum_12345", test_hasum_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, @@ -78,11 +80,11 @@ int main(int argc, char* argv[MUNIT_ARRAY_PARAM(argc + 1)]) { {"/test_hscal_12345", test_hscal_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_hscal_stride", test_hscal_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_hswap_two", test_hswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_hswap_stride", test_hswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_ihamax_0", test_isamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_ihamax_12345", test_isamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_ihamax_stride", test_isamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_ihamax_13542", test_isamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_hswap_stride", test_hswap_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_ihamax_0", test_ihamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_ihamax_12345", test_ihamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_ihamax_stride", test_ihamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_ihamax_13542", test_ihamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_qasum_0", test_qasum_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_qasum_12345", test_qasum_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, @@ -92,6 +94,7 @@ int main(int argc, char* argv[MUNIT_ARRAY_PARAM(argc + 1)]) { {"/test_qaxpy_sum", test_qaxpy_sum, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_qaxpy_stride", test_qaxpy_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_qaxpy_neg_stride", test_qaxpy_neg_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_qaxpy_nan_unify", test_qaxpy_nan_unify, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_qcopy_all", test_qcopy_all, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_qcopy_stride", test_qcopy_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_qdot_0", test_qdot_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, @@ -105,11 +108,11 @@ int main(int argc, char* argv[MUNIT_ARRAY_PARAM(argc + 1)]) { {"/test_qscal_12345", test_qscal_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_qscal_stride", test_qscal_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_qswap_two", test_qswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_qswap_stride", test_qswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_iqamax_0", test_isamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_iqamax_12345", test_isamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_iqamax_stride", test_isamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, - {"/test_iqamax_13542", test_isamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_qswap_stride", test_qswap_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_iqamax_0", test_iqamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_iqamax_12345", test_iqamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_iqamax_stride", test_iqamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, + {"/test_iqamax_13542", test_iqamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_sgemv_0_row", test_sgemv_0_row, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, {"/test_sgemv_0_col", test_sgemv_0_col, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL}, diff --git a/tools/check_test_registration.sh b/tools/check_test_registration.sh new file mode 100755 index 0000000..d1c9a8e --- /dev/null +++ b/tools/check_test_registration.sh @@ -0,0 +1,15 @@ +#!/bin/sh +# Lint: every munit registration's display-name stem must equal its function +# pointer, so "X tests passed" can't lie about which routine was exercised. +# (See the swap_stride / idamax-as-isamax misregistration class.) +set -eu +f="tests/test_all.c" +bad=$(grep -oE '\{"/test_[a-z0-9_]+", *test_[a-z0-9_]+,' "$f" \ + | sed -E 's/\{"\/(test_[a-z0-9_]+)", *(test_[a-z0-9_]+),/\1 \2/' \ + | awk '$1 != $2 { print " " $1 " -> " $2 }') +if [ -n "$bad" ]; then + echo "FAIL: test registration name != function pointer in $f:" + echo "$bad" + exit 1 +fi +echo "OK: all test registrations match their function pointers"