diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
index e310d53..38768d8 100644
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -22,6 +22,9 @@ jobs:
           git config --global url."https://github.com/".insteadOf "git@github.com:"
           git submodule update --init --recursive
 
+      - name: Lint test registration
+        run: ./tools/check_test_registration.sh
+
       - name: Build SoftFloat
         run: make -C SoftFloat/build/Linux-x86_64-GCC
 
diff --git a/HANDOFF.md b/HANDOFF.md
new file mode 100644
index 0000000..f80c386
--- /dev/null
+++ b/HANDOFF.md
@@ -0,0 +1,164 @@
+# SoftBLAS — Handoff to urbit/numerics agent
+
+_Scratch document — not committed. Written 2026-05-30 after a five-reviewer audit
+(Gnome/Dwarf/Elf/Hobbit/Angel) of `urbit/SoftBLAS` and the follow-up fixes._
+
+## TL;DR
+
+A correctness audit found three silently-wrong-answer bugs, a process-killing
+error path, and an inert rounding-mode API. The first four are **fixed and merged
+to `master`**; CI now runs on every PR. The rounding-mode work lives in **PR #1**,
+which has been completed and reconciled with `master` but is **a breaking API
+change** that requires a coordinated Lagoon update in Vere.
+
+Your job picks up at: (1) finish the Lagoon side of the `rndMode` change, (2) bump
+the pinned SoftBLAS commit in Vere, (3) work the remaining audit backlog below.
+
+## What shipped this session
+
+| PR | State | Summary |
+|----|-------|---------|
+| **#8** | merged | `f128_abs` masked both float128 words (corrupted negative quads); `*gemm` indexed B with `ldc` instead of `ldb`; `*gemv` folded `incX/incY` into the matrix index. All three produced silently wrong results, none caught by the old tests. + regression tests. |
+| **#9** | merged | `exit(-1)` on invalid args (bad transpose/layout; `swap` N==0) → replaced with `return`. `swap` N==0 is now a no-op per BLAS spec. (Vere is the only client; it validates and bails via `u3m_bail`, so a plain `return` suffices — no error-callback machinery.) |
+| **#10** | merged | GitHub Actions CI: builds SoftFloat + SoftBLAS, runs the munit suite on every push/PR. |
+| **#1** | open, **mergeable**, **CI green** | Rounding-mode redesign — see below. Branch `sigilante/rounding-mode`. Merged up to `master` so it no longer reintroduces the #8 bugs. 140/140 tests pass (locally and in CI). |
+
+`master` is green (139 tests). The bugs were invisible before because every prior
+test used unit strides and `ldb == ldc`.
+
+## PR #1 (`sigilante/rounding-mode`) — what it is and what's left
+
+**Design:** rounding is now **per-call**. Every arithmetic routine gained a
+trailing `const uint_fast8_t rndMode` parameter and calls `_set_rounding(rndMode)`
+on entry (sets SoftFloat's `softfloat_roundingMode`). The old global
+`softblas_roundingMode` + `softblas_state.c` are **removed**. This supersedes the
+audit's "problem #5" (the old `set_rounding()` was never called, so the documented
+rounding control silently did nothing).
+
+`_set_rounding(a)` accepts `'n' 'z' 'u' 'd' 'a'`; an **unknown mode is a no-op**
+(leaves SoftFloat's current mode unchanged) rather than `exit(1)`.
+
+**Completed in this handoff's work (pushed to the branch):**
+- Converted all remaining implementations to match the header's `rndMode`
+  prototypes; restored the `'a'` (round-away) mode; made unknown-mode a no-op.
+- Updated every test call site; added `test_saxpy_rounding`, which proves the mode
+  actually changes results (`1.0 + 2⁻²⁴` → `1.0` under `'n'`, next-float-up under
+  `'u'`). Would fail if `_set_rounding` were inert.
+- Merged `origin/master` in: #8's fixes + `rndMode` now coexist (verified:
+  `sgemm` has `ldb` + `rndMode`; `sgemv` has `A[i*lda+j]` + `rndMode`; `f128_abs`
+  fixed). 140/140 pass.
+
+**⚠️ This is a breaking API change.** Every routine signature gained a trailing
+arg. All downstream callers must pass a mode.
+
+**Deliberately NOT given `rndMode`** (they do no rounding / header omits it):
+`isamax/idamax/ihamax/iqamax` (index-of-max = abs + compare), and `sdsdot/hsdot`
+(their prototypes don't take it). Leave these alone.
+
+**Converted but NOT built** (kept header-consistent for later): the complex
+(`caxpy/ccopy/cdotc/csrot/cscal/scasum/scnrm2`) and rotation
+(`*rot/*rotg/*rotm`) routines now carry `rndMode`, but they are **not** in the
+Makefile and several still have pre-existing compile bugs (see backlog). `*rotmg`
+were left unconverted (multi-line signatures; also unbuilt).
+
+## Immediate action items (Vere / Lagoon side)
+
+1. **Lagoon `rndMode` update — REQUIRED, do in tandem with merging PR #1.**
+   `vere/pkg/noun/jets/i/lagoon.c` has **~32 BLAS call sites** (`saxpy/sdot/sscal/
+   sgemm` and h/d/q variants) that call the *old* signatures. Each needs the
+   trailing `rndMode` argument. Lagoon already computes the mode — see
+   `_set_rounding(c3_w a)` at ~`lagoon.c:40`, which maps `%n/%z/%u/%d/%a`. Thread
+   that same char (the jet's rounding-mode argument) into each BLAS call. The
+   moment PR #1 merges, Vere won't compile until this is done.
+   - Note: `i*amax` calls take **no** `rndMode` — don't add it there.
+   - Lagoon currently sets `softfloat_roundingMode` directly (and the now-removed
+     `softblas_roundingMode`); after this change, drop the
+     `softblas_roundingMode = ...` writes (that global no longer exists) and rely
+     on passing `rndMode` per call.
+
+2. **Bump the pinned SoftBLAS commit in Vere.**
+   `vere/ext/softblas/build.zig.zon:10` pins `cbffb33f…` (an old commit that has
+   *all three* #8 bugs). After merges land, bump `.url` + `.hash` to a `master`
+   commit that includes #8/#9/#10 (and PR #1 once merged). Also check
+   `vere/ext/softblas/build.zig`'s source list still matches SoftBLAS's
+   `BLAS_SRCS` (no `softblas_state.c` anymore).
+
+3. **Merge PR #1** once Lagoon is ready. It's mergeable; "UNSTABLE" only means CI
+   isn't wired on that branch yet (CI reached `master` via #10 after PR #1 forked;
+   the merge commit already pulled `master` in, so a re-merge of latest `master`
+   would attach CI).
+
+## Remaining SoftBLAS backlog (from the audit, not yet done)
+
+A fuller list was kept in `WORKLIST.md` (was stashed under
+`stash@{0}` on the `ci/github-actions` branch: `git stash show -p stash@{0}`).
+Highlights, roughly by severity:
+
+**Correctness / determinism**
+- `qaxpy`: `nan_unify_q(QY)` only canonicalizes `QY[0]`; NaNs in `QY[1..]` escape →
+  breaks determinism for vectors length > 1. Unify per written element.
+- `*nrm2`: loop bound `(N-1)*incX` can integer-overflow → early exit / infinite loop.
+- `FNEG` macro (`1 << 31` / `1 << 63`): shifts a signed int into the sign bit = UB.
+  Affects complex conjugation. Use a width-correct unsigned literal.
+- `*swap`/`csrot`: declare `incX` as `uint64_t` then test `if (incX < 0)` (always
+  false) → negative strides become huge positives → OOB. (May differ post-PR#1.)
+- `iamax`: 0-based for N>1 but hardcodes `return 1` for N==1 — inconsistent; document
+  the convention (Fortran BLAS is 1-based; this is effectively 0-based).
+- `f128M_abs`: now masks the right word, but is still a structurally broken
+  `(float128_t*){…}` compound literal (currently unused).
+
+**Resource / robustness**
+- `qasum` leaks 16 bytes/call (heap accumulator never freed; use a stack temp).
+- `svec/dvec/hvec/qvec` don't null-check `malloc`.
+
+**Broken / orphaned code presented as "complete"**
+- `rot/rotg/rotm/rotmg` (all precisions) are declared in the header but **not built**
+  → link errors if a caller uses them. Either build+fix or document as unbuilt.
+- `sdsdot.c`, `hsdot.c`, `scnrm2.c` don't compile (undefined idents / complex type
+  errors). `qrot` prototype uses `float16_t*` (copy-paste). `*rotmg` compute
+  constants via integer multiply of hex bit-patterns (meaningless). `srot/drot/hrot`
+  compare SoftFloat structs to int literals with `!=` (UB) and omit
+  `#include "softblas.h"`.
+
+**Test-harness bugs (in `tests/test_all.c`)**
+- `*swap_stride` slots run `*swap_two` (the stride variant is never exercised).
+- `idamax/ihamax/iqamax` stride+13542 slots point at the **`isamax`** functions —
+  the d/h/q index-max tests don't test their own routine.
+
+**Hygiene / scope**
+- `v[0]=lo, v[1]=hi` float128 layout invariant should be documented in the header
+  (currently an `// XXX` note in `qdot.c`).
+- Consider an X-macro for the 6 genuinely-identical Level-1 routines
+  (axpy/copy/dot/scal/swap/asum) across `h/s/d`; keep `q` separate (pointer API).
+- Move internal helpers (`*vec`, `nan_unify_*`) to an internal header.
+- `benchmarking/` is dead (benchmarking was removed from planned work, commit
+  f4908d3) — candidate for deletion. Unused complex 16/128-bit (`i`/`v`) type stubs.
+- Naming sovereignty is limited by downstream clients — coordinate renames.
+
+## Build & test
+
+- **CI / Linux (canonical):** submodules (SoftFloat is an SSH submodule — rewrite to
+  HTTPS), `make -C SoftFloat/build/Linux-x86_64-GCC`, then `make tests && ./test_all`.
+  See `.github/workflows/ci.yml`.
+- **macOS gotcha:** the Makefile hardcodes `gcc`, `-LSoftFloat/build/Linux-x86_64-GCC`,
+  and `-l:softfloat.a` (GNU-ld only) — it does **not** build out-of-the-box on
+  Apple `ld`. During this work I built locally with `clang`, linking the checked-in
+  (arm64) `softfloat.a` directly and compiling the `BLAS_SRCS` list by hand. A
+  Darwin build path is an open backlog item.
+
+## Landmines / things that bit me
+
+- The whole reason the bugs survived: tests only used unit strides and `ldb==ldc`.
+  Any new routine work should add non-unit-stride / `ldb≠ldc` coverage.
+- PR #1's source merged cleanly with #8 because the changes were on different lines
+  (signature/`_set_rounding` at top vs. index fixes in the loop body) — but always
+  re-verify `f128_abs`, `*gemm` `ldb`, and `*gemv` indexing after any future merge.
+- When adding `rndMode` to test calls, watch for calls with trailing comments
+  (`sswap(0,…);  // …`) — a naive "ends with `);`" filter misses them.
+- Merging `master` into PR #1 silently re-took master's **Makefile**, which lists
+  `softblas_state.c` (deleted here) and the WIP complex routines (`scasum`/`ccopy`
+  don't compile) — this broke CI. Resolved by restoring the branch's `BLAS_SRCS`
+  (40 REAL routines) and re-deleting `softblas_state.c`. On any future `master`
+  merge, re-check the Makefile didn't resurrect either. The branch's complex/rot
+  files carry `rndMode` but are intentionally unbuilt and still WIP-broken
+  (e.g. `ccopy` declares `ix/iy` but uses `iX/iY`) — fix when completing complex.
diff --git a/README.md b/README.md
index 4ca66ac..ed1f7ad 100644
--- a/README.md
+++ b/README.md
@@ -12,10 +12,11 @@ Following SoftFloat 3e and requiring a 64-bit OS, all quantities are passed by v
 
 - 1.0.0 (commit `7d05697aea5363dcf5f877a9c8b464e9c352d3d4`).  Basic suite of `REAL`-valued operations suitable for use with half, single, double, and quadruple precision floating-point numbers.
 - 1.1.0 (commit `f94acbcfd26cebd8e135ad9e8c7caa156fcc4ac9`).  Errors changed to `exit(-1)` instead of `return`.
+- Unreleased.  Invalid-argument errors `return` (no-op) again instead of `exit(-1)`, so the library never terminates the host process.  The rounding mode is now a per-call argument (see below) rather than a global.
 
 ### Planned work
 
-- [ ] Add rounding-mode propagation to function signatures (in progress).
+- [x] Add rounding-mode propagation to function signatures.
 - [ ] Complete complex-valued functions (in progress).
 - [ ] Use a linter.
 - [ ] Add (kelvin) versioning, at least on interface.
@@ -31,13 +32,13 @@ Following SoftFloat 3e and requiring a 64-bit OS, all quantities are passed by v
 |  64 | `d` |  `float64_t` |  64 | `z` |  `complex64_t` |
 | 128 | `q` | `float128_t` | 128 | `v` | `complex128_t` |
 
-The rounding mode should be set via the global variable `softblas_roundingMode` (a `char` `typedef`), defined in `softblas.h`.  Valid values are:
+Every routine takes a trailing rounding-mode argument `rndMode` (a `uint_fast8_t` holding one of the `char` codes below).  It is applied via `_set_rounding` at routine entry, before any arithmetic.  An unrecognized code leaves SoftFloat's current mode unchanged.  Valid values are:
 
-- `'n'` - round to nearest (even)
+- `'n'` - round to nearest, ties to even
 - `'z'` - round towards zero
-- `'u'` - round up
-- `'d'` - round down
-- `'a'` - round away from zero
+- `'u'` - round up (towards +∞)
+- `'d'` - round down (towards −∞)
+- `'a'` - round to nearest, ties away from zero
 
 Per Wikipedia:
 
diff --git a/include/softblas.h b/include/softblas.h
index f5e610d..3b98f35 100644
--- a/include/softblas.h
+++ b/include/softblas.h
@@ -168,11 +168,16 @@ typedef struct {
 #define c64_div(a, b) (complex64_t){ f64_div( f64_add( f64_mul(a.real, b.real), f64_mul(a.imag, b.imag) ), f64_add( f64_mul(b.real, b.real), f64_mul(b.imag, b.imag) ) ), f64_div( f64_sub( f64_mul(a.imag, b.real), f64_mul(a.real, b.imag) ), f64_add( f64_mul(b.real, b.real), f64_mul(b.imag, b.imag) ) ) }
 #define c128_div(a, b) (complex128_t){ f128_div( f128_add( f128_mul(a.real, b.real), f128_mul(a.imag, b.imag) ), f128_add( f128_mul(b.real, b.real), f128_mul(b.imag, b.imag) ) ), f128_div( f128_sub( f128_mul(a.imag, b.real), f128_mul(a.real, b.imag) ), f128_add( f128_mul(b.real, b.real), f128_mul(b.imag, b.imag) ) ) }
 
-#define c16_conj(a) (complex16_t){ a.real, FNEG(a.imag) }
-#define c32_conj(a) (complex32_t){ a.real, FNEG(a.imag) }
-#define c64_conj(a) (complex64_t){ a.real, FNEG(a.imag) }
-#define c128_conj(a) (complex128_t){ a.real, FNEG(a.imag) }
-#define FNEG(x) ( x.v ^ (1 << (sizeof(x.v) * 8 - 1)) )
+//  FNEG flips the sign bit of a scalar-valued SoftFloat type (float16/32/64).
+//  The mask must be unsigned and at least as wide as x.v: a plain `1 << 31`
+//  is undefined (signed int overflow) and `1 << 63` does not fit in an int.
+#define FNEG(x) ( (x).v ^ ((uint64_t)1 << (sizeof((x).v) * 8 - 1)) )
+#define c16_conj(a) (complex16_t){ (a).real, FNEG((a).imag) }
+#define c32_conj(a) (complex32_t){ (a).real, FNEG((a).imag) }
+#define c64_conj(a) (complex64_t){ (a).real, FNEG((a).imag) }
+//  float128_t holds its sign bit in the high word v[1]; v is an array, so it
+//  cannot use FNEG (which xors the whole scalar). Flip v[1]'s sign bit only.
+#define c128_conj(a) (complex128_t){ (a).real, (float128_t){ (a).imag.v[0], (a).imag.v[1] ^ ((uint64_t)1 << 63) } }
 
 //  SOFTBLAS PROTOTYPES
 
@@ -188,7 +193,7 @@ void srotg(float32_t *a, float32_t *b, float32_t *c, float32_t *s, const uint_fa
 void srotm(const uint64_t N, float32_t *X, const uint64_t incX, float32_t *Y, const uint64_t incY, const float32_t *P, const uint_fast8_t rndMode);
 void srotmg(float32_t *D1, float32_t *D2, float32_t *X1, const float32_t y1, float32_t *P, const uint_fast8_t rndMode);
 void sscal(uint64_t N, float32_t SA, float32_t *SX, uint64_t incX, const uint_fast8_t rndMode);
-void sswap(uint64_t N, float32_t *SX, uint64_t incX, float32_t *SY, uint64_t incY, const uint_fast8_t rndMode);
+void sswap(uint64_t N, float32_t *SX, int64_t incX, float32_t *SY, int64_t incY, const uint_fast8_t rndMode);
 uint64_t isamax(uint64_t N, const float32_t *SX, uint64_t incX);
 
 //    Double-precision
@@ -202,7 +207,7 @@ void drotg(float64_t *a, float64_t *b, float64_t *c, float64_t *s, const uint_fa
 void drotm(const uint64_t N, float64_t *X, const uint64_t incX, float64_t *Y, const uint64_t incY, const float64_t *P, const uint_fast8_t rndMode);
 void drotmg(float64_t *D1, float64_t *D2, float64_t *X1, const float64_t y1, float64_t *P, const uint_fast8_t rndMode);
 void dscal(uint64_t N, float64_t DA, float64_t *DX, uint64_t incX, const uint_fast8_t rndMode);
-void dswap(uint64_t N, float64_t *DX, uint64_t incX, float64_t *DY, uint64_t incY, const uint_fast8_t rndMode);
+void dswap(uint64_t N, float64_t *DX, int64_t incX, float64_t *DY, int64_t incY, const uint_fast8_t rndMode);
 uint64_t idamax(uint64_t N, const float64_t *DX, uint64_t incX);
 
 //    Half-precision
@@ -216,7 +221,7 @@ void hrotg(float16_t *a, float16_t *b, float16_t *c, float16_t *s, const uint_fa
 void hrotm(const uint64_t N, float16_t *X, const uint64_t incX, float16_t *Y, const uint64_t incY, const float16_t *P, const uint_fast8_t rndMode);
 void hrotmg(float16_t *D1, float16_t *D2, float16_t *X1, const float16_t y1, float16_t *P, const uint_fast8_t rndMode);
 void hscal(uint64_t N, float16_t HA, float16_t *HX, uint64_t incX, const uint_fast8_t rndMode);
-void hswap(uint64_t N, float16_t *HX, uint64_t incX, float16_t *HY, uint64_t incY, const uint_fast8_t rndMode);
+void hswap(uint64_t N, float16_t *HX, int64_t incX, float16_t *HY, int64_t incY, const uint_fast8_t rndMode);
 uint64_t ihamax(uint64_t N, const float16_t *HX, uint64_t incX);
 
 //    Quad-precision
@@ -230,7 +235,7 @@ void qrotg(float128_t *a, float128_t *b, float128_t *c, float128_t *s, const uin
 void qrotm(const uint64_t N, float128_t *X, const uint64_t incX, float128_t *Y, const uint64_t incY, const float128_t *P, const uint_fast8_t rndMode);
 void qrotmg(float128_t *D1, float128_t *D2, float128_t *X1, const float128_t y1, float128_t *P, const uint_fast8_t rndMode);
 void qscal(uint64_t N, float128_t QA, float128_t *QX, uint64_t incX, const uint_fast8_t rndMode);
-void qswap(uint64_t N, float128_t *QX, uint64_t incX, float128_t *QY, uint64_t incY, const uint_fast8_t rndMode);
+void qswap(uint64_t N, float128_t *QX, int64_t incX, float128_t *QY, int64_t incY, const uint_fast8_t rndMode);
 uint64_t iqamax(uint64_t N, const float128_t *QX, uint64_t incX);
 
 //    Complex single-precision
@@ -239,7 +244,7 @@ void caxpy(uint64_t N, complex32_t CA, complex32_t *CX, int64_t incX, complex32_
 void ccopy(uint64_t N, const complex32_t *CX, int64_t incX, complex32_t *CY, int64_t incY, const uint_fast8_t rndMode);
 complex32_t cdotc(uint64_t N, const complex32_t *CX, int64_t incX, const complex32_t *CY, int64_t incY, const uint_fast8_t rndMode);
 float32_t scnrm2(uint64_t N, const complex32_t *CX, uint64_t incX, const uint_fast8_t rndMode);
-void csrot(const uint64_t N, complex32_t *CX, const uint64_t incX, complex32_t *CY, const uint64_t incY, const complex32_t c, const complex32_t s, const uint_fast8_t rndMode);
+void csrot(const uint64_t N, complex32_t *CX, const int64_t incX, complex32_t *CY, const int64_t incY, const complex32_t c, const complex32_t s, const uint_fast8_t rndMode);
 
 void cscal(uint64_t N, complex32_t CA, complex32_t *CX, uint64_t incX, const uint_fast8_t rndMode);
 void cswap(uint64_t N, complex32_t *CX, uint64_t incX, complex32_t *CY, uint64_t incY, const uint_fast8_t rndMode);
diff --git a/src/blas/level1/csrot.c b/src/blas/level1/csrot.c
index b230a37..58d3f04 100644
--- a/src/blas/level1/csrot.c
+++ b/src/blas/level1/csrot.c
@@ -1,6 +1,6 @@
 #include "softblas.h"
 
-void csrot(const uint64_t N, complex32_t *CX, const uint64_t incX, complex32_t *CY, const uint64_t incY, const complex32_t c, const complex32_t s, const uint_fast8_t rndMode) {
+void csrot(const uint64_t N, complex32_t *CX, const int64_t incX, complex32_t *CY, const int64_t incY, const complex32_t c, const complex32_t s, const uint_fast8_t rndMode) {
     _set_rounding(rndMode);
     complex32_t temp;
     int64_t ix = 0;
diff --git a/src/blas/level1/dnrm2.c b/src/blas/level1/dnrm2.c
index b33da78..0faa352 100644
--- a/src/blas/level1/dnrm2.c
+++ b/src/blas/level1/dnrm2.c
@@ -14,7 +14,7 @@ float64_t dnrm2(uint64_t N, const float64_t *X, uint64_t incX, const uint_fast8_
         float64_t ssq = { SB_REAL64_ONE };
         float64_t absXI;
 
-        for (uint64_t ix = 0; ix < 1 + (N - 1) * incX; ix += incX) {
+        for (uint64_t k = 0, ix = 0; k < N; k++, ix += incX) {
             if (f64_ne(X[ix], (float64_t){ SB_REAL64_ZERO })) {
                 absXI = f64_abs(X[ix]);
                 if (f64_lt(scale, absXI)) {
diff --git a/src/blas/level1/dswap.c b/src/blas/level1/dswap.c
index 4b7fba7..89f942b 100644
--- a/src/blas/level1/dswap.c
+++ b/src/blas/level1/dswap.c
@@ -1,13 +1,13 @@
 #include "softblas.h"
 
-void dswap(uint64_t N, float64_t *DX, uint64_t incX, float64_t *DY, uint64_t incY, const uint_fast8_t rndMode) {
+void dswap(uint64_t N, float64_t *DX, int64_t incX, float64_t *DY, int64_t incY, const uint_fast8_t rndMode) {
     _set_rounding(rndMode);
     float64_t dtemp;
 
     if (N == 0) return;
 
-    uint64_t iX = 0;
-    uint64_t iY = 0;
+    int64_t iX = 0;
+    int64_t iY = 0;
     if (incX < 0) iX = (-N + 1) * incX;
     if (incY < 0) iY = (-N + 1) * incY;
     for (uint64_t i = 0; i < N; i++) {
diff --git a/src/blas/level1/hnrm2.c b/src/blas/level1/hnrm2.c
index 2c64121..e4290bf 100644
--- a/src/blas/level1/hnrm2.c
+++ b/src/blas/level1/hnrm2.c
@@ -14,7 +14,7 @@ float16_t hnrm2(uint64_t N, const float16_t *X, uint64_t incX, const uint_fast8_
         float16_t ssq = { SB_REAL16_ONE };
         float16_t absXI;
 
-        for (uint64_t ix = 0; ix < 1 + (N - 1) * incX; ix += incX) {
+        for (uint64_t k = 0, ix = 0; k < N; k++, ix += incX) {
             if (f16_ne(X[ix], (float16_t){ SB_REAL16_ZERO })) {
                 absXI = f16_abs(X[ix]);
                 if (f16_lt(scale, absXI)) {
diff --git a/src/blas/level1/hswap.c b/src/blas/level1/hswap.c
index 025ce71..baae923 100644
--- a/src/blas/level1/hswap.c
+++ b/src/blas/level1/hswap.c
@@ -1,13 +1,13 @@
 #include "softblas.h"
 
-void hswap(uint64_t N, float16_t *HX, uint64_t incX, float16_t *HY, uint64_t incY, const uint_fast8_t rndMode) {
+void hswap(uint64_t N, float16_t *HX, int64_t incX, float16_t *HY, int64_t incY, const uint_fast8_t rndMode) {
     _set_rounding(rndMode);
     float16_t htemp;
 
     if (N == 0) return;
 
-    uint64_t iX = 0;
-    uint64_t iY = 0;
+    int64_t iX = 0;
+    int64_t iY = 0;
     if (incX < 0) iX = (-N + 1) * incX;
     if (incY < 0) iY = (-N + 1) * incY;
     for (uint64_t i = 0; i < N; i++) {
diff --git a/src/blas/level1/idamax.c b/src/blas/level1/idamax.c
index 5937e62..99378f9 100644
--- a/src/blas/level1/idamax.c
+++ b/src/blas/level1/idamax.c
@@ -2,7 +2,7 @@
 
 uint64_t idamax(uint64_t N, const float64_t *DX, uint64_t incX) {
     if (N < 1 || incX <= 0) return 0;
-    if (N == 1) return 1;
+    if (N == 1) return 0;
 
     uint64_t idamax = 0;
     uint64_t ix = 0;
diff --git a/src/blas/level1/ihamax.c b/src/blas/level1/ihamax.c
index 453910f..43fda8a 100644
--- a/src/blas/level1/ihamax.c
+++ b/src/blas/level1/ihamax.c
@@ -2,7 +2,7 @@
 
 uint64_t ihamax(uint64_t N, const float16_t *HX, uint64_t incX) {
     if (N < 1 || incX <= 0) return 0;
-    if (N == 1) return 1;
+    if (N == 1) return 0;
 
     uint64_t ihamax = 0;
     uint64_t ix = 0;
diff --git a/src/blas/level1/iqamax.c b/src/blas/level1/iqamax.c
index 307ed36..4bd84c0 100644
--- a/src/blas/level1/iqamax.c
+++ b/src/blas/level1/iqamax.c
@@ -2,7 +2,7 @@
 
 uint64_t iqamax(uint64_t N, const float128_t *QX, uint64_t incX) {
     if (N < 1 || incX <= 0) return 0;
-    if (N == 1) return 1;
+    if (N == 1) return 0;
 
     uint64_t iqamax = 0;
     uint64_t ix = 0;
diff --git a/src/blas/level1/isamax.c b/src/blas/level1/isamax.c
index 70a3af5..316eee1 100644
--- a/src/blas/level1/isamax.c
+++ b/src/blas/level1/isamax.c
@@ -2,7 +2,7 @@
 
 uint64_t isamax(uint64_t N, const float32_t *SX, uint64_t incX) {
     if (N < 1 || incX <= 0) return 0;
-    if (N == 1) return 1;
+    if (N == 1) return 0;
 
     uint64_t isamax = 0;
     uint64_t ix = 0;
diff --git a/src/blas/level1/qaxpy.c b/src/blas/level1/qaxpy.c
index 67f4d75..614f541 100644
--- a/src/blas/level1/qaxpy.c
+++ b/src/blas/level1/qaxpy.c
@@ -11,9 +11,9 @@ void qaxpy(uint64_t N, float128_t QA, float128_t *QX, int64_t incX, float128_t *
         f128M_mul(&QA, &(QX[iX]), &QT);
         f128M_add(&(QY[iY]), &QT, &QT);
         QY[iY] = QT;
+        nan_unify_q(&(QY[iY]));
         iX += incX;
         iY += incY;
     }
 
-    nan_unify_q(QY);
 }
diff --git a/src/blas/level1/qnrm2.c b/src/blas/level1/qnrm2.c
index b4a1139..2771f75 100644
--- a/src/blas/level1/qnrm2.c
+++ b/src/blas/level1/qnrm2.c
@@ -17,7 +17,7 @@ float128_t qnrm2(uint64_t N, const float128_t *X, uint64_t incX, const uint_fast
         const float128_t QZERO = { SB_REAL128L_ZERO, SB_REAL128U_ZERO };
         const float128_t QONE  = { SB_REAL128L_ONE,  SB_REAL128U_ONE };
 
-        for (uint64_t ix = 0; ix < 1 + (N - 1) * incX; ix += incX) {
+        for (uint64_t k = 0, ix = 0; k < N; k++, ix += incX) {
             if (f128M_ne(&(X[ix]), &QZERO)) {
                 absXI = f128_abs(X[ix]);
                 if (f128M_lt(&scale, &absXI)) {
diff --git a/src/blas/level1/qswap.c b/src/blas/level1/qswap.c
index 694f23b..882be21 100644
--- a/src/blas/level1/qswap.c
+++ b/src/blas/level1/qswap.c
@@ -1,13 +1,13 @@
 #include "softblas.h"
 
-void qswap(uint64_t N, float128_t *QX, uint64_t incX, float128_t *QY, uint64_t incY, const uint_fast8_t rndMode) {
+void qswap(uint64_t N, float128_t *QX, int64_t incX, float128_t *QY, int64_t incY, const uint_fast8_t rndMode) {
     _set_rounding(rndMode);
     float128_t qtemp;
 
     if (N == 0) return;
 
-    uint64_t iX = 0;
-    uint64_t iY = 0;
+    int64_t iX = 0;
+    int64_t iY = 0;
     if (incX < 0) iX = (-N + 1) * incX;
     if (incY < 0) iY = (-N + 1) * incY;
     for (uint64_t i = 0; i < N; i++) {
diff --git a/src/blas/level1/snrm2.c b/src/blas/level1/snrm2.c
index 8731bb9..795ab47 100644
--- a/src/blas/level1/snrm2.c
+++ b/src/blas/level1/snrm2.c
@@ -14,7 +14,7 @@ float32_t snrm2(uint64_t N, const float32_t *X, uint64_t incX, const uint_fast8_
         float32_t ssq = { SB_REAL32_ONE };
         float32_t absXI;
 
-        for (uint64_t ix = 0; ix < 1 + (N - 1) * incX; ix += incX) {
+        for (uint64_t k = 0, ix = 0; k < N; k++, ix += incX) {
             if (f32_ne(X[ix], (float32_t){ SB_REAL32_ZERO })) {
                 absXI = f32_abs(X[ix]);
                 if (f32_lt(scale, absXI)) {
diff --git a/src/blas/level1/sswap.c b/src/blas/level1/sswap.c
index cee1e83..8c5eb5f 100644
--- a/src/blas/level1/sswap.c
+++ b/src/blas/level1/sswap.c
@@ -1,13 +1,13 @@
 #include "softblas.h"
 
-void sswap(uint64_t N, float32_t *SX, uint64_t incX, float32_t *SY, uint64_t incY, const uint_fast8_t rndMode) {
+void sswap(uint64_t N, float32_t *SX, int64_t incX, float32_t *SY, int64_t incY, const uint_fast8_t rndMode) {
     _set_rounding(rndMode);
     float32_t stemp;
 
     if (N == 0) return;
 
-    uint64_t iX = 0;
-    uint64_t iY = 0;
+    int64_t iX = 0;
+    int64_t iY = 0;
     if (incX < 0) iX = (-N + 1) * incX;
     if (incY < 0) iY = (-N + 1) * incY;
     for (uint64_t i = 0; i < N; i++) {
diff --git a/tests/blas/include/test.h b/tests/blas/include/test.h
index cc8bccb..ca2999b 100644
--- a/tests/blas/include/test.h
+++ b/tests/blas/include/test.h
@@ -54,6 +54,8 @@ MunitResult test_sswap_stride(const MunitParameter params[],
                               void* user_data_or_fixture);
 MunitResult test_sswap_zero(const MunitParameter params[],
                             void* user_data_or_fixture);
+MunitResult test_sswap_negstride(const MunitParameter params[],
+                                 void* user_data_or_fixture);
 MunitResult test_isamax_0(const MunitParameter params[],
                           void* user_data_or_fixture);
 MunitResult test_isamax_12345(const MunitParameter params[],
@@ -62,6 +64,8 @@ MunitResult test_isamax_stride(const MunitParameter params[],
                                void* user_data_or_fixture);
 MunitResult test_isamax_13542(const MunitParameter params[],
                               void* user_data_or_fixture);
+MunitResult test_isamax_one(const MunitParameter params[],
+                            void* user_data_or_fixture);
 
 MunitResult test_dasum_0(const MunitParameter params[],
                        void* user_data_or_fixture);
@@ -181,6 +185,8 @@ MunitResult test_qaxpy_stride(const MunitParameter params[],
                               void* user_data_or_fixture);
 MunitResult test_qaxpy_neg_stride(const MunitParameter params[],
                                   void* user_data_or_fixture);
+MunitResult test_qaxpy_nan_unify(const MunitParameter params[],
+                                 void* user_data_or_fixture);
 MunitResult test_qcopy_all(const MunitParameter params[],
                               void* user_data_or_fixture);
 MunitResult test_qcopy_stride(const MunitParameter params[],
diff --git a/tests/blas/level1/test_isamax.c b/tests/blas/level1/test_isamax.c
index 3c8b898..ac9100d 100644
--- a/tests/blas/level1/test_isamax.c
+++ b/tests/blas/level1/test_isamax.c
@@ -56,3 +56,13 @@ MunitResult test_isamax_13542(const MunitParameter params[],
 
     return MUNIT_OK;
 }
+
+//  N==1: the (only) maximum is at 0-based index 0, not 1.
+MunitResult test_isamax_one(const MunitParameter params[],
+                            void* user_data_or_fixture) {
+    float32_t* SX = svec((float[]){42.0f}, 1);
+    uint64_t I = isamax(1, SX, 1);
+    assert_int(I, ==, 0);
+    free(SX);
+    return MUNIT_OK;
+}
diff --git a/tests/blas/level1/test_qaxpy.c b/tests/blas/level1/test_qaxpy.c
index ea7d1f4..f6269ec 100644
--- a/tests/blas/level1/test_qaxpy.c
+++ b/tests/blas/level1/test_qaxpy.c
@@ -165,3 +165,29 @@ MunitResult test_qaxpy_neg_stride(const MunitParameter params[], void *user_data
 }
 
 
+
+//  qaxpy must canonicalize NaNs in EVERY written element, not just QY[0].
+//  QX[1] is a non-canonical NaN whose (quieted) payload SoftFloat propagates;
+//  without a per-element nan_unify_q, QY[1] escapes non-canonical and breaks
+//  the determinism guarantee.
+MunitResult test_qaxpy_nan_unify(const MunitParameter params[],
+                                 void* user_data_or_fixture) {
+    float128_t QA = {{ 0x0000000000000000, 0x3fff000000000000 }};  // 1.0
+    float128_t* QX = qvec((float128_pair_t[]){
+            {.hi = 0x3fff000000000000, .lo = 0x0000000000000000},   // 1.0
+            {.hi = 0x7fff000000000001, .lo = 0x00000000deadbeef}},  // non-canonical NaN
+        2);
+    float128_t* QY = qvec((float128_pair_t[]){
+            {.hi = 0x0000000000000000, .lo = 0x0000000000000000},
+            {.hi = 0x0000000000000000, .lo = 0x0000000000000000}},
+        2);
+
+    qaxpy(2, QA, QX, 1, QY, 1, 'n');
+
+    //  QY[1] = 1*NaN + 0 -> canonical QUADNAN with zeroed low word.
+    assert_ullong(QY[1].v[1], ==, QUADNAN);
+    assert_ullong(QY[1].v[0], ==, 0x0000000000000000);
+
+    free(QX); free(QY);
+    return MUNIT_OK;
+}
diff --git a/tests/blas/level1/test_qswap.c b/tests/blas/level1/test_qswap.c
index 693ac07..711f664 100644
--- a/tests/blas/level1/test_qswap.c
+++ b/tests/blas/level1/test_qswap.c
@@ -89,7 +89,7 @@ MunitResult test_qswap_stride(const MunitParameter params[],
             {.hi = 0xc001000000000000, .lo = 0x0000000000000000},
             {.hi = 0x0000000000000000, .lo = 0x0000000000000000},
             {.hi = 0x4001400000000000, .lo = 0x0000000000000000}},
-        5);
+        9);
 
     for (uint64_t i = 0; i < 5; i++) {
         assert_ullong(QX[i].v[0], ==, RX[i].v[0]);
diff --git a/tests/blas/level1/test_sswap.c b/tests/blas/level1/test_sswap.c
index 875a73d..140483c 100644
--- a/tests/blas/level1/test_sswap.c
+++ b/tests/blas/level1/test_sswap.c
@@ -66,3 +66,25 @@ MunitResult test_sswap_zero(const MunitParameter params[],
     free(SX); free(SY); free(RX); free(RY);
     return MUNIT_OK;
 }
+
+//  Negative stride. sswap(N, X, -1, Y, +1) walks X backward while Y walks
+//  forward: X=[10,20,30], Y=[1,2,3] -> X=[3,2,1], Y=[30,20,10]. Under the old
+//  uint64_t incX the `incX < 0` guard was dead and the negative stride wrapped
+//  to a huge positive stride (out-of-bounds).
+MunitResult test_sswap_negstride(const MunitParameter params[],
+                                 void* user_data_or_fixture) {
+    float32_t* SX = svec((float[]){10.0f, 20.0f, 30.0f}, 3);
+    float32_t* SY = svec((float[]){1.0f, 2.0f, 3.0f}, 3);
+
+    sswap(3, SX, -1, SY, 1, 'n');
+
+    float32_t* RX = svec((float[]){3.0f, 2.0f, 1.0f}, 3);
+    float32_t* RY = svec((float[]){30.0f, 20.0f, 10.0f}, 3);
+    for (uint64_t i = 0; i < 3; i++) {
+        assert_ulong(SX[i].v, ==, RX[i].v);
+        assert_ulong(SY[i].v, ==, RY[i].v);
+    }
+
+    free(SX); free(SY); free(RX); free(RY);
+    return MUNIT_OK;
+}
diff --git a/tests/test_all.c b/tests/test_all.c
index 0394a03..6bd385f 100644
--- a/tests/test_all.c
+++ b/tests/test_all.c
@@ -25,12 +25,14 @@ int main(int argc, char* argv[MUNIT_ARRAY_PARAM(argc + 1)]) {
             {"/test_sscal_12345", test_sscal_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_sscal_stride", test_sscal_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_sswap_two", test_sswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_sswap_stride", test_sswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_sswap_stride", test_sswap_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_sswap_zero", test_sswap_zero, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_sswap_negstride", test_sswap_negstride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_isamax_0", test_isamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_isamax_12345", test_isamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_isamax_stride", test_isamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_isamax_13542", test_isamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_isamax_one", test_isamax_one, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             
             {"/test_dasum_0", test_dasum_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_dasum_12345", test_dasum_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
@@ -52,11 +54,11 @@ int main(int argc, char* argv[MUNIT_ARRAY_PARAM(argc + 1)]) {
             {"/test_dscal_12345", test_dscal_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_dscal_stride", test_dscal_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_dswap_two", test_dswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_dswap_stride", test_dswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_idamax_0", test_isamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_idamax_12345", test_isamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_idamax_stride", test_isamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_idamax_13542", test_isamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_dswap_stride", test_dswap_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_idamax_0", test_idamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_idamax_12345", test_idamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_idamax_stride", test_idamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_idamax_13542", test_idamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             
             {"/test_hasum_0", test_hasum_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_hasum_12345", test_hasum_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
@@ -78,11 +80,11 @@ int main(int argc, char* argv[MUNIT_ARRAY_PARAM(argc + 1)]) {
             {"/test_hscal_12345", test_hscal_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_hscal_stride", test_hscal_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_hswap_two", test_hswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_hswap_stride", test_hswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_ihamax_0", test_isamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_ihamax_12345", test_isamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_ihamax_stride", test_isamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_ihamax_13542", test_isamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_hswap_stride", test_hswap_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_ihamax_0", test_ihamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_ihamax_12345", test_ihamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_ihamax_stride", test_ihamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_ihamax_13542", test_ihamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             
             {"/test_qasum_0", test_qasum_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_qasum_12345", test_qasum_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
@@ -92,6 +94,7 @@ int main(int argc, char* argv[MUNIT_ARRAY_PARAM(argc + 1)]) {
             {"/test_qaxpy_sum", test_qaxpy_sum, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_qaxpy_stride", test_qaxpy_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_qaxpy_neg_stride", test_qaxpy_neg_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_qaxpy_nan_unify", test_qaxpy_nan_unify, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_qcopy_all", test_qcopy_all, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_qcopy_stride", test_qcopy_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_qdot_0", test_qdot_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
@@ -105,11 +108,11 @@ int main(int argc, char* argv[MUNIT_ARRAY_PARAM(argc + 1)]) {
             {"/test_qscal_12345", test_qscal_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_qscal_stride", test_qscal_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_qswap_two", test_qswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_qswap_stride", test_qswap_two, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_iqamax_0", test_isamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_iqamax_12345", test_isamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_iqamax_stride", test_isamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
-            {"/test_iqamax_13542", test_isamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_qswap_stride", test_qswap_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_iqamax_0", test_iqamax_0, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_iqamax_12345", test_iqamax_12345, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_iqamax_stride", test_iqamax_stride, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
+            {"/test_iqamax_13542", test_iqamax_13542, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             
             {"/test_sgemv_0_row", test_sgemv_0_row, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
             {"/test_sgemv_0_col", test_sgemv_0_col, NULL, NULL, MUNIT_TEST_OPTION_NONE, NULL},
diff --git a/tools/check_test_registration.sh b/tools/check_test_registration.sh
new file mode 100755
index 0000000..d1c9a8e
--- /dev/null
+++ b/tools/check_test_registration.sh
@@ -0,0 +1,15 @@
+#!/bin/sh
+# Lint: every munit registration's display-name stem must equal its function
+# pointer, so "X tests passed" can't lie about which routine was exercised.
+# (See the swap_stride / idamax-as-isamax misregistration class.)
+set -eu
+f="tests/test_all.c"
+bad=$(grep -oE '\{"/test_[a-z0-9_]+", *test_[a-z0-9_]+,' "$f" \
+  | sed -E 's/\{"\/(test_[a-z0-9_]+)", *(test_[a-z0-9_]+),/\1 \2/' \
+  | awk '$1 != $2 { print "  " $1 "  ->  " $2 }')
+if [ -n "$bad" ]; then
+  echo "FAIL: test registration name != function pointer in $f:"
+  echo "$bad"
+  exit 1
+fi
+echo "OK: all test registrations match their function pointers"