TurboQuant: Inv direction norms (correctness fix)#8049
Conversation
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] |
225.1 µs | 187.9 µs | +19.81% |
| ❌ | Simulation | new_alp_prim_test_between[f32, 32768] |
154 µs | 182.9 µs | -15.8% |
| ❌ | Simulation | new_alp_prim_test_between[f32, 16384] |
104.6 µs | 119.2 µs | -12.19% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing ct/tq-inv-direction-norms (353a0ee) with develop (f852d72)1
Footnotes
27f4be5 to
accf369
Compare
Signed-off-by: "Connor Tsui" <connor.tsui20@gmail.com>
Signed-off-by: "Connor Tsui" <connor.tsui20@gmail.com>
Fix two correctness bugs in the L2Norm(TQDecode(_)) fast path. (1) The kernel coerced the returned norms to the child's nullability rather than the parent's, so wider-child-validity storage shapes that parse_storage accepts errored out at the dtype invariant. The kernel now coerces to parent.dtype().nullability() and a new test mirrors the malformed.rs shape. (2) The per-row inv_direction_norm computation could store a 0.0 sentinel for finite rows whose squared sum overflowed to +inf in f32 (or a +inf for denormal norm_squared), making decode emit zeros while the kernel returned the nonzero stored norm. Encode now rejects non-finite input norms up front and the denormal recip is guarded by is_normal(); regression tests cover both cases. Several should-fix items go with the must-fix: parse_storage_norms_only lets the kernel skip executing the codes and inv_direction_norms children it does not consume; the parity test pins down the exact new = old * inv_direction_norm[row] relationship rather than asserting "the values differ"; file roundtrip now asserts the new field survives serialization and the kernel still preserves stored norms; tests are parameterized over f16/f32/f64 and across padded vs unpadded dimensions; the kernel result is cross-checked against canonical L2Norm of the materialized decode. The hypothetical defensive metadata check on the kernel is dropped (registry key plus TQDecode signature already enforce shape). The dev-dep on vortex-array switches to workspace = true to match sibling encodings. Over-long doc lines are reflowed. Every type in the crate now has a doc comment, emphasizing the new inv_direction_norms storage child and the 0.0 sentinel semantics. Module docs single-source the storage schema rationale in storage.rs; lib.rs and the scalar-fn modules defer to it. Verified: cargo check, cargo clippy --all-targets --all-features, cargo +nightly fmt --all --check, cargo doc --no-deps, and cargo nextest run (102 tests, +14 new) all clean. Signed-off-by: "Connor Tsui" <connor.tsui20@gmail.com>
accf369 to
353a0ee
Compare
|
this is not smart |
Summary
Tracking issue: #7830
TODO
Testing
Still pretty basic testing