Fix cosine similarity optimization bug#7724
Conversation
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
| T::zero() | ||
| } else { | ||
| dots[i] | ||
| } |
There was a problem hiding this comment.
definitely premature optimization, but I wonder if rustc is able to see that this can be written in a branchless way.
let either_is_zero = norms_l[i] == T::zero() || norms_r[i] == T::zero();
T::from(!either_is_zero) * dots[i]
Merging this PR will degrade performance by 42.11%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ⚡ | WallTime | dynamic_dispatch_u32[10M] |
162.8 µs | 105.5 µs | +54.37% |
| ❌ | WallTime | for[10M_u8] |
73.8 µs | 127.4 µs | -42.11% |
| ❌ | WallTime | for[10M_u16] |
95.5 µs | 157.3 µs | -39.33% |
| ❌ | WallTime | mix[0%_in/100%_out] |
227.3 µs | 278.5 µs | -18.39% |
| ❌ | Simulation | new_bp_prim_test_between[i64, 32768] |
176.4 µs | 235.3 µs | -25.04% |
Comparing ct/fix-cosine-denorm-opt (3a54a19) with develop (0bb712b)
Footnotes
-
9 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports. ↩
Summary
Tracking issue: #7297
CosineSimilarityfast paths forL2Denormused only the decoded normalized children, so lossy encodings could return a non-zero cosine for rows whose actual stored norm was 0.This fix makes those fast paths also execute the stored norm children and return 0 when a denorm stored norm, or the plain-side norm in the one-denorm case, is zero while preserving the existing validity mask.
Testing
Some regression tests.