fix(cast_to_i8/u8): guard zero-magnitude inputs against NaN cast #758
Merged
ashvardanian merged 1 commit on May 24, 2026
The integer-quantizing casts normalize input vectors by their magnitude. Zero or NaN magnitudes produced NaN scaled values, and casting those values to `int8_t` or `uint8_t` is undefined behavior. Emit zero-filled quantized output when the magnitude is not strictly positive, covering both all-zero vectors and NaN inputs before the narrowing cast.

Co-authored-by: Mikhail Chichvarin <6496186+desertfury@users.noreply.github.com>
Co-authored-by: Mikhail Chichvarin <desertfury@nebius.com>
Co-authored-by: Ash Vardanian <1983160+ashvardanian@users.noreply.github.com>
1898870 to b3fc5ff
Contributor

Might be a good idea to more aggressively fuzz the NumKong kernels against similar issues, @desertfury 🤔
ashvardanian pushed a commit that referenced this pull request May 24, 2026
### Patch

- Fix: Refuse operations without reserved thread contexts (#757) (b8d3403)
- Fix: Checked arithmetic for allocation sizes (#763) (89da7c5)
- Fix: Preserve hash lookup capacity across thread reserves (#765) (3cf843b)
- Fix: Guard quantized casts against zero-magnitude inputs (#758) (0c903f4)
- Fix: Refuse missing metrics in C change-metric API (#760) (490c1b2)
- Fix: Short-circuit self-renames (#761) (830f31a)
- Fix: Keep `vectors_lookup_` capacity after `clear()` #759 (9d77be5)
- Improve: Serialize concurrent same-`Index` Python access with a mutex (47528b5)
- Improve: Test GIL-release contract and progress-callback path (a598493)
- Improve: Release Python GIL during long index operations (d8be67d)
- Fix: Restore `ring_gt::try_push` return value (18c44ee)
- Fix: Stop JavaScript `Remove` loop after exception throw (f3e1052)
- Fix: Stop `usearch_init` on `make` failure (2aa0070)
- Fix: Stop C-ABI metadata readers on failure (34889ee)
- Fix: Report OOM from C-ABI thread-limit changers (ad24056)
- Fix: Bounded probe in `equal_iterator_gt::operator++` (b779c47)
- Fix: Resize cast buffer in `change_metric` for new bytes-per-vector (d544745)
- Fix: Eager-reserve thread contexts in `index_dense_gt::make` (#755) (b296566)
The integer-quantizing casts normalize by `magnitude = ||v||`. An all-zero vector makes `magnitude == 0`, so `typed_input[i] * K / magnitude` produces NaN, and casting NaN to `int8_t`/`uint8_t` is undefined behavior (UBSan: float-cast-overflow). The path is reachable from any `usearch_add` against an `i8`/`u8`-quantized index with a zero input vector; the fuzzer triggered it directly.

Emits a zero output vector when magnitude is not strictly positive (also covers NaN inputs that would have made magnitude NaN).
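
For readers skimming the thread, here is a minimal sketch of the guard described above. It is not the actual diff: the function name `quantize_to_i8_sketch`, the `float` input type, and the scale `K = 127` are assumptions made for illustration. The key point is that `!(magnitude > 0)` is true for both zero and NaN, because every ordered comparison against NaN is false.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper for illustration only; not the library's cast routine.
// Assumes a scale of K = 127 for the signed 8-bit case.
inline void quantize_to_i8_sketch(float const* input, std::size_t dim, std::int8_t* output) {
    assert(input != nullptr && output != nullptr);

    // magnitude = ||v||, accumulated in double to limit rounding error.
    double squared_sum = 0.0;
    for (std::size_t i = 0; i != dim; ++i)
        squared_sum += double(input[i]) * double(input[i]);
    double const magnitude = std::sqrt(squared_sum);

    // Written as a negated `>` on purpose: `magnitude <= 0` would be false
    // for NaN and let it through, while `!(magnitude > 0)` catches both
    // zero and NaN before any narrowing cast happens.
    if (!(magnitude > 0.0)) {
        std::memset(output, 0, dim * sizeof(std::int8_t));
        return;
    }

    // Past the guard, finite inputs satisfy |input[i]| <= magnitude, so the
    // scaled value lies in [-127, 127] and the cast is well defined.
    // Infinite components are outside the scope of this sketch.
    for (std::size_t i = 0; i != dim; ++i)
        output[i] = static_cast<std::int8_t>(double(input[i]) * 127.0 / magnitude);
}
```

With this shape, an all-zero vector and a vector containing a NaN both yield an all-zero quantized output, while ordinary vectors are scaled as before.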