-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: Segfault when dimensions of added vector don't add up (Rust) #412
Comments
Nice! Any chance you could open a PR? |
I will look into it tomorrow, maybe this error happens with other functions too... |
Another reason for segfaults is trying to add to the index when its capacity is 0 (not reserving). This could also be checked in the add function. |
This one is harder to justify as it's not a zero-cost change. The size is an atomic variable, and checking it from many threads would cause contention. |
I'm taking a look at this. Currently the rust tests are failing on add.
I added a test for the seg fault example however this passes for me.
I've added tests for other permutations of add and have hit a hang. |
@MarkReedZ, check #413 🤗 |
I'm looking at an unrelated error and cleaning up the tests to be more rust idiomatic. However, looking at the PR - are you suggesting fixing this in the base library by adding a size_t length argument to index::add then checking to ensure it matches options.dimensions? Or only fixing this for rust by updating rust/lib.cpp? |
I meant changing only the |
* Improve: Swift test for issue #399 (#400) * Fix: Integer overflow in aligned-alloc Fixes ClickHouse/ClickHouse#61780 Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com> * Make: Disable Windows NPM builds Relates to the #377 and the comment: #377 (comment) This temporarily disables the failing CI pipeline to generate and update docs. * Fix: Going beyond level 0 in clustering * Improve: Error handling in `index_dense_gt` This commit drops `std::vector` dependency, making compilation time shorter and error handling universal across abstraction layers. * Improve: Remove `std::function` calls * Improve: Remove `std::thread` from `index_dense_gt` * Improve: `std::vector` -> `buffer_gt` in plugins * Add: `usearch_change_threads_search` * Fix: `index_dense_t::make(path)` * Fix: Exhastive Search In the past, if we got "too lucky" traversing the graph, we could exit early before accumulating K top matches, even if the index had more than K entries. This patch changes that behavior, making output more predicatable. * Fix: Replacement leaves isolated nodes This patch addresses the issue #399, originally observed in the Swift layer. Reimplementing it in C++ helped locate the issue and lead to refactoring the `update` procedure in the lowest-layer `index_gt`. Now, `add` and `update` share less code. The `add` is one branch shorter (not that it would be noticeable), and `update` brings additional logic to avoid spilling `updated_slot` into top-results and consequently introducing self-loops. Closes #399 * Fix: Misc warnings & compilation issues * Fix: Misc warnings & compilation issues * Fix: Detect `ring_gt` being full Relates to #355 * Fix: Re-init `available_threads_` after load Both `view` and `load` would `reset` the thread contexts. After that, the very first `search` and `add` would fail, as no thread-local contexts are initialized. It would require a `reserve` call with a non-zero second arcgument to define the number of concurrent threads, for which the queues & buffers need to be allocated. That design is counter-intuitive, so this patch re-inits the same number of threads as before the `load` & `view` or one, if none existed. * Fix: `uint32_t` to `uint40_t` cast (#404) Co-authored-by: Ash Vardanian <1983160+ashvardanian@users.noreply.github.com> * Docs: Mention `b1` in `README.md` Co-authored-by: Adolfo Garcia <1250775+adolfogc@users.noreply.github.com> * Docs: Cover new users * Improve: Updates stability & catch bug * Fix: Dereferencing `member_iterator_t` * Add: Java `get` API (#407) * Fix: Compilation with `uint40_t` keys * Add: `AutoClosable` using `c_destroy` for Java (#408) * Fix: Rare deadlock on tiny collections * Improve: `enable_key_lookups=false` memory usage As noted by Robert Schulze, we can avoid populating `slot_lookup_` during insertions, if `enable_key_lookups` is not set. This would lead to lower memory consumption for large indexes of tiny vectors, particularly common in GIS. Co-authored-by: Robert Schulze <robert@clickhouse.com> * Fix: Preset `enable_key_lookups=true` in C * Fix: `std::is_pod` deprecated in C++20 * Fix: Unused type aliases * Improve: Avoid `#pragma region` pre GCC 13 (#386) * Do not use #pragma region if not supported by the compiler * `pragma region` supported by GCC 13+ https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85487 --------- Co-authored-by: Mikhail Bautin <mbautin@users.noreply.github.com> Co-authored-by: Ash Vardanian <1983160+ashvardanian@users.noreply.github.com> * Fix: Capacity checks in C# tests * Docs: Add doc-strings in C# * Make: Verbose C# logging in debug builds * Docs: Describe serialization in GoLang * Make: Drop Java & JS API references Compiling and introspecting docstrings in Sphinx is extremely flaky. It's safer to simply describe the usage patterns in the `README.md`. * Add: Load from path in Java (#410) * Improve: Avoid sorting on small "refines" * Docs: Cover JIT-compiled Python examples * Fix: `index.copy()` trying to `memcpy(*, NULL, *)` * Docs: UB in C++ Example (#415) Co-authored-by: Ash Vardanian <1983160+ashvardanian@users.noreply.github.com> * Improve: Catch UB with tests * Docs: Rearrange * Fix: Reserving contexts post-reload * Improve: Detect more failures in tests * Improve: Log failing lines * Fix: Clamp before down-casting to `i8_t` (#422) Previously we were down-casting floats to the target type (e.g. int8_t), and then clamping to [-100, 100] range. This means that e.g. 129 would be cast to -127 and then converted to -100, in stead of becoming 100 The fix does clamping first, and then casts the resulting number (which is guaranteed to be in range [-100, 100], due to clamping) from source type to target int8_t. Given the clamping, this will never overflow. --------- Co-authored-by: Ash Vardanian <1983160+ashvardanian@users.noreply.github.com> * Fix: Concurrency bug in high-K search When calling `index_dense_gt`, the thread lock was not propagating with the `search_result_t`. That is a an error-prone API. When too many threads are running in parallel (ideally, more than physical CPU cores) another thread may start reusing the `context_t` before the original caller finishes exporting entries with `dump_to`. This solution is backwards compatible and passes the tests. * Make: Manually bump version to 2.13.0 We can't yet rely on the SemVer tool semantic-release/release-notes-generator#633 (comment) * Improve: Attempt to implement batch-metrics * Add: Batch-capable metrics * Add: `misaligned_ptr_gt` comparators * Improve: Separate `error_t` constructors This makes debugging easier, as it's simpler to trace where the error message is being set. * Improve: Support `enum` slots Tracing implicit conversions of `std::uint32_t` and other primitive types isn't always easy in concurrent apps. This commit adds support for `enum` types to be used for safer implementation of `index_gt` specializations. * Improve: Ranges-V3 compatibility * Add: Preliminary support for batch metrics * Add: Batch-parallel refinements * Add: `MANIFEST.in` for `py.typed` (#425) Adding type annotation for Python native modules solves the `Skipping analyzing "usearch.index" module` warning due to `missing library stubs or py.typed marker`. Closes #424 * Fix: Clear cast buffer before bitwise ORs for `b1x8_t` (#428) When converting floating point arrays to binary, we use bitwise OR operations to set the relevant bits in the output buffer to 1. We do nothing if the bit is zero, so we assume that the bit is zero to start with. The `memset` statement makes sure this assumption holds. * Fix: `esm` duplicate import bug in `jest` (#420) Closes #418 Closes #426 * Fix: build.gradle deprecations * Fix: ESM build support (#433) Closes #426 Relates to #420 * Fix: `capacity()` assertion in Rust (#436) Closes #432 * Fix: Computing `stats(i).max_edges` * Add: Returning `computed_distances_in_refines` In high-connectivity graphs, the number of distance computations can be dominated by the number of "refine" heuristic computations performed by the core structure. The extended `add_result_t` now includes both: - `computed_distances_in_refines` - `computed_distances_in_reverse_refines` This commit also extends the documentation. * Add: Profiling attributes for `index_gt` * Fix: Preserve thread limits after `fork()` * Improve: Benchmarks self-recall support and `.bbin` This patch adds support for partial datasets, without ground truth neighborhood data. It also adds support for `.bbin` binary, `.hbin` half-precision, and `.dbin` double-precision input vector files/ * Fix: Printing progress between exit * Docs: Fix spelling * Fix: Wolfram bindings (#437) Co-authored-by: Ash Vardanian <1983160+ashvardanian@users.noreply.github.com> * Fix: Pre-reserve enough threads for C users This indirectly fixes the crash in C# layer * Revert: Parallel metrics * Fix: Updating the `entry_slot_` node * Improve: Enable single-threaded update tests * Make: Bump version * Fix: `flat_hash_multi_set_gt::reset` double-free * Fix: Not enough memory in `fork()` * Make: Bump to SimSIMD v5 * Fix: Missing SimSIMD v5 capability names * Improve: Detecting copy & move issues * Fix: Compilation w. explicit `template class` * Improve: Bypass UBSan NULL dereferencing warning * Improve: Minimize alignment issues * Make: Disable `bf16` on MacOS * Make: Link to GitHub repo * Fix: Conditional `call_key` compilation in MSVC * Fix: Unary minus on unsigned distances * Make: Disable `bf16` in JS * Fix: Compatibility with pre-v2.10 Closes #423 * Improve: Test wrong number of dimensions in Rust (#413) Closes #412 --------- Co-authored-by: Julius Brummack <juliusbrummack@icloud.com> * Fix: Handle wrong dimensionality in Rust * Fix: Overwriting `SIMSIMD_DYNAMIC_DISPATCH` * Make: Upgrade Java version * Fix: Spelling mistakes * Fix: `sprintf` deprecated * Fix: `-Wpass-failed=transform-warning` * Fix: Memory pinning on `Add` in C# * Make: Specify Java distribution * Fix: Pin memory in gets (C#) * Make: Skip `PersistAndRestore` in CI on MacOS * Make: Upgrade Docker action This fixes a GitHub CI warning about the deprecated NodeJS version. * Fix: `view_from_buffer` is unsafe in Rust Closes #453 Co-authored-by: Andrew Dirksen <2702854+bddap@users.noreply.github.com>" * Fix: `view_from_buffer` is unsafe in Rust Closes #453 Co-authored-by: Andrew Dirksen <2702854+bddap@users.noreply.github.com> * Make: Upgrade SimSIMD * Docs: Index header has no capacity The lack of capacity data is intended. Reserving memory is a non-persistent operation by nature, and we shouldn't save that metadata on the disk. Closes #452 Co-authored-by: Christopher Yim <4638193+GoodKnight@users.noreply.github.com> * Fix: Aggressive neighborhood checks on updates * Fix: `update()` bug detect with non-POD keys This bug was tough to spot. Apple Clang was the only one that caught it. The `-O0` flags were explicitly added to expose more symbols for debugging. More `uint40_t` tests were added. * Fix: Align vector type w index in C# (#456) Co-authored-by: Ash Vardanian <1983160+ashvardanian@users.noreply.github.com> * Make: Versioning in pre-release CI * Make: Switch from SemanticRelease to TinySemVer --------- Co-authored-by: Jaysen Marais <jaysen.marais@gmail.com> Co-authored-by: Antonio Andelic <antonio2368@users.noreply.github.com> Co-authored-by: Narek Galstyan <narekg@berkeley.edu> Co-authored-by: Adolfo Garcia <1250775+adolfogc@users.noreply.github.com> Co-authored-by: Trevor McCulloch <mccullocht@gmail.com> Co-authored-by: Robert Schulze <robert@clickhouse.com> Co-authored-by: Mikhail Bautin <552936+mbautin@users.noreply.github.com> Co-authored-by: Mikhail Bautin <mbautin@users.noreply.github.com> Co-authored-by: SheldonFung <88470100+SheldonFung98@users.noreply.github.com> Co-authored-by: James Braza <jamesbraza@gmail.com> Co-authored-by: cinchen <eryue0220@gmail.com> Co-authored-by: Mark Reed <markreed99@gmail.com> Co-authored-by: John <jjmace01@gmail.com> Co-authored-by: batracos <giulio.a@gmail.com> Co-authored-by: Ziyang Hu <2103823+zh217@users.noreply.github.com> Co-authored-by: Julius Brummack <133630819+jbrummack@users.noreply.github.com> Co-authored-by: Julius Brummack <juliusbrummack@icloud.com> Co-authored-by: Andrew Dirksen <2702854+bddap@users.noreply.github.com> Co-authored-by: Christopher Yim <4638193+GoodKnight@users.noreply.github.com> Co-authored-by: Britt <brittlewis12@gmail.com>
Describe the bug
Hello!
I am running into
zsh: segmentation fault cargo run --release combined_vectors.csv
, which is unexpected when using only safe Rust.Apparently my CSV had some messed up columns that had malformed data, yielding an input vector that is too short, causing a segfault.
Is it a good idea to check if the dimensions add up in the rust library, outputting an error as value?
Did I miss something?
Thank You!
Steps to reproduce
Expected behavior
usearch/rust/lib.rs
This is the code that is called when the crash occurs:
to handle the error gracefully add in another condition:
USearch version
usearch = { version = "2.12.0" }
Operating System
MacOS Sonoma 14.4.1
Hardware architecture
Arm
Which interface are you using?
Other bindings
Contact Details
awnings.truces0n@icloud.com
Are you open to being tagged as a contributor?
.git
history as a contributorIs there an existing issue for this?
Code of Conduct
The text was updated successfully, but these errors were encountered: