Vector diskann impl #1560

Merged
11 commits merged into main from vector-diskann-impl
Jul 18, 2024

Conversation

@sivukhin sivukhin (Contributor) commented Jul 16, 2024

Context

This is the third branch in the DiskANN implementation series. This PR introduces the DiskANN algorithm itself 🎉

The core of the algorithm is based on the paper "LM-DiskANN: Low Memory Footprint in Disk-Native Dynamic Graph-Based ANN Indexing".

The current implementation may be suboptimal in some places because it uses very simple data structures (lists / arrays). That is intentional: this PR stays simple, and we will improve performance in subsequent PRs.

Nevertheless, this PR tries to address the heaviest operation in the algorithm, reading blobs from disk, and aims to make as few reads as possible (using sqlite3_blob_reopen where possible).

From a performance perspective, the rough hierarchy of operation costs, most expensive first, looks like this:

  1. reading/writing a DiskANN node block
  2. distance calculation between two vectors (for example, OpenAI vectors have 1536 dimensions, so this is clearly a very slow operation)
  3. operations on the non-compressed vector payload
  4. all other operations
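
To give a feel for item 2, here is a minimal sketch of a single distance computation. It assumes float32 components and a squared Euclidean metric; the metric used in this PR may differ, and the name `l2_squared` is hypothetical. For a 1536-dimensional vector, the loop runs 1536 times per comparison and touches 2 × 1536 × 4 = 12288 bytes of vector data under that float32 assumption.

```c
#include <stddef.h>

/* Squared Euclidean distance between two float vectors of length `dims`.
 * Hypothetical helper for illustration only; not taken from this PR. */
float l2_squared(const float *a, const float *b, size_t dims) {
    float sum = 0.0f;
    for (size_t i = 0; i < dims; i++) {
        float d = a[i] - b[i];
        sum += d * d; /* one subtraction and one multiply-add per dimension */
    }
    return sum;
}
```

Because this loop runs for every candidate neighbor visited during a search, the number of distance calls matters almost as much as the number of block reads.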

Changes

  • Implement the DiskANN algorithm
  • Adjust the BlobSpot code for cases where a previous call to open/reopen failed (the blob then becomes useless, as all subsequent operations return the SQLITE_ABORT error; read the sqlite3_blob_* docstrings for more details)
  • Add a basic test in test_libsql_diskann.c (it's a bit hacky for now, but that's fine: we will soon merge full integration with SQLite and will no longer need these tests)

@sivukhin sivukhin changed the base branch from main to vector-aux-functions July 16, 2024 12:18
@sivukhin sivukhin marked this pull request as ready for review July 16, 2024 13:58
@sivukhin sivukhin requested a review from haaawk July 16, 2024 13:59
Base automatically changed from vector-aux-functions to main July 17, 2024 08:30
@sivukhin sivukhin added this pull request to the merge queue Jul 18, 2024
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jul 18, 2024
@sivukhin sivukhin added this pull request to the merge queue Jul 18, 2024
Merged via the queue into main with commit 3263c19 Jul 18, 2024
19 checks passed
@sivukhin sivukhin deleted the vector-diskann-impl branch July 18, 2024 10:18
@penberg penberg mentioned this pull request Jul 24, 2024