Unlimited vector search by Certainty #1883

Merged: 8 commits, Apr 1, 2022

Conversation

parkerduckworth (Member) commented on Mar 28, 2022

Changes

Recursively searches the HNSW index for vectors above a specified certainty, increasing the size of the search with each iteration
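
The widening search described above can be sketched as follows. This is a minimal illustration, not the code from this PR: the `hit` type, the `searchFunc` signature, the doubling of `k`, and the convention that `maxLimit <= 0` means "no cap" are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// hit is a hypothetical search result: an object id plus its certainty.
type hit struct {
	ID        uint64
	Certainty float64
}

// searchFunc stands in for a k-nearest-neighbour lookup on the vector index.
// It is assumed to return at most k hits, ordered by descending certainty.
type searchFunc func(k int) []hit

// searchByCertainty repeats the search with a growing k until the results
// reach below the target certainty, the index is exhausted, or maxLimit is
// reached. A maxLimit <= 0 means "no cap" (an assumption of this sketch).
func searchByCertainty(search searchFunc, target float64, k, maxLimit int) []hit {
	if k < 1 {
		k = 1
	}
	if maxLimit > 0 && k > maxLimit {
		k = maxLimit
	}
	hits := search(k)

	// Index of the first hit below the target; hits[:cut] all satisfy it.
	cut := sort.Search(len(hits), func(i int) bool {
		return hits[i].Certainty < target
	})

	exhausted := len(hits) < k              // the index has nothing more to return
	capped := maxLimit > 0 && k >= maxLimit // the search may not grow any further
	if cut < len(hits) || exhausted || capped {
		return hits[:cut]
	}
	// Every hit so far is above the target, so there may be more: widen.
	return searchByCertainty(search, target, k*2, maxLimit)
}

func main() {
	// A toy "index" of 10 hits with certainties 0.95, 0.90, ..., 0.50.
	index := make([]hit, 10)
	for i := range index {
		index[i] = hit{ID: uint64(i), Certainty: float64(95-5*i) / 100}
	}
	search := func(k int) []hit {
		if k > len(index) {
			k = len(index)
		}
		return index[:k]
	}

	fmt.Println(len(searchByCertainty(search, 0.82, 2, 0))) // uncapped
	fmt.Println(len(searchByCertainty(search, 0.82, 2, 2))) // capped at 2
}
```

Doubling `k` is only one possible growth strategy; the PR description says the search size increases with each iteration but does not state by how much.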

Notes

Max Query Limit

This recursive search has a configurable maximum limit. If the call originates directly from the client (i.e. from a Get query), the search is capped at the configured QUERY_MAXIMUM_RESULTS. However, if the call is invoked internally (i.e. by an aggregation process), no maximum limit is imposed.

The reason is that every id found in the HNSW index for a Get query is eventually used to fetch an object. Without an upper bound in this case, the entire vector index could be converted into objects, which would be extremely resource intensive.

An aggregation does not need an object per id, and therefore does not suffer from the same issue.
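
As a rough sketch of the rule above, the cap could be chosen like this. The function name, the `fromClient` flag, the `noLimit` sentinel, and the value 10000 are hypothetical and used only for illustration; they are not taken from the PR.

```go
package main

import "fmt"

// noLimit is a hypothetical sentinel meaning "do not cap the search".
const noLimit = 0

// maxSearchLimit returns the cap for a search by certainty.
// Client-originated calls (Get queries) are capped at the configured
// QUERY_MAXIMUM_RESULTS value; internal calls (e.g. aggregations) are not.
func maxSearchLimit(fromClient bool, queryMaximumResults int) int {
	if fromClient {
		return queryMaximumResults
	}
	return noLimit
}

func main() {
	// 10000 is an illustrative value for QUERY_MAXIMUM_RESULTS.
	fmt.Println(maxSearchLimit(true, 10000))  // Get query: capped
	fmt.Println(maxSearchLimit(false, 10000)) // aggregation: uncapped
}
```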

Triggering Search By Certainty

Currently, a client triggers the unlimited search with a Get query by:

  • providing a certainty with a near vector search without providing a limit
  • providing a certainty with a near vector search and passing a limit of -1
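
The two trigger conditions above amount to a small predicate. The sketch below uses a hypothetical `nearVectorParams` struct in which a nil pointer means the client did not provide the argument; the actual types in the codebase may differ.

```go
package main

import "fmt"

// nearVectorParams is a hypothetical, simplified view of the arguments of a
// near vector Get query. A nil pointer means "not provided by the client".
type nearVectorParams struct {
	Certainty *float64
	Limit     *int
}

// unlimited is the limit value that requests an unlimited search.
const unlimited = -1

// triggersUnlimitedSearch reports whether the query asks for a search by
// certainty without a result limit: a certainty is given and the limit is
// either missing or -1.
func triggersUnlimitedSearch(p nearVectorParams) bool {
	if p.Certainty == nil {
		return false
	}
	return p.Limit == nil || *p.Limit == unlimited
}

func main() {
	certainty := 0.9
	ten, minusOne := 10, unlimited

	fmt.Println(triggersUnlimitedSearch(nearVectorParams{Certainty: &certainty}))
	fmt.Println(triggersUnlimitedSearch(nearVectorParams{Certainty: &certainty, Limit: &minusOne}))
	fmt.Println(triggersUnlimitedSearch(nearVectorParams{Certainty: &certainty, Limit: &ten}))
	fmt.Println(triggersUnlimitedSearch(nearVectorParams{Limit: &minusOne}))
}
```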

codecov bot commented on Mar 28, 2022

Codecov Report

Merging #1883 (97a82b8) into master (10a6cfa) will increase coverage by 0.33%.
The diff coverage is 80.18%.

@@            Coverage Diff             @@
##           master    #1883      +/-   ##
==========================================
+ Coverage   66.65%   66.98%   +0.33%     
==========================================
  Files         416      417       +1     
  Lines       31352    31739     +387     
==========================================
+ Hits        20898    21261     +363     
- Misses       8639     8655      +16     
- Partials     1815     1823       +8     
Flag Coverage Δ
integration 68.83% <78.41%> (+0.56%) ⬆️
unittests 66.98% <80.18%> (+0.33%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
adapters/handlers/rest/clusterapi/indices.go 0.00% <0.00%> (ø)
...pters/handlers/rest/clusterapi/indices_payloads.go 0.00% <0.00%> (ø)
adapters/repos/db/disk_use.go 70.27% <0.00%> (-10.82%) ⬇️
usecases/sharding/remote_index_incoming.go 0.00% <0.00%> (ø)
entities/filters/pagination.go 86.66% <33.33%> (-13.34%) ⬇️
adapters/repos/db/shard_read.go 62.64% <61.94%> (-1.12%) ⬇️
...handlers/graphql/local/get/class_builder_fields.go 71.89% <78.26%> (+0.81%) ⬆️
usecases/traverser/explorer.go 70.31% <78.26%> (ø)
adapters/repos/db/inverted/stopwords/detector.go 88.23% <85.71%> (+47.49%) ⬆️
adapters/repos/db/index.go 69.55% <88.23%> (+0.21%) ⬆️
... and 37 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 91f6209...97a82b8.

@antas-marcin antas-marcin merged commit c8c97be into master Apr 1, 2022
@antas-marcin antas-marcin deleted the feature/WEAVIATE-19 branch April 1, 2022 15:29