Update object properties without reindexing vector #3948

Closed
Tracked by #3999
etiennedi opened this issue Dec 27, 2023 · 1 comment
Labels
hf-updates High frequency updates planned-1.24

Comments

@etiennedi
Member

etiennedi commented Dec 27, 2023

tl;dr

If an object is updated, but the vector is not altered, the object should be updated in place without the need to modify the vector index.

Background

With the current logic, every update creates a unique object and marks the previous doc id as deleted. This is because objects are immutable inside Weaviate, and there is no true update option. Every update is an insert+delete. However, this can be very costly when metadata is updated frequently, but the vector is not. In this case, changing an object in place (keeping the same doc id) without altering the vector would be preferred.
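To make the cost concrete, here is a toy model of the two update strategies. It is only a sketch and not Weaviate code: `ToyShard`, `update_as_insert_delete`, `update_in_place`, and `count_tombstones` are hypothetical names, and the tombstone set merely stands in for doc ids that the vector index would have to clean up.

```python
class ToyShard:
    """Toy stand-in for a shard; NOT Weaviate's actual data structures."""

    def __init__(self):
        self.next_doc_id = 0
        self.doc_id_by_uuid = {}
        self.tombstones = set()  # doc ids marked as deleted

    def insert(self, uuid):
        # Every insert consumes a fresh doc id.
        doc_id = self.next_doc_id
        self.next_doc_id += 1
        self.doc_id_by_uuid[uuid] = doc_id
        return doc_id

    def update_as_insert_delete(self, uuid):
        # Current behaviour as described above: retire the old doc id
        # and insert the object again under a new one.
        self.tombstones.add(self.doc_id_by_uuid[uuid])
        return self.insert(uuid)

    def update_in_place(self, uuid):
        # Proposed behaviour when the vector is unchanged: keep the doc id.
        return self.doc_id_by_uuid[uuid]


def count_tombstones(updates, in_place):
    shard = ToyShard()
    shard.insert("obj")
    for _ in range(updates):
        if in_place:
            shard.update_in_place("obj")
        else:
            shard.update_as_insert_delete("obj")
    return len(shard.tombstones)
```

In this model, the insert+delete strategy leaves one tombstone per update, while the in-place strategy leaves none.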

Reproducing

Here is a minimal example to show that updating the same object leads to a lot of HNSW updates:

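  1. Create one object
  2. Update non-vector properties of the object continuously
  3. Count tombstones

```python
import uuid

import weaviate

# Connect to a locally running Weaviate instance.
client = weaviate.connect_to_local()

# Start from a clean slate and create an empty test collection.
client.collections.delete_all()
col = client.collections.create("Test")

# Use a fixed UUID so that every update targets the same object.
my_id = uuid.UUID(int=17)

col.data.insert(uuid=my_id, properties={}, vector=[1, 2, 3])

# Only a non-vector property changes; the vector is never touched.
for i in range(10000):
    col.data.update(properties={"iteration": i}, uuid=my_id)

client.close()
```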

Check tombstones:

```shell
curl -s localhost:2112/metrics | grep vector_index_tombstones
# HELP vector_index_tombstones Number of active vector index tombstones
# TYPE vector_index_tombstones gauge
vector_index_tombstones{class_name="Test",shard_name="c3ieorjafMqI"} 10000
```

Tech Notes

There is one other process that can already do in-place updates without altering the doc id: the References Batch API. I believe we can reuse that logic.

The logic could be as follows:

  1. identify the old inverted index entries, so we know what needs to be cleaned up
  2. identify the new inverted index entries, so we know what needs to be added
  3. calculate the diff (I believe this logic already exists here)
  4. Perform additions and deletions on the same doc id
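The four steps above could be sketched roughly as follows. This is an illustration only, not the existing diff logic: the inverted index is modelled as a plain dict from `(property, value)` pairs to sets of doc ids, property values are assumed to be hashable scalars or lists of them, and `index_entries`, `diff_entries`, and `apply_in_place` are hypothetical helper names.

```python
from typing import Any, Dict, Set, Tuple

Entry = Tuple[str, Any]  # (property name, indexed value)


def index_entries(properties: Dict[str, Any]) -> Set[Entry]:
    """Steps 1 and 2: list the inverted index entries an object produces."""
    entries = set()
    for name, value in properties.items():
        # Assumption: list values are indexed element by element.
        values = value if isinstance(value, list) else [value]
        for v in values:
            entries.add((name, v))
    return entries


def diff_entries(old_props, new_props):
    """Step 3: entries to add and entries to delete."""
    old = index_entries(old_props)
    new = index_entries(new_props)
    return new - old, old - new


def apply_in_place(inverted_index: Dict[Entry, Set[int]], doc_id, old_props, new_props):
    """Step 4: perform additions and deletions on the same doc id."""
    to_add, to_delete = diff_entries(old_props, new_props)
    for entry in to_delete:
        postings = inverted_index.get(entry)
        if postings is not None:
            postings.discard(doc_id)
            if not postings:
                del inverted_index[entry]  # drop empty posting lists
    for entry in to_add:
        inverted_index.setdefault(entry, set()).add(doc_id)
```

Entries that are unchanged between the old and new object never appear in either set, so they are not rewritten.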

Acceptance Criteria

  1. When an update involves a vector change, the existing logic is used: The doc ID is retired and a new doc ID is created
  2. When the update does not involve a vector change, the doc ID is kept and the inverted index is altered (according to the logic outlined above or something similar)
  3. All filters still work correctly, i.e. they match for the updated values and don't match for previous values.
  4. Objects are considered identical if all user-settable properties are identical. This means they can be identical even with different update/create timestamps.
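Criteria 1, 2, and 4 could be expressed as a small decision helper. This is a sketch under assumptions, not the actual implementation: `StoredObject`, its field names, `is_identical`, and `choose_update_path` are hypothetical, and the vector comparison is a plain equality check.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class StoredObject:
    # Hypothetical, simplified view of an object; field names are assumptions.
    properties: Dict[str, Any]
    vector: Optional[List[float]] = None
    creation_time_unix: int = 0
    last_update_time_unix: int = 0


def is_identical(old: StoredObject, new: StoredObject) -> bool:
    """Criterion 4: compare only user-settable data, ignoring timestamps."""
    return old.properties == new.properties and old.vector == new.vector


def choose_update_path(old: StoredObject, new: StoredObject) -> str:
    """Criteria 1 and 2: the vector decides which path an update takes."""
    if old.vector != new.vector:
        return "replace"   # retire the doc ID and create a new one
    return "in_place"      # keep the doc ID, only alter the inverted index
```

For example, an update that only changes `iteration` yields `"in_place"`, while one that also changes the vector yields `"replace"`.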
@trengrj
Member

trengrj commented Feb 21, 2024

Completed in #3963

@trengrj trengrj closed this as completed Feb 21, 2024