Skip re-vectorization of identical/similar objects in a batch #4163
What's being changed:
Addresses #3950
Notes:
- [...] `moduletools.ObjectDiff` used for merge/patch with `moduletools.VectorizablePropsComparator` supporting all types of actions
- [...] `text`, `text[]`, `blob`. On top of that selected vectorizer configuration is applied pointing out exact props to be vectorized.
- [...] `nil`) or empty array property (set to `[]string{}`) are considered as not changed (equal) by vectorizable comparator.

Performance:
A modified ANN benchmark script was used to verify the performance and behaviour of the introduced changes (https://github.com/weaviate/weaviate-chaos-engineering/compare/dont_revectorize_performance).
Tests were run locally (M1 Pro) on a single node.
The script measures the execution time of each of the following steps:
The script was tested on 2 different Weaviate versions:
- `master` (b7634fe): skip reindex (Update object properties without reindexing vector #3948) + idempotent batch (Idempotent Batch (Noop, Prop-only change, Upsert) #3949) applied
- `dont_revectorize`: skip reindex (Update object properties without reindexing vector #3948) + idempotent batch (Idempotent Batch (Noop, Prop-only change, Upsert) #3949) + don't revectorize ([Modules] Don't revectorize identical/similar objects in a batch #3950) applied

Results for objects with all types of properties and no vector provided (given in h:mm:ss; 2 runs):
| propsA no_vec | propsB no_vec | propsB no_vec | propsB no_vec | propsA no_vec |
|---|---|---|---|---|
| 0:00:58.808423 | 0:00:28.942949 | 0:00:29.419517 | 0:00:29.462884 | 0:00:29.965787 |
| 0:00:59.968921 | 0:00:20.679111 | 0:00:29.806785 | 0:00:20.521877 | 0:00:30.835779 |

Results for objects with all types of properties and a vector provided, showing that the new feature introduces no performance degradation (given in h:mm:ss; 2 runs):
| propsA vecA | propsB vecA | propsB vecA | propsB vecA | propsA vecA |
|---|---|---|---|---|
| 0:00:30.401448 | 0:00:21.395962 | 0:00:21.974850 | 0:00:21.503501 | 0:00:21.888041 |
| 0:00:28.411575 | 0:00:20.655759 | 0:00:21.622190 | 0:00:20.869492 | 0:00:21.416824 |
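
For reviewers, the comparison rule from the notes (only the configured vectorizable props are compared, and a `nil` property and an empty `[]string{}` count as equal) could be sketched roughly as below. This is only an illustration, not the code in this PR: `normalize`, `propEqual` and `needsRevectorization` are hypothetical names, not the `moduletools.VectorizablePropsComparator` API, and `blob` values are assumed to arrive as plain strings.

```go
package main

import "fmt"

// normalize maps a missing/nil property and an empty string array to the
// same canonical value (nil), so that both compare as "not changed".
func normalize(v interface{}) interface{} {
	switch t := v.(type) {
	case nil:
		return nil
	case []string:
		if len(t) == 0 {
			return nil
		}
		return t
	default:
		return v
	}
}

// propEqual compares two property values of the kinds sketched here:
// string (assumed to cover text and blob) and []string (text[]).
func propEqual(a, b interface{}) bool {
	a, b = normalize(a), normalize(b)
	if a == nil || b == nil {
		return a == nil && b == nil
	}
	switch av := a.(type) {
	case string:
		bv, ok := b.(string)
		return ok && av == bv
	case []string:
		bv, ok := b.([]string)
		if !ok || len(av) != len(bv) {
			return false
		}
		for i := range av {
			if av[i] != bv[i] {
				return false
			}
		}
		return true
	default:
		// Types outside this sketch are conservatively treated as changed.
		return false
	}
}

// needsRevectorization reports whether any of the props selected by the
// (hypothetical) vectorizer configuration differs between prev and next.
// Props not listed in vectorizedProps are ignored.
func needsRevectorization(prev, next map[string]interface{}, vectorizedProps []string) bool {
	for _, name := range vectorizedProps {
		if !propEqual(prev[name], next[name]) {
			return true
		}
	}
	return false
}

func main() {
	props := []string{"title", "tags"}
	prev := map[string]interface{}{"title": "a", "tags": []string{}, "count": 1}

	// "tags" is missing and "count" is not vectorized: nothing to re-vectorize.
	same := map[string]interface{}{"title": "a", "count": 2}
	fmt.Println(needsRevectorization(prev, same, props)) // false

	// A vectorized text prop changed: the object needs a new vector.
	changed := map[string]interface{}{"title": "b", "tags": []string{}}
	fmt.Println(needsRevectorization(prev, changed, props)) // true
}
```

In this sketch a change to a non-vectorized property such as `count` never triggers re-vectorization, which is the case the `no_vec` timings above are meant to exercise.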
Review checklist