What's being changed:
Addresses #3949
Notes:
`Shard::putObjectLSM(...)` and `Shard::determineInsertStatus(...)` methods were modified to determine which of the following an incoming object requires:

- Full insert is performed if an object with the same uuid does not exist yet.
- Full update is performed if the vectors or any geo property differ between the incoming and existing objects.
- Partial update is performed if the vectors and geo props are not changed, but any of the properties or additional properties has changed. In that case only the objects bucket and the indexes of the changed properties are updated.
- No action is performed if the vectors, additional properties and properties (including geo) are not changed.
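
The four outcomes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual code of `Shard::determineInsertStatus(...)`: the names `object`, `status`, `determineStatus` and `changedProps` are hypothetical, and properties and additional properties are collapsed into a single map for brevity.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// status is a hypothetical enum for the four outcomes described above.
type status int

const (
	statusInsert        status = iota // no object with this uuid exists yet
	statusFullUpdate                  // vector or a geo property differs
	statusPartialUpdate               // only non-geo properties differ
	statusNoop                        // nothing differs
)

// object is a simplified stand-in for a stored object (assumption).
type object struct {
	Vector []float32
	Geo    map[string][2]float64 // geo property name -> (lat, lon)
	Props  map[string]any        // all other (and additional) properties
}

// determineStatus compares the existing object (nil if absent) with the
// incoming one and returns the required action plus the names of changed
// non-geo properties for a partial update.
// Note: reflect.DeepEqual treats a nil slice/map and an empty one as different.
func determineStatus(prev *object, next object) (status, []string) {
	if prev == nil {
		return statusInsert, nil
	}
	if !reflect.DeepEqual(prev.Vector, next.Vector) ||
		!reflect.DeepEqual(prev.Geo, next.Geo) {
		return statusFullUpdate, nil
	}
	changed := changedProps(prev.Props, next.Props)
	if len(changed) == 0 {
		return statusNoop, nil
	}
	return statusPartialUpdate, changed
}

// changedProps lists, in sorted order, properties that were added,
// removed, or whose value differs.
func changedProps(prev, next map[string]any) []string {
	var changed []string
	for name, val := range next {
		if old, ok := prev[name]; !ok || !reflect.DeepEqual(old, val) {
			changed = append(changed, name)
		}
	}
	for name := range prev {
		if _, ok := next[name]; !ok {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	stored := &object{
		Vector: []float32{0.1, 0.2},
		Geo:    map[string][2]float64{"location": {52.37, 4.90}},
		Props:  map[string]any{"title": "old", "count": 1},
	}
	incoming := object{
		Vector: []float32{0.1, 0.2},
		Geo:    map[string][2]float64{"location": {52.37, 4.90}},
		Props:  map[string]any{"title": "new", "count": 1},
	}
	st, changed := determineStatus(stored, incoming)
	fmt.Println(st == statusPartialUpdate, changed)
}
```

Returning the list of changed property names alongside the status is one way to let the caller update only the affected indexes in the partial-update case.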
Instead of `[]interface{}`, now `[]string`, `[]bool`, etc. are used.

Simultaneously, a bug was identified and fixed: properties of type `DateArray` were cast to `[]string` instead of `[]time.Time`. As a result, date values were unintentionally part of the data being vectorized. From now on dates will not be vectorized, which will result in a different vector being generated for the same object.

Performance:
A modified ANN benchmark script was used to verify the performance of the introduced changes in code and behaviour (weaviate/weaviate-chaos-engineering@main...idempotent_batch_performance).
The script measures the execution time of each of the following steps:
The script was tested on 3 different weaviate versions:

- `stable/1.23` (2362e17) - no optimizations
- `master` (b7634fe) - skip reindex (Update object properties without reindexing vector #3948) applied
- `idempotent_batch` - skip reindex (Update object properties without reindexing vector #3948) + idempotent batch (Idempotent Batch (Noop, Prop-only change, Upsert) #3949) applied

Results for objects with all types of properties (given in h:mm:ss; 2 runs):

| Version | propsA geoA vecA | propsA geoA vecA | propsB geoA vecA | propsB geoA vecA | propsA geoA vecA |
| --- | --- | --- | --- | --- | --- |
| `stable/1.23` | 0:00:29.744101 | 0:00:43.767819 | 0:01:08.006057 | 0:01:28.503192 | 0:02:07.775083 |
| `master` | 0:00:28.950004 | 0:00:43.207250 | 0:01:04.092511 | 0:01:33.732612 | 0:02:06.355330 |
| `idempotent_batch` | 0:00:29.142219 | 0:00:20.683051 | 0:00:21.653536 | 0:00:20.820579 | 0:00:21.547770 |

Results for objects with all types of properties except geo (given in h:mm:ss; 2 runs):
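
| Version | propsA vecA | propsA vecA | propsB vecA | propsB vecA | propsA vecA |
| --- | --- | --- | --- | --- | --- |
| `stable/1.23` | 0:00:26.159162 | 0:00:35.443693 | 0:00:48.457433 | 0:01:02.787999 | 0:01:17.352514 |
| `master` | 0:00:25.515179 | 0:00:19.670457 | 0:00:20.516546 | 0:00:20.267077 | 0:00:20.311261 |
| `idempotent_batch` | 0:00:25.487944 | 0:00:19.187122 | 0:00:20.595007 | 0:00:19.392559 | 0:00:20.280267 |
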
Results for objects with all types of properties except geo/object/object[] (given in h:mm:ss; 2 runs):
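
| Version | propsA vecA | propsA vecA | propsB vecA | propsB vecA | propsA vecA |
| --- | --- | --- | --- | --- | --- |
| `stable/1.23` | 0:00:17.311325 | 0:00:28.357614 | 0:00:42.953510 | 0:00:58.314892 | 0:01:14.295222 |
| `master` | 0:00:17.055600 | 0:00:08.879083 | 0:00:09.565270 | 0:00:09.188158 | 0:00:09.612367 |
| `idempotent_batch` | 0:00:17.235840 | 0:00:08.536178 | 0:00:09.681047 | 0:00:08.734398 | 0:00:09.530104 |
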
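
To compare any two of the timings reported above, the `h:mm:ss` values can be converted to durations and divided. The sketch below is only an illustration of that arithmetic; `parseHMS` and `speedup` are hypothetical helper names and are not part of the benchmark script. As an example it uses 0:02:07.775083 and 0:00:21.547770, two of the fifth-step timings from the results for objects with all property types.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseHMS converts a value such as "0:02:07.775083" (h:mm:ss.ffffff)
// into a time.Duration.
func parseHMS(s string) (time.Duration, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 3 {
		return 0, fmt.Errorf("expected h:mm:ss, got %q", s)
	}
	h, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, fmt.Errorf("hours: %w", err)
	}
	m, err := strconv.Atoi(parts[1])
	if err != nil {
		return 0, fmt.Errorf("minutes: %w", err)
	}
	sec, err := strconv.ParseFloat(parts[2], 64)
	if err != nil {
		return 0, fmt.Errorf("seconds: %w", err)
	}
	total := float64(h)*3600 + float64(m)*60 + sec
	return time.Duration(total * float64(time.Second)), nil
}

// speedup returns how many times faster the candidate timing is
// than the baseline timing.
func speedup(baseline, candidate string) (float64, error) {
	b, err := parseHMS(baseline)
	if err != nil {
		return 0, err
	}
	c, err := parseHMS(candidate)
	if err != nil {
		return 0, err
	}
	if c <= 0 {
		return 0, fmt.Errorf("candidate duration must be positive")
	}
	return b.Seconds() / c.Seconds(), nil
}

func main() {
	s, err := speedup("0:02:07.775083", "0:00:21.547770")
	if err != nil {
		panic(err)
	}
	fmt.Printf("speedup: %.2fx\n", s)
}
```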
Review checklist