Idempotent batch #4058

Merged
merged 5 commits into from
Feb 6, 2024

@aliszka (Member) commented Jan 22, 2024

What's being changed:

Addresses #3949

Notes:

  • Shard::putObjectLSM(...) and Shard::determineInsertStatus(...) were modified to determine whether an incoming object requires:
    • a full insert/update (objects bucket, inverted indexes, vector index),
    • a partial update (objects bucket, and inverted indexes of the changed properties only), or
    • no action.
    A full insert is performed if no object with the same UUID exists yet.
    A full update is performed if the vectors or any geo property differ between the incoming and the existing object.
    A partial update is performed if the vectors and geo properties are unchanged but any other property or additional property has changed. In that case only the objects bucket and the inverted indexes of the changed properties are updated.
    No action is performed if the vectors, additional properties and properties (including geo) are all unchanged.
  • Typing of incoming object properties has been improved for array data types: typed slices such as []string, []bool, etc. are now used instead of []interface{}.
    While making this change, a bug was found and fixed: properties of type DateArray were cast to []string instead of []time.Time, so date values were unintentionally included in the data being vectorized. Dates are no longer vectorized, which means a different vector will be generated for the same object than before.
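The decision rules above can be summarised in a short Go sketch. It is not the code of Shard::determineInsertStatus: the Object type, the InsertStatus constants, determineStatus and changedKeys are hypothetical simplifications that compare vectors, geo properties, other properties and additional properties with reflect.DeepEqual, and report which non-geo properties changed so that only their inverted indexes would need rewriting.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// InsertStatus enumerates the outcomes described in the PR notes.
// The names are hypothetical, not Weaviate's identifiers.
type InsertStatus int

const (
	NoAction      InsertStatus = iota // nothing changed
	PartialUpdate                     // objects bucket + inverted indexes of changed props
	FullUpdate                        // objects bucket, all inverted indexes, vector index
	FullInsert                        // UUID not stored yet
)

func (s InsertStatus) String() string {
	return [...]string{"no action", "partial update", "full update", "full insert"}[s]
}

// Object is a hypothetical, simplified stand-in for a stored object.
type Object struct {
	Vector     []float32
	Geo        map[string]interface{} // geo properties
	Props      map[string]interface{} // all non-geo properties
	Additional map[string]interface{} // additional properties
}

// determineStatus returns the work required for incoming, given the stored
// object with the same UUID (nil if none), plus the sorted names of changed
// non-geo properties when a partial update is needed.
func determineStatus(existing, incoming *Object) (InsertStatus, []string) {
	if existing == nil {
		return FullInsert, nil
	}
	// A vector or geo change means the vector/geo indexes must be rebuilt.
	if !reflect.DeepEqual(existing.Vector, incoming.Vector) ||
		!reflect.DeepEqual(existing.Geo, incoming.Geo) {
		return FullUpdate, nil
	}
	changed := changedKeys(existing.Props, incoming.Props)
	if len(changed) > 0 || !reflect.DeepEqual(existing.Additional, incoming.Additional) {
		return PartialUpdate, changed
	}
	return NoAction, nil
}

// changedKeys lists properties that were added, removed or modified, sorted.
func changedKeys(before, after map[string]interface{}) []string {
	var keys []string
	for k, v := range before {
		if w, ok := after[k]; !ok || !reflect.DeepEqual(v, w) {
			keys = append(keys, k)
		}
	}
	for k := range after {
		if _, ok := before[k]; !ok {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	stored := &Object{
		Vector: []float32{0.1, 0.2},
		Geo:    map[string]interface{}{"location": [2]float32{52.23, 21.01}},
		Props:  map[string]interface{}{"title": "a", "tags": []string{"x"}},
	}
	edited := &Object{
		Vector: []float32{0.1, 0.2},
		Geo:    map[string]interface{}{"location": [2]float32{52.23, 21.01}},
		Props:  map[string]interface{}{"title": "b", "tags": []string{"x"}},
	}
	status, changed := determineStatus(stored, edited)
	fmt.Println(status, changed) // prints: partial update [title]
}
```

Keeping the geo check next to the vector check reflects the rule that geo changes trigger a full update, while every other property change only touches the objects bucket and that property's inverted indexes.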

Performance:

A modified ANN benchmark script was used to verify the performance impact of the changes in code and behaviour (weaviate/weaviate-chaos-engineering@main...idempotent_batch_performance).
The script measures the execution time of each of the following steps:

  1. Initial ingestion: 100k objects with properties of every supported type (propsA, vectorA; full insert expected)
  2. Ingestion of the same objects with the original properties (propsA, vectorA; no action expected)
  3. Ingestion of objects with modified properties, except geo (propsB, vectorA; partial update expected)
  4. Ingestion of the same modified properties again (propsB, vectorA; no action expected)
  5. Ingestion of objects with the original properties restored (propsA, vectorA; partial update expected)

The script was run against 3 Weaviate versions: stable, master and idempotent batch.

Results for objects with all types of properties (h:mm:ss.ffffff; 2 runs, shown as run 1 / run 2):

| Version | 1st batch (initial): propsA geoA vecA | 2nd batch: propsA geoA vecA | 3rd batch: propsB geoA vecA | 4th batch: propsB geoA vecA | 5th batch: propsA geoA vecA |
|---|---|---|---|---|---|
| stable | 0:00:30.180680 / 0:00:29.744101 | 0:00:44.098360 / 0:00:43.767819 | 0:01:04.596541 / 0:01:08.006057 | 0:01:27.781050 / 0:01:28.503192 | 0:02:09.266730 / 0:02:07.775083 |
| master | 0:00:30.133406 / 0:00:28.950004 | 0:00:43.106158 / 0:00:43.207250 | 0:01:04.896243 / 0:01:04.092511 | 0:01:27.050309 / 0:01:33.732612 | 0:02:11.796772 / 0:02:06.355330 |
| idempotent batch | 0:00:29.745860 / 0:00:29.142219 | 0:00:20.802320 / 0:00:20.683051 | 0:00:21.682167 / 0:00:21.653536 | 0:00:20.579698 / 0:00:20.820579 | 0:00:21.328010 / 0:00:21.547770 |

Results for objects with all types of properties except geo (h:mm:ss.ffffff; 2 runs, shown as run 1 / run 2):

| Version | 1st batch (initial): propsA vecA | 2nd batch: propsA vecA | 3rd batch: propsB vecA | 4th batch: propsB vecA | 5th batch: propsA vecA |
|---|---|---|---|---|---|
| stable | 0:00:29.110165 / 0:00:26.159162 | 0:00:36.519708 / 0:00:35.443693 | 0:00:50.148659 / 0:00:48.457433 | 0:01:04.994465 / 0:01:02.787999 | 0:01:21.374194 / 0:01:17.352514 |
| master | 0:00:26.090419 / 0:00:25.515179 | 0:00:19.860155 / 0:00:19.670457 | 0:00:20.415592 / 0:00:20.516546 | 0:00:20.041709 / 0:00:20.267077 | 0:00:20.323540 / 0:00:20.311261 |
| idempotent batch | 0:00:28.566477 / 0:00:25.487944 | 0:00:19.242562 / 0:00:19.187122 | 0:00:20.340286 / 0:00:20.595007 | 0:00:19.464919 / 0:00:19.392559 | 0:00:20.307666 / 0:00:20.280267 |

Results for objects with all types of properties except geo/object/object[] (h:mm:ss.ffffff; 2 runs, shown as run 1 / run 2):

| Version | 1st batch (initial): propsA vecA | 2nd batch: propsA vecA | 3rd batch: propsB vecA | 4th batch: propsB vecA | 5th batch: propsA vecA |
|---|---|---|---|---|---|
| stable | 0:00:17.522387 / 0:00:17.311325 | 0:00:28.581429 / 0:00:28.357614 | 0:00:43.538898 / 0:00:42.953510 | 0:00:59.816922 / 0:00:58.314892 | 0:01:17.125187 / 0:01:14.295222 |
| master | 0:00:17.556203 / 0:00:17.055600 | 0:00:08.914176 / 0:00:08.879083 | 0:00:09.508935 / 0:00:09.565270 | 0:00:09.087848 / 0:00:09.188158 | 0:00:09.426510 / 0:00:09.612367 |
| idempotent batch | 0:00:17.421162 / 0:00:17.235840 | 0:00:08.722172 / 0:00:08.536178 | 0:00:09.917492 / 0:00:09.681047 | 0:00:08.814499 / 0:00:08.734398 | 0:00:09.669369 / 0:00:09.530104 |

Review checklist

  • Documentation has been updated, if necessary. Link to changed documentation:
  • Chaos pipeline run or not necessary. Link to pipeline: [TEST] idempotent batch weaviate-chaos-engineering#171
  • All new code is covered by tests where it is reasonable.
  • Performance tests have been run or not necessary.


sonarcloud bot commented Jan 31, 2024

Quality Gate failed

Failed conditions

27.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

@aliszka aliszka marked this pull request as ready for review February 1, 2024 12:48
@aliszka aliszka merged commit c31bff0 into master Feb 6, 2024
34 of 35 checks passed
@aliszka aliszka deleted the idempotent_batch branch February 6, 2024 13:11