Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip re-vectorization of identical/similar objects in a batch #4163

Merged
merged 4 commits into from
Feb 12, 2024

Conversation

aliszka
Copy link
Member

@aliszka aliszka commented Feb 7, 2024

What's being changed:

Addresses #3950

Notes:

  • replaces moduletools.ObjectDiff used for merge/patch with moduletools.VectorizablePropsComparator supporting all types of actions
  • skipping re-vectorization is applied to merge (existed before), update and batch
  • if no vectorizer is configured or vector is provided explicitly comparator is not used
  • re-vectorization is performed only if vectorizable properties are changed comparing to existing object with same uuid
  • properties are considered vectorizable if configured with one of datatype: text, text[], blob. On top of that selected vectorizer configuration is applied pointing out exact props to be vectorized.
  • property not set (missing in object), empty property (set to nil) or empty array property (set to []string{}) are considered as not changed (equal) by vectorizable comparator.

Performance:

Modified ann benchmark script was used to verify performance and behaviour of introduced changes (https://github.com/weaviate/weaviate-chaos-engineering/compare/dont_revectorize_performance).
Tests were run locally (M1 Pro) using single node.
Script measures execution time of each of the following steps:

  1. Initial ingestion - 100k objects with properties of each supported type (propsA; vectorization + insert expected)
  2. Ingestion of objects with original properties (propsA; vectorization skipped, objects and inverted indexes update skipped)
  3. Ingestion of objects with modified properties (propsB; vectorization + full update expected)
  4. Ingestion of objects with modified properties (propsB; vectorization skipped, objects and inverted indexes update skipped)
  5. Ingestion of objects with original properties (propsA; vectorization + full update expected)

Script was tested on 2 different weaviate versions:

Results for objects with all types of properties and no vector provided (given in h:mm:ss; 2 runs):

1st batch (initial)
propsA no_vec
2nd batch
propsB no_vec
3rd batch
propsB no_vec
4th batch
propsB no_vec
5th batch
propsA no_vec
master 0:01:03.854263
0:00:58.808423
0:00:29.685099
0:00:28.942949
0:00:31.326278
0:00:29.419517
0:00:29.878895
0:00:29.462884
0:00:30.205239
0:00:29.965787
don't revectorize 0:01:04.193988
0:00:59.968921
0:00:21.476701
0:00:20.679111
0:00:31.528354
0:00:29.806785
0:00:21.961415
0:00:20.521877
0:00:32.918943
0:00:30.835779

Results for objects with all types of properties and vector provided to show no performance degradation introduced by new feature (given in h:mm:ss; 2 runs):

1st batch (initial)
propsA vecA
2nd batch
propsB vecA
3rd batch
propsB vecA
4th batch
propsB vecA
5th batch
propsA vecA
master 0:00:29.593704
0:00:30.401448
0:00:21.207185
0:00:21.395962
0:00:22.003542
0:00:21.974850
0:00:21.101980
0:00:21.503501
0:00:21.690640
0:00:21.888041
don't revectorize 0:00:29.923715
0:00:28.411575
0:00:20.625914
0:00:20.655759
0:00:21.633747
0:00:21.622190
0:00:20.842872
0:00:20.869492
0:00:21.326264
0:00:21.416824

Review checklist

  • Documentation has been updated, if necessary. Link to changed documentation:
  • Chaos pipeline run or not necessary. Link to pipeline: [TEST] dont revectorize weaviate-chaos-engineering#175
  • All new code is covered by tests where it is reasonable.
  • Performance tests have been run or not necessary.

@aliszka aliszka force-pushed the dont_revectorize branch 2 times, most recently from 0f116ae to da37a29 Compare February 8, 2024 10:31
@aliszka aliszka changed the title Dont revectorize Skips revectorization of identical/similar objects in a batch Feb 8, 2024
@aliszka aliszka changed the title Skips revectorization of identical/similar objects in a batch Skip re-vectorization of identical/similar objects in a batch Feb 8, 2024
@aliszka aliszka marked this pull request as ready for review February 9, 2024 13:37
@aliszka aliszka force-pushed the dont_revectorize branch 2 times, most recently from 1aa66b9 to 2a42aa5 Compare February 9, 2024 14:38
Copy link

sonarcloud bot commented Feb 9, 2024

Quality Gate Failed Quality Gate failed

Failed conditions
51.0% Duplication on New Code (required ≤ 3%)

See analysis details on SonarCloud

@antas-marcin antas-marcin merged commit b7b2e0c into master Feb 12, 2024
31 of 33 checks passed
@antas-marcin antas-marcin deleted the dont_revectorize branch February 12, 2024 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants