Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WEAVIATE-62 Remove obsolete hnsw files #1875

Merged
merged 9 commits into from Mar 24, 2022
Merged

Conversation

etiennedi
Copy link
Member

@etiennedi etiennedi commented Mar 23, 2022

WARNING: This is branched off of the #1868 branch. I'd recommend reviewing and merging #1871 first, as it will only then become clear what changes are actually from this branch

WEAVIATE-62

The following situation caused an issue. Imagine there are two commit
logs in chronological order:

Log 1:
- AddLink for Node/Level/Link 0/0/1
- AddLink for Node/Level/Link 0/0/2

Log 2:
- ClearLinks for Node/Level 0/0
- AddLink for Node/Level/Link 0/0/3
- AddLink for Node/Level/Link 0/0/4

It should be quite obvious that after reading all the logs back in the
correct order node 0 will have the connections [3,4] at level zero, as
there was a "clear" event in between.

However, prior to this fix we would have ended up with [1,2,3,4] because
the clear information got lost when condensing each log independently.
This fix makes sure that information that relates to changes from prior
commit logs is kept when condensing a commit log. The actual bug was
when reading (as part of condensing) as we used the clear only to clear
the links (which from the perspective of log 2 were already empty). This
led us to condensing log 2 to just "add 3, add 4", whereas the correctly
condensed log would have been "clear, add 3, add 4". This is the case
now.

To write the tests needed for this fix, it was required to add some
functional options to the commit logger to control the commit log size
thresholds. As of now these are not yet exposed to the user, but this
would also add value.

fixes #1868

Potential follow-up tasks:

1. Expose Commitlog configuration (e.g. thresholds) to user, so they can
   finetune them to their needs
2. Clean up the redundant and partly unused code, and the confusion
   between condensor/condensor2 and deserializer/deserializer2
@etiennedi etiennedi changed the title Remove obsolete hnsw files WEAVIATE-62 Remove obsolete hnsw files Mar 23, 2022
@codecov
Copy link

codecov bot commented Mar 23, 2022

Codecov Report

Merging #1875 (a897e6d) into master (7860f17) will increase coverage by 0.53%.
The diff coverage is 55.49%.

@@            Coverage Diff             @@
##           master    #1875      +/-   ##
==========================================
+ Coverage   66.27%   66.81%   +0.53%     
==========================================
  Files         415      416       +1     
  Lines       31433    31482      +49     
==========================================
+ Hits        20833    21035     +202     
+ Misses       8792     8635     -157     
- Partials     1808     1812       +4     
Flag Coverage Δ
integration 68.52% <60.06%> (+1.38%) ⬆️
unittests 66.81% <55.49%> (+0.53%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
adapters/handlers/rest/configure_api.go 0.00% <0.00%> (ø)
...dapters/repos/db/lsmkv/segment_group_compaction.go 68.21% <ø> (-0.96%) ⬇️
...db/vector/hnsw/commit_logger_functional_options.go 0.00% <0.00%> (ø)
adapters/repos/db/vector/hnsw/debug.go 29.33% <0.00%> (-6.16%) ⬇️
usecases/config/config_handler.go 0.00% <0.00%> (ø)
usecases/config/environment.go 0.00% <0.00%> (ø)
adapters/repos/db/shard_disk_use.go 25.53% <25.53%> (ø)
adapters/repos/db/vector/hnsw/commit_logger.go 58.33% <58.82%> (-0.58%) ⬇️
adapters/repos/db/vector/hnsw/condensor.go 62.90% <61.53%> (+57.46%) ⬆️
adapters/repos/db/vector/hnsw/deserializer.go 66.12% <65.76%> (+44.44%) ⬆️
... and 27 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cfde2d2...a897e6d. Read the comment docs.

This should also fix the flaky test seen on this PR's CI
This contains the fixes for the flaky test in the branch this branch is
based off of.
@antas-marcin antas-marcin self-requested a review March 24, 2022 08:55
@antas-marcin antas-marcin merged commit 0e19d21 into master Mar 24, 2022
@antas-marcin antas-marcin deleted the remove-obsolete-hnsw-files branch March 24, 2022 11:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants