This repository contains a demo of unexpected behaviour of Tantivy's merge policy, described in the issue quickwit-oss/tantivy#2454.
The following results were obtained with a release-profile build on an M1 Max / 64GB, indexing 1000 randomly generated documents:
Run | Commit | Merge policy | Wait for merge threads | Time | Segment counts | compute_merge_candidates |
---|---|---|---|---|---|---|
A | Single | MergeWhenever | No | 245ms | .fast: 4x, .fieldnorm: 4x, .idx: 4x, .pos: 4x, .store: 4x, .term: 4x | Calls: 8x (0 args: 4x, 1 arg: 3x, 2 args: 1x) |
B | Single | MergeWhenever | Yes | 488ms | .fast: 1x, .fieldnorm: 1x, .idx: 1x, .pos: 1x, .store: 1x, .term: 1x | Calls: 12x (0 args: 6x, 1 arg: 4x, 2 args: 2x) |
C | Single | TargetDocs | No | 377ms | .fast: 6x, .fieldnorm: 5x, .idx: 5x, .pos: 5x, .store: 6x, .term: 5x | Calls: 10x (0 args: 6x, 1 arg: 4x) |
D | Single | TargetDocs | Yes | Inf. loop | ??? | Calls: >63466x (0 args: >31734x, 1 arg: >31732x) |
E | After every change | MergeWhenever | No | 198s | .fast: 5x, .fieldnorm: 5x, .idx: 5x, .pos: 5x, .store: 5x, .term: 5x | Calls: 5992x (0 args: 2282x, 1 arg: 2712x, 2 args: 998x) |
F | After every change | MergeWhenever | Yes | 211s | .fast: 1x, .fieldnorm: 1x, .idx: 1x, .pos: 1x, .store: 1x, .term: 1x | Calls: 5998x (0 args: 2273x, 1 arg: 2726x, 2 args: 999x) |
G | After every change | TargetDocs | No | 575s | .fast: 1004x, .fieldnorm: 1003x, .idx: 1003x, .pos: 1002x, .store: 1004x, .term: 1003x | Calls: 14548x (0 args: 8274x, 1 arg: 6274x) |
H | After every change | TargetDocs | Yes | Inf. loop | ??? | Calls: >62218x (0 args: >32109x, 1 arg: >30109x) |

- Runs `D` and `H` didn't actually finish; I terminated them manually after 45-50min
- Both runs `D` and `H` share 2 settings: both use the `TargetDocs` merge policy and both wait for merging threads
- When `TargetDocs` is used, `compute_merge_candidates` is never invoked with more than 1 merge candidate, regardless of the other settings (# of commits or waiting for merging threads)
- The `TargetDocs` merge policy is slightly heavier, computationally and memory-wise, than the very simple `MergeWhenever` merge policy

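The call counts in the `compute_merge_candidates` column can be gathered by wrapping a merge policy in a counter. The following is only a minimal sketch of that idea, not the instrumentation this repository actually uses. It assumes the "N args" figures count the segments passed to each call. `SegmentMeta`, `MergeCandidate` and `MergePolicy` are redeclared as simplified stand-ins for the Tantivy types so that the block compiles on its own; `CountingPolicy` and `MergeAll` are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Simplified stand-in for Tantivy's `SegmentMeta` (assumption: only the doc count matters here).
#[derive(Debug, Clone)]
pub struct SegmentMeta {
    pub num_docs: u32,
}

/// Simplified stand-in for Tantivy's `MergeCandidate`; holds indices instead of segment ids.
#[derive(Debug, Clone, PartialEq)]
pub struct MergeCandidate(pub Vec<usize>);

/// Redeclared with the same method shape as Tantivy's `MergePolicy` trait.
pub trait MergePolicy: Send + Sync {
    fn compute_merge_candidates(&self, segments: &[SegmentMeta]) -> Vec<MergeCandidate>;
}

/// Hypothetical wrapper: records how many segments each call received, then delegates.
pub struct CountingPolicy<P> {
    inner: P,
    histogram: Mutex<BTreeMap<usize, usize>>,
}

impl<P: MergePolicy> CountingPolicy<P> {
    pub fn new(inner: P) -> Self {
        CountingPolicy { inner, histogram: Mutex::new(BTreeMap::new()) }
    }

    /// Returns (total number of calls, number of calls per segment count).
    pub fn report(&self) -> (usize, BTreeMap<usize, usize>) {
        let hist = self.histogram.lock().unwrap().clone();
        let total: usize = hist.values().sum();
        (total, hist)
    }
}

impl<P: MergePolicy> MergePolicy for CountingPolicy<P> {
    fn compute_merge_candidates(&self, segments: &[SegmentMeta]) -> Vec<MergeCandidate> {
        // The policy may be called from several threads, hence the mutex.
        *self.histogram.lock().unwrap().entry(segments.len()).or_insert(0) += 1;
        self.inner.compute_merge_candidates(segments)
    }
}

/// Hypothetical inner policy for illustration: merge everything once 2+ segments exist.
pub struct MergeAll;

impl MergePolicy for MergeAll {
    fn compute_merge_candidates(&self, segments: &[SegmentMeta]) -> Vec<MergeCandidate> {
        if segments.len() < 2 {
            Vec::new()
        } else {
            vec![MergeCandidate((0..segments.len()).collect())]
        }
    }
}
```

Calling `report()` after a run yields the total and the per-size breakdown in the same shape as the table column, e.g. a total of 8 with 4, 3 and 1 calls for 0, 1 and 2 segments respectively.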
Race condition.
Well...
- When the merge policy is "heavier" above some threshold, a race condition takes place with some internal Tantivy procedure
- This race condition somehow causes `compute_merge_candidates` never to be passed more than 1 merge candidate
- Waiting for merging threads in combination with this race condition causes the program to be stuck in a (possibly) infinite loop
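
Because the stuck runs had to be terminated by hand, a reproduction can be made more convenient by guarding the blocking step with a timeout. The sketch below uses only the standard library; `run_with_timeout` is a hypothetical helper, not part of Tantivy or of this repository. In a real run the closure would presumably wrap the step that waits for the merging threads (for example a call to `IndexWriter::wait_merging_threads`), which is an assumption about how the demo is structured.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `job` on a helper thread and waits at most `limit` for its result.
/// Returns `None` if the limit elapses or if `job` panics. On timeout the
/// helper thread is left running, since std has no way to cancel a thread.
pub fn run_with_timeout<T, F>(job: F, limit: Duration) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Sending fails only if the receiver has already given up; ignore that case.
        let _ = tx.send(job());
    });
    rx.recv_timeout(limit).ok()
}
```

A `None` result only reports that the step did not finish in time; it does not distinguish a genuine infinite loop from a very slow merge, so the limit should be chosen well above the durations of the runs that do finish.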