[Alg] Reuse ranges of contended merges to speed up concurrent merges #4057
Great idea!
merged metarange_ as source and the original destination as the base! (See
"Correctness" below for why this yields the correct result.)

The performance gain occurs during the retry: We expect the new source to
The performance gain assumes that the two merges don't change the same ranges. We need to make sure that this is the case.
You're right: this will be true for some but not all merges. With our emerging focus on merging as the basic tool for handling multi-object file formats (Spark outputs, probably Iceberg, maybe Delta too), I would expect ranges to separate when there are repeated racing merges -- simply because large multi-objects will split off into their own ranges soon enough :-)
Adding words to that effect.
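To make the range-separation assumption concrete, here is a minimal sketch (not lakeFS code). It assumes, for illustration only, that a metarange can be modeled as a dict from a range boundary to a range ID, and that boundaries line up across all three metaranges. The function name `split_retry_ranges` and the sample IDs are hypothetical. A range can be taken whole, by ID, whenever at most one side changed it; only ranges changed on both sides have to be read and merged.

```python
def split_retry_ranges(base, source, dest):
    """Classify ranges for a retried merge.

    Each argument maps a range boundary (hypothetical, assumed aligned
    across all three metaranges) to a range ID. Returns a pair
    (reused, needs_merge): boundaries whose range can be taken as-is,
    and boundaries whose contents must actually be read and merged.
    """
    reused, needs_merge = [], []
    for boundary in sorted(set(base) | set(source) | set(dest)):
        b = base.get(boundary)
        s = source.get(boundary)
        d = dest.get(boundary)
        if s == b or d == b or s == d:
            # At most one side changed this range (or both made the
            # same change), so a whole range can be picked by ID.
            reused.append(boundary)
        else:
            # Both sides changed the same range: entries must be compared.
            needs_merge.append(boundary)
    return reused, needs_merge


# Retry after losing a race: base is the original destination, source is
# the merged metarange from the failed attempt, dest is the new HEAD.
base = {"a": "r1", "m": "r2", "t": "r3"}
source = {"a": "r1", "m": "r2x", "t": "r3"}  # our merge touched only "m"
dest = {"a": "r1y", "m": "r2", "t": "r3"}    # the racing merge touched only "a"

reused, needs_merge = split_retry_ranges(base, source, dest)
print(reused, needs_merge)  # ['a', 'm', 't'] []
```

When the two racing merges touch the same range, that boundary lands in `needs_merge` instead, which is the case where the retry gains nothing.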
| A | B | A | B | C | conflict | Conflict with updated dst |
| A | A | B | B | C | C | |
I think there is a correctness issue when using the conflict resolution flag. The example uses one file, and each letter represents a different version of it.
Two concurrent merges:
- Source wins in case of conflict:
  Base A
  Source B
  Dest C
  Res B
- Dest wins in case of conflict:
  Base A
  Source D
  Dest C
  Res C
The first one finishes first, so the second one will run with the result metarange (C) and the dest-wins flag:
  Base C
  Source C
  Dest B
  Res B
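The three cases above can be replayed with a minimal single-file sketch of a three-way merge. This is an illustration under stated assumptions, not the lakeFS implementation: `merge_value` is a hypothetical helper, and the strategy strings are just labels for the two conflict resolution flags.

```python
def merge_value(base, source, dest, strategy=None):
    """Three-way merge of a single file version.

    strategy is "source-wins", "dest-wins", or None (raise on conflict).
    """
    if source == dest:
        return dest          # both sides agree
    if source == base:
        return dest          # only the destination changed
    if dest == base:
        return source        # only the source changed
    # Both sides changed the file differently: a real conflict.
    if strategy == "source-wins":
        return source
    if strategy == "dest-wins":
        return dest
    raise ValueError("conflict")


first = merge_value("A", "B", "C", "source-wins")
second = merge_value("A", "D", "C", "dest-wins")
# Retry of the second merge: base is the original destination (C),
# source is the merged metarange (C), dest is the new HEAD (B).
retry = merge_value("C", "C", "B", "dest-wins")
print(first, second, retry)  # B C B
```

In the retry the source equals the base, so the flag is never consulted and the new destination version is kept.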
I think conflict resolution also works here, actually. You just need to interpret the new base and source.
Think of this as doing exactly what you would do during manual conflict resolution after losing a race:
- First you merge destination HEAD into your source. This uses the reverse conflict resolution strategy.
- Now you merge the result back to your destination HEAD.
Our multi-object writing strategy is going to do this, and will benefit.
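The two manual steps can be sketched for a single file as follows. This is a rough model and not lakeFS code: `merge_value`, `manual_retry`, and the strategy labels are hypothetical, and it assumes that the common ancestor of the source and the new destination HEAD is still the original base for this file.

```python
def merge_value(base, source, dest, strategy=None):
    """Three-way merge of a single file version (illustrative helper)."""
    if source == dest or source == base:
        return dest          # nothing to take from the source
    if dest == base:
        return source        # only the source changed
    if strategy == "source-wins":
        return source
    if strategy == "dest-wins":
        return dest
    raise ValueError("conflict")


REVERSE = {"source-wins": "dest-wins", "dest-wins": "source-wins"}


def manual_retry(base, source, new_dest, strategy):
    """Resolve a lost race the way a user would do it by hand."""
    # Step 1: merge destination HEAD into the source. The roles are
    # swapped (HEAD is the merge source here), so the strategy is reversed.
    updated_source = merge_value(base, new_dest, source, REVERSE[strategy])
    # Step 2: merge the result back into destination HEAD. HEAD is now
    # an ancestor of updated_source, so it also serves as the base.
    return merge_value(new_dest, updated_source, new_dest, strategy)


# Base A, source D, destination HEAD moved to B by the racing merge.
print(manual_retry("A", "D", "B", "dest-wins"))    # B
print(manual_retry("A", "D", "B", "source-wins"))  # D
```

For the dest-wins case this gives B, the same result as retrying with the merged metarange as source and the original destination as base.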
Thanks!
@eden-ohana PTAL :-)
Thanks!
THANKS! Pulling by force because docs don't run tests 🤷🏽