VDiff: fix performance regression introduced by progress logging #8016
Signed-off-by: Rohit Nayak <rohit@planetscale.com>
Description
VDiff takes a long time to run (from several hours to days). We had added progress logging to the vtctld log files so that users can monitor the liveness of a VDiff run and get an idea of when it might complete. The code used a logarithmic algorithm so that it would log some output even for small tables. Profiling shows that this logging accounts for 5-10% of the total VDiff CPU utilization. As users start running VReplication workflows on huge tables, this can add up.
This PR simplifies the logic by logging only once every 10 million rows. This is perfectly fine for really long-running VDiffs, since it should still print a log line every few minutes. VDiffs on small tables finish within a few minutes anyway.
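For illustration only, the fixed-interval approach can be sketched roughly as below. This is not the code in this PR: `progressLogger`, `rowProcessed`, and the `logf` callback are hypothetical names, and the only value taken from the description is the 10 million row interval. The point is that the per-row cost reduces to an increment and a modulo check.

```go
package main

import "fmt"

// progressLogger is a hypothetical helper (not the actual VDiff type) that
// emits one log line every `interval` rows.
type progressLogger struct {
	interval int64                                    // rows between log lines
	rows     int64                                    // rows processed so far
	logf     func(format string, args ...interface{}) // injected log sink
}

// rowProcessed records one more row and logs only when the running count
// reaches a multiple of the interval.
func (p *progressLogger) rowProcessed() {
	p.rows++
	if p.rows%p.interval == 0 {
		p.logf("VDiff progress: %d rows compared", p.rows)
	}
}

func main() {
	logged := 0
	p := &progressLogger{
		interval: 10_000_000, // interval stated in the description
		logf: func(format string, args ...interface{}) {
			logged++
			fmt.Printf(format+"\n", args...)
		},
	}
	// Simulate comparing 25 million rows.
	for i := 0; i < 25_000_000; i++ {
		p.rowProcessed()
	}
	fmt.Println("log lines emitted:", logged)
}
```

Injecting `logf` keeps the sketch testable with a small interval; a real implementation would presumably call the existing logger directly.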
Checklist