VDiff: fix performance regression introduced by progress logging #8016

rohit-nayak-ps
Contributor

Signed-off-by: Rohit Nayak <rohit@planetscale.com>

Description

VDiff can take a long time to run (from several hours to days). We had added progress logging to the vtctld log files so that users can monitor the liveness of a VDiff run and get an idea of when it might complete. That code used a logarithmic algorithm so that it would log some output even for small tables. Profiling shows that it accounts for 5-10% of the total VDiff CPU utilization. As users start running vreplication workflows on huge tables, this can add up.

This PR simplifies the logging by emitting a progress line only once every 10 million rows. This is perfectly fine for really long-running VDiffs, since it should still print a log line every few minutes. VDiffs on small tables finish within a few minutes anyway.
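
For illustration only, a fixed-interval progress logger of this kind could look roughly like the sketch below. This is not the code in this PR: `rowCounter`, `newRowCounter`, `add`, and `progressInterval` are hypothetical names, and the only detail taken from the description above is the 10 million row interval.

```go
package main

import "log"

// progressInterval is the number of rows between progress log lines.
// The value comes from the PR description; the name is hypothetical.
const progressInterval = 10_000_000

// rowCounter is a hypothetical helper, not the actual Vitess type.
// It counts processed rows and logs once per interval.
type rowCounter struct {
	table    string
	rows     int64
	interval int64
	logf     func(format string, args ...interface{})
}

// newRowCounter returns a counter that logs via the standard library logger.
func newRowCounter(table string) *rowCounter {
	return &rowCounter{table: table, interval: progressInterval, logf: log.Printf}
}

// add records one processed row and reports whether a progress line was logged.
// The per-row cost is one increment and one modulo check.
func (c *rowCounter) add() bool {
	c.rows++
	if c.rows%c.interval != 0 {
		return false
	}
	c.logf("VDiff progress: table %s, %d rows processed", c.table, c.rows)
	return true
}
```

A caller would invoke `add()` once per compared row. Keeping the interval a plain field makes the behavior easy to exercise in a test with a small value.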

Checklist

  • Tests were added or are not required
  • Documentation was added or is not required

…mplifying progress logging

Signed-off-by: Rohit Nayak <rohit@planetscale.com>
@rohit-nayak-ps rohit-nayak-ps marked this pull request as ready for review May 2, 2021 13:30
@rohit-nayak-ps rohit-nayak-ps requested review from rafael and a team May 2, 2021 13:30
@rohit-nayak-ps rohit-nayak-ps merged commit 97e5963 into vitessio:master May 3, 2021
@rohit-nayak-ps rohit-nayak-ps deleted the rn-vdiff-make-progress-more-efficient branch May 3, 2021 13:58