
retry short read reqs immediately on server busy #990

Open
wants to merge 7 commits into
base: master

Conversation

zyguan
Contributor

@zyguan zyguan commented Sep 22, 2023

If a read-only workload runs with a short max-execution-time (e.g. 500ms), injecting IO delay faults causes the QPS to drop almost to zero, because the queries get interrupted during the server-busy backoff and never have a chance to try other available peers.

To address the issue, let the sender retry those requests immediately and do the backoff lazily; the idea is sketched below. Here are the test results of injecting a 1s IO delay on one tikv for 3 minutes.
[Screenshot 20230922-133941: QPS comparison during the fault injection]
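
Roughly: when a peer reports server busy, don't sleep right away; record the backoff for that peer, move on to the next candidate, and only pay the backoff if that same peer ends up being retried. Below is a minimal Go sketch of this control flow, assuming hypothetical names (sendWithLazyBackoff, backoffArgs); it is not the PR's actual code.

```go
// Minimal sketch of "retry immediately, back off lazily"; illustrative only.
package main

import (
	"errors"
	"fmt"
	"time"
)

// backoffArgs records a server-busy backoff that was skipped, so it can be
// applied later, and only if the same peer is retried.
type backoffArgs struct {
	base time.Duration
	err  error
}

// sendWithLazyBackoff cycles through peers. On a server-busy error it records
// the backoff for that peer and immediately tries the next candidate, instead
// of sleeping while the query's max-execution-time burns down.
func sendWithLazyBackoff(peers []string, maxAttempts int, send func(addr string) error) error {
	pending := map[string]*backoffArgs{}
	for attempt := 0; attempt < maxAttempts; attempt++ {
		addr := peers[attempt%len(peers)]
		if ba := pending[addr]; ba != nil {
			// Retrying a peer that was skipped earlier: pay its backoff now.
			time.Sleep(ba.base)
			delete(pending, addr)
		}
		if err := send(addr); err != nil {
			// Treat the error as "server busy" and defer its backoff.
			pending[addr] = &backoffArgs{base: 10 * time.Millisecond, err: err}
			continue
		}
		return nil
	}
	return errors.New("all peers busy")
}

func main() {
	err := sendWithLazyBackoff([]string{"tikv-1", "tikv-2"}, 4, func(addr string) error {
		fmt.Println("sending to", addr)
		return nil
	})
	fmt.Println("result:", err)
}
```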

Signed-off-by: zyguan <zhongyangguan@gmail.com>
@zyguan zyguan marked this pull request as ready for review September 25, 2023 08:25
@cfzjywxk
Contributor

/cc @crazycs520

@ekexium
Contributor

ekexium commented Oct 8, 2023

The change makes backoff more complex. It only takes effect when the max execution time is smaller than the read timeout, which may itself be considered a misuse, I suppose? I'm not sure it's worth covering this case.
I see, it's not checking the SQL max_execution_time

return bo.Backoff(args.cfg, args.err)
}
return nil
}
Contributor

Could you explain the purpose of this code? I didn't understand it 🤔

Contributor Author

@zyguan zyguan Dec 28, 2023

It's used to maintain delayed backoffs.

  • "pendingBackoffs[addr] is not nil" means there is a delayed backoff which should be applied on retrying the corresponding store.
  • delayBoTiKVServerBusy is the callback for recording delayed backoffs.
  • backoffOnRetry is used to apply pending backoffs on retry.
  • when there is no candidate, the kv client just return a fake error and let the caller do retry. to avoid potential frequent rpcs, we call backoffOnFakeErr before returning a fake error, which chooses a pending backoff with the largest base duration and applies it.
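
The following is only an illustrative sketch with assumed signatures inferred from this comment; the real code applies the backoff via bo.Backoff(args.cfg, args.err) rather than sleeping, and none of this is copied from the PR.

```go
// Illustrative-only sketch of the delayed-backoff bookkeeping described above.
package sketch

import "time"

type backoffArgs struct {
	base time.Duration // base duration of the deferred backoff
	err  error         // the server-busy error that triggered it
}

type replicaSelector struct {
	// pendingBackoffs[addr] != nil means a delayed backoff must be applied
	// before the corresponding store is retried.
	pendingBackoffs map[string]*backoffArgs
}

// delayBoTiKVServerBusy records a server-busy backoff instead of sleeping now.
func (s *replicaSelector) delayBoTiKVServerBusy(addr string, base time.Duration, err error) {
	s.pendingBackoffs[addr] = &backoffArgs{base: base, err: err}
}

// backoffOnRetry applies (and clears) the pending backoff, if any, before the
// store at addr is retried.
func (s *replicaSelector) backoffOnRetry(addr string) {
	if ba := s.pendingBackoffs[addr]; ba != nil {
		delete(s.pendingBackoffs, addr)
		time.Sleep(ba.base)
	}
}

// backoffOnFakeErr is called when no candidate replica is left and a fake
// error is about to be returned: applying the pending backoff with the largest
// base duration keeps the caller's retry loop from issuing RPCs too frequently.
func (s *replicaSelector) backoffOnFakeErr() {
	var longest *backoffArgs
	for _, ba := range s.pendingBackoffs {
		if longest == nil || ba.base > longest.base {
			longest = ba
		}
	}
	if longest != nil {
		time.Sleep(longest.base)
	}
}
```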

Contributor

Can this be done by adding a "busy" flag to the stores, instead of maintaining maps here?

Contributor Author

If you mean the store struct in the region cache, probably not, I think. They have different scopes/lifetimes: each SendReqCtx call should have its own pendingBackoffs, with no effect on the others. Maybe we could add a field to the replica struct; however, this PR is just a quick-and-dirty fix for the customer issue, so I tried to make the smallest possible change.

MyonKeminta
MyonKeminta previously approved these changes Feb 28, 2024
Contributor

@MyonKeminta MyonKeminta left a comment

The logic looks fine to me. I wonder if the code can be better structured...

@MyonKeminta MyonKeminta dismissed their stale review February 28, 2024 07:42

Wait for @crazycs520's new solution.

4 participants