backoff pd api when fails #6556

Closed
hihihuhu opened this issue Jun 5, 2023 · 7 comments
Labels
affects-6.5, affects-7.1, type/enhancement

Comments

@hihihuhu

hihihuhu commented Jun 5, 2023

Enhancement Task

In particular, requests such as the PD get-member request should have a backoff; otherwise they can overload PD and prevent it from recovering from a temporary issue.
For example, in v6.5.1: https://github.com/tikv/pd/blob/v6.5.1/client/base_client.go#L306
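
To make the request concrete, here is a minimal Go sketch of capped exponential backoff with jitter around a failing call. It is an illustration only, not the PD client's code: `retryWithBackoff`, `backoffDelay`, and the `getMembers` stand-in are hypothetical names, and the delays are arbitrary.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"math/rand"
	"time"
)

// backoffDelay returns the un-jittered wait before retry number `attempt`
// (0-based): base, 2*base, 4*base, ... capped at maxDelay.
func backoffDelay(attempt int, base, maxDelay time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// retryWithBackoff calls fn until it succeeds, ctx is done, or maxAttempts
// calls have been made. Between calls it sleeps a random duration in
// [0, backoffDelay], so many clients do not retry in lockstep.
func retryWithBackoff(ctx context.Context, maxAttempts int, base, maxDelay time.Duration,
	fn func(context.Context) error) error {
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		if err = fn(ctx); err == nil {
			return nil
		}
		if attempt == maxAttempts-1 {
			break // no point sleeping after the last attempt
		}
		d := backoffDelay(attempt, base, maxDelay)
		wait := time.Duration(rand.Int63n(int64(d) + 1))
		select {
		case <-time.After(wait):
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return err
}

func main() {
	calls := 0
	// getMembers is a hypothetical stand-in for the get-member RPC;
	// it fails twice and then succeeds.
	getMembers := func(ctx context.Context) error {
		calls++
		if calls < 3 {
			return errors.New("pd leader unavailable")
		}
		return nil
	}
	err := retryWithBackoff(context.Background(), 5,
		10*time.Millisecond, 200*time.Millisecond, getMembers)
	fmt.Println("calls:", calls, "err:", err)
}
```

The cap keeps a long outage from pushing the delay up without bound, and the jitter spreads the retries of many clients over time instead of sending them to a recovering PD all at once.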

@hihihuhu hihihuhu added the type/enhancement label Jun 5, 2023
ti-chi-bot bot added a commit to tikv/tikv that referenced this issue Jun 16, 2023
ref tikv/pd#6556, close #14964

Signed-off-by: Ryan Leung <rleungx@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
tonyxuqqi pushed a commit to tonyxuqqi/tikv that referenced this issue Jun 22, 2023
ref tikv/pd#6556, close tikv#14964

Signed-off-by: Ryan Leung <rleungx@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Signed-off-by: tonyxuqqi <tonyxuqi@outlook.com>
@nolouch
Contributor

nolouch commented Jul 4, 2023

This updateMember runs on memberUpdateInterval (1 min); I think we can treat that as a backoff mechanism.

@nolouch
Contributor

nolouch commented Jul 4, 2023

And for TiKV's PD client, @rleungx has increased the retry interval in tikv/tikv#14954. If that doesn't work, we may need to consider adding a backoff and increasing the max retry time. cc @rleungx

@hihihuhu
Author

@nolouch thanks for the reply. The behavior we observe is an excessive number of getMember calls (a few thousand QPS) from TiDB components to PD when the PD leader is already having issues.

TiDB can also actively trigger CheckLeader in addition to the periodic one: https://github.com/tikv/pd/blob/v6.5.1/client/base_client.go#L135C2-L135C2 For example, it looks like a CheckLeader is scheduled when TSO allocation fails.

It would be great to have some backoff for this particular scenario: the PD leader is already having issues at that point, so any further load could make things worse.
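
One way to bound the load from actively triggered checks is to coalesce the triggers and enforce a minimum gap between two checks. The Go sketch below illustrates that idea under stated assumptions: `leaderChecker`, `schedule`, `run`, and `minInterval` are hypothetical names, and the `check` callback stands in for the real member update. It is not the PD client's implementation.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// leaderChecker coalesces on-demand leader checks and spaces them out.
type leaderChecker struct {
	trigger     chan struct{} // capacity 1: at most one pending request
	minInterval time.Duration // hypothetical lower bound between two checks
	check       func()        // stand-in for the real member/leader update
}

func newLeaderChecker(minInterval time.Duration, check func()) *leaderChecker {
	return &leaderChecker{
		trigger:     make(chan struct{}, 1),
		minInterval: minInterval,
		check:       check,
	}
}

// schedule requests a check without blocking. It returns false when a
// request is already pending, in which case the new one is dropped.
func (c *leaderChecker) schedule() bool {
	select {
	case c.trigger <- struct{}{}:
		return true
	default:
		return false
	}
}

// run serves requests until ctx is cancelled, waiting at least
// minInterval after each check before serving the next one.
func (c *leaderChecker) run(ctx context.Context) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-c.trigger:
		}
		c.check()
		select {
		case <-ctx.Done():
			return
		case <-time.After(c.minInterval):
		}
	}
}

func main() {
	checks := 0
	c := newLeaderChecker(50*time.Millisecond, func() { checks++ })
	ctx, cancel := context.WithTimeout(context.Background(), 120*time.Millisecond)
	defer cancel()
	go func() {
		// Simulate a burst of failing TSO requests, each asking for a check.
		for i := 0; i < 1000; i++ {
			c.schedule()
		}
	}()
	c.run(ctx)
	fmt.Println("checks performed:", checks)
}
```

With this shape, a burst of a thousand failures results in far fewer than a thousand checks, because the number of checks is bounded by elapsed time rather than by the number of failed requests.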

@nolouch
Contributor

nolouch commented Jul 24, 2023

Got it. On the TiDB side, all requests should already have a backoff mechanism via client-go's backoff, but some paths may not be covered, like the CheckLeader you mentioned. We will sort out the calling side and do some optimization.

@nolouch
Contributor

nolouch commented Jul 27, 2023

BTW, see tikv/tikv#13673: in pd-client v2 we do not need to retry inside the client, so no backoff is needed to reduce the requests. With that, this problem can be significantly improved.

ti-chi-bot bot added a commit to tikv/tikv that referenced this issue Jul 28, 2023
…15191)

ref tikv/pd#6556, close #15184

The store heartbeat will report periodically, no need to do retries
- do not retry the store heartbeat
- change `remain_reconnect_count` as `remain_request_count`
- fix some metrics

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this issue Jul 28, 2023
ref tikv/pd#6556, close tikv#15184

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit to tikv/tikv that referenced this issue Aug 14, 2023
…15191) (#15231)

ref tikv/pd#6556, close #15184

The store heartbeat will report periodically, no need to do retries
- do not retry the store heartbeat
- change `remain_reconnect_count` as `remain_request_count`
- fix some metrics

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: nolouch <nolouch@gmail.com>
@nolouch
Contributor

nolouch commented Aug 23, 2023

TiDB Side

#6978 tries to reduce the GetMember requests. Preliminary test results:

image

The RPC rate dropped from 3.22k to 170 ops. The exact numbers depend on the number of TiDB instances and on how many client requests trigger checkLeader, so the reduction could be more significant in larger clusters.

More tests are necessary to make sure no further issues arise.

ti-chi-bot bot added a commit that referenced this issue Aug 24, 2023
close #5739, ref #6556

Signed-off-by: Ryan Leung <rleungx@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
@nolouch
Contributor

nolouch commented Aug 24, 2023

TiKV Side

tikv/tikv#15429 tries to reduce the retries on the TiKV side (tested with no workload on any TiDB instance). The detailed tests are in the PR; the results look like this:

Before

image

After

image
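
The commits that reference this issue describe the TiKV change as a "store-level backoff for the reconnect retries". One possible reading of that idea is a single gate shared by all request paths in a process, so that many concurrent failures lead to at most one reconnect attempt per backoff interval. The sketch below is written in Go for consistency with the PD client examples, although TiKV itself is written in Rust. `reconnectGate`, `tryAcquire`, and `report` are hypothetical names, and nothing here is taken from the actual PR.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// reconnectGate is shared by every request path of one process, so
// concurrent failures cannot each start their own reconnect.
type reconnectGate struct {
	mu       sync.Mutex
	min, max time.Duration // hypothetical bounds of the backoff interval
	interval time.Duration // current backoff interval
	next     time.Time     // earliest time the next reconnect is allowed
}

func newReconnectGate(min, max time.Duration) *reconnectGate {
	return &reconnectGate{min: min, max: max, interval: min}
}

// tryAcquire reports whether the caller may reconnect now. A successful
// acquire blocks further reconnects for the current interval.
func (g *reconnectGate) tryAcquire() bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	now := time.Now()
	if now.Before(g.next) {
		return false
	}
	g.next = now.Add(g.interval)
	return true
}

// report records the outcome of a reconnect: success resets the interval,
// failure doubles it (capped at max) and pushes the next attempt out.
func (g *reconnectGate) report(ok bool) {
	g.mu.Lock()
	defer g.mu.Unlock()
	if ok {
		g.interval = g.min
		return
	}
	g.interval *= 2
	if g.interval > g.max {
		g.interval = g.max
	}
	g.next = time.Now().Add(g.interval)
}

func main() {
	g := newReconnectGate(100*time.Millisecond, time.Second)
	attempts := 0
	for i := 0; i < 100; i++ { // 100 requests fail at about the same time
		if g.tryAcquire() {
			attempts++
			g.report(false) // pretend the reconnect failed
		}
	}
	fmt.Println("reconnect attempts:", attempts)
}
```

Keeping the backoff state in one place, rather than per request, is what keeps the reconnect rate independent of how many requests are failing.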

ti-chi-bot bot pushed a commit that referenced this issue Aug 29, 2023
ref #6556

Signed-off-by: husharp <jinhao.hu@pingcap.com>
ti-chi-bot bot added a commit to tikv/tikv that referenced this issue Aug 30, 2023
ref tikv/pd#6556, close #15428

pc_client: add store-level backoff for the reconnect retries

Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot pushed a commit to ti-chi-bot/tikv that referenced this issue Aug 30, 2023
ref tikv/pd#6556, close tikv#15428

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit to tikv/tikv that referenced this issue Aug 31, 2023
…15191) (#15232)

ref tikv/pd#6556, close #15184

The store heartbeat will report periodically, no need to do retries
- do not retry the store heartbeat
- change `remain_reconnect_count` as `remain_request_count`
- fix some metrics

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: nolouch <nolouch@gmail.com>
ti-chi-bot bot pushed a commit to tikv/tikv that referenced this issue Aug 31, 2023
ref tikv/pd#6556, close #15428

pc_client: add store-level backoff for the reconnect retries

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: nolouch <nolouch@gmail.com>
ti-chi-bot bot added a commit to tikv/tikv that referenced this issue Sep 1, 2023
ref tikv/pd#6556, close #15428

pc_client: add store-level backoff for the reconnect retries

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: nolouch <nolouch@gmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
@nolouch nolouch closed this as completed Sep 12, 2023
mittalrishabh pushed a commit to mittalrishabh/tikv that referenced this issue May 8, 2024
…5471) (tikv#4)

ref tikv/pd#6556, close tikv#15428

pc_client: add store-level backoff for the reconnect retries

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: nolouch <nolouch@gmail.com>

Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Co-authored-by: ShuNing <nolouch@gmail.com>