Exceeded resource group quota limitation if request tokens exceeded 500ms #8349

Closed
nolouch opened this issue Jul 2, 2024 · 1 comment · Fixed by #8352 or #8368
Labels
affects-7.1 affects-7.5 affects-8.1 affects-8.2 severity/major The issue's severity is major. type/bug The issue is confirmed as a bug.

Comments

@nolouch
Contributor

nolouch commented Jul 2, 2024

Bug Report

What did you do?

Use resource control.

What did you expect to see?

No error is reported.

What did you see instead?

The request fails with "exceeded resource group quota limitation", even though RU usage is below the configured RU setting.

| Event | Time | Details |
| --- | --- | --- |
| Event 1 | 17:40:19.765 | 17:40:19.765: A request arrives, finds that the local tokens are insufficient, and sends a notification to the thread that acquires tokens. |
| Event 2 | 17:40:19.765 ~ 17:40:20.198 | 17:40:20.198: The request keeps retrying, but the local tokens have not been refreshed yet, so the same events are logged continuously during the retry period. After the 500 ms retry timeout, a failure error is reported to the application. |
| Event 3 | 17:40:20.263 ~ 17:40:20.265 | 17:40:20.263: The thread responsible for fetching tokens receives the notification and starts sending requests for tokens. 17:40:20.265: It obtains the new token authorization. |

See the table above. In theory, Event 1 should trigger Event 3 immediately. Once Event 3 succeeds, enough tokens are obtained within the retry period of Event 2, and the request can continue. However, the current event-driven system behaves much like a single-threaded event loop, so in some cases the processing delay of a message can exceed 500 ms. The request then fails to obtain tokens in time and returns an error.
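
The timing problem described above can be reproduced in miniature. The following Go sketch is not PD's code: `localBucket`, `waitForTokens`, `simulate`, and `errQuotaExceeded` are hypothetical names, and the durations are scaled down so the program finishes quickly. It only illustrates how a request that retries against a bounded deadline fails when the goroutine that fetches tokens handles the notification too late.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errQuotaExceeded is a hypothetical stand-in for the error the application sees.
var errQuotaExceeded = errors.New("exceeded resource group quota limitation")

// localBucket is a toy local token bucket (not PD's real type).
type localBucket struct {
	mu     sync.Mutex
	tokens float64
}

// tryTake removes n tokens if the bucket holds at least that many.
func (b *localBucket) tryTake(n float64) bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	if b.tokens < n {
		return false
	}
	b.tokens -= n
	return true
}

// refill adds n tokens, as if a token response had arrived.
func (b *localBucket) refill(n float64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.tokens += n
}

// waitForTokens retries until the tokens are available or maxWait has elapsed.
func waitForTokens(b *localBucket, need float64, maxWait time.Duration) error {
	deadline := time.Now().Add(maxWait)
	for !b.tryTake(need) {
		if time.Now().After(deadline) {
			return errQuotaExceeded
		}
		time.Sleep(time.Millisecond)
	}
	return nil
}

// simulate sends one request to an empty bucket. The goroutine plays the
// token-fetching thread, which acts on the notification only after loopDelay.
func simulate(loopDelay, maxWait time.Duration) error {
	b := &localBucket{}
	lowTokens := make(chan struct{}, 1)
	go func() {
		<-lowTokens           // the notification from Event 1 is picked up
		time.Sleep(loopDelay) // the loop is busy with other messages
		b.refill(100)         // Event 3: new tokens are granted
	}()
	lowTokens <- struct{}{}              // Event 1: local tokens are insufficient
	return waitForTokens(b, 10, maxWait) // Event 2: retry until the deadline
}

func main() {
	fmt.Println("prompt loop:", simulate(time.Millisecond, 100*time.Millisecond))
	fmt.Println("slow loop:  ", simulate(300*time.Millisecond, 30*time.Millisecond))
}
```

In the first call the refill lands well inside the retry window, so the request proceeds. In the second call the refill arrives after the deadline, so the request reports the quota error even though tokens were on their way.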

What version of PD are you using (pd-server -V)?

7.5.2

@nolouch nolouch added the type/bug The issue is confirmed as a bug. label Jul 2, 2024
@github-actions github-actions bot added this to Need Triage in Questions and Bug Reports Jul 2, 2024
@nolouch nolouch added the severity/major The issue's severity is major. label Jul 2, 2024
@ti-chi-bot ti-chi-bot bot closed this as completed in #8352 Jul 3, 2024
@ti-chi-bot ti-chi-bot bot closed this as completed in 6b25787 Jul 3, 2024
Questions and Bug Reports automation moved this from Need Triage to Closed Jul 3, 2024
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Jul 4, 2024
close tikv#8349

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this issue Jul 4, 2024
…ucket (#8344) (#8355)

close #8343, ref #8349

client/controller: record context error and add slowlog about token bucket
- record low process start time, and log it if it's too slow
- record the context error

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: Shuning Chen <nolouch@ShuningdeMacBook-Pro.local>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: Shuning Chen <nolouch@ShuningdeMacBook-Pro.local>
@easonn7

easonn7 commented Jul 4, 2024

/approve

ti-chi-bot bot added a commit that referenced this issue Jul 4, 2024
…he local bucket (#8352) (#8360)

close #8349

resource_control: allow configuration of the maximum retry time for the local bucket
- Added config `ltb-token-rpc-max-delay`
- Increased default max delay from 500ms to 1s

Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: nolouch <nolouch@gmail.com>
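
As a rough illustration of the change described in the commit message above, the Go sketch below models a configurable maximum delay with a 1s default. Only the key name `ltb-token-rpc-max-delay` and the 500ms and 1s values come from the commit message. The `bucketConfig` struct, its field, and the helper functions are hypothetical and do not reflect PD's actual configuration code.

```go
package main

import (
	"fmt"
	"time"
)

const (
	oldMaxDelay     = 500 * time.Millisecond // limit before the change
	defaultMaxDelay = time.Second            // default after the change
)

// bucketConfig is a hypothetical config struct. It stands in for whatever
// structure carries the `ltb-token-rpc-max-delay` setting.
type bucketConfig struct {
	TokenRPCMaxDelay time.Duration
}

// maxDelay returns the configured limit, falling back to the default when unset.
func (c bucketConfig) maxDelay() time.Duration {
	if c.TokenRPCMaxDelay <= 0 {
		return defaultMaxDelay
	}
	return c.TokenRPCMaxDelay
}

// timesOut reports whether a request gives up before tokens that need
// tokenDelay to arrive become available (a simplifying assumption).
func timesOut(tokenDelay, limit time.Duration) bool {
	return tokenDelay >= limit
}

func main() {
	// 17:40:19.765 (Event 1) to 17:40:20.265 (tokens obtained) in the report.
	observed := 500 * time.Millisecond

	fmt.Println("old limit:    ", timesOut(observed, oldMaxDelay))
	fmt.Println("new default:  ", timesOut(observed, bucketConfig{}.maxDelay()))
	fmt.Println("custom 2s cap:", timesOut(observed, bucketConfig{TokenRPCMaxDelay: 2 * time.Second}.maxDelay()))
}
```

A longer limit only widens the window in which a delayed token response can still rescue the request. It does not remove the underlying delay in the event loop.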
ti-chi-bot bot added a commit that referenced this issue Jul 4, 2024
…he local bucket (#8352) (#8361)

close #8349

resource_control: allow configuration of the maximum retry time for the local bucket
- Added config `ltb-token-rpc-max-delay`
- Increased default max delay from 500ms to 1s

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
Co-authored-by: nolouch <nolouch@gmail.com>
nolouch pushed a commit to nolouch/pd that referenced this issue Jul 4, 2024
…ucket (tikv#8344) (tikv#8355)

close tikv#8343, ref tikv#8349

client/controller: record context error and add slowlog about token bucket
- record low process start time, and log it if it's too slow
- record the context error

Signed-off-by: Shuning Chen <nolouch@ShuningdeMacBook-Pro.local>
nolouch added a commit to nolouch/pd that referenced this issue Jul 4, 2024
close tikv#8349

Signed-off-by: nolouch <nolouch@gmail.com>
nolouch added a commit to nolouch/pd that referenced this issue Jul 4, 2024
close tikv#8349

Signed-off-by: nolouch <nolouch@gmail.com>
Signed-off-by: Shuning Chen <nolouch@ShuningdeMacBook-Pro.local>
nolouch added a commit that referenced this issue Jul 4, 2024
…he local bucket (#8352)  (#8365)

* client/controller: record context error and add slowlog about token bucket (#8344) (#8355)

close #8343, ref #8349

client/controller: record context error and add slowlog about token bucket
- record low process start time, and log it if it's too slow
- record the context error

Signed-off-by: Shuning Chen <nolouch@ShuningdeMacBook-Pro.local>

* This is an automated cherry-pick of #8352

close #8349

Signed-off-by: nolouch <nolouch@gmail.com>
Signed-off-by: Shuning Chen <nolouch@ShuningdeMacBook-Pro.local>

---------

Signed-off-by: Shuning Chen <nolouch@ShuningdeMacBook-Pro.local>
Signed-off-by: nolouch <nolouch@gmail.com>
Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue Jul 4, 2024
ti-chi-bot bot pushed a commit to pingcap/tidb that referenced this issue Jul 5, 2024
ti-chi-bot bot pushed a commit that referenced this issue Jul 8, 2024
close #8349

controller: fix the low_ru request missed 

The problem is that `c.run.currentRequests` is shared by all groups.
If one group triggers a token request that isn't handled by the response, the other group's requests will be discarded.
Here, we do not discard the low_ru triggers.

Signed-off-by: nolouch <nolouch@gmail.com>
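
The commit message above can be read as follows: while one token request is outstanding, low-RU triggers from other groups are dropped. The Go sketch below is a toy model of that reading and of one possible remedy, which is to remember those triggers and send them once the response arrives. Every name in it (`controller`, `inFlight`, `pending`, `keepLowRU`, `replay`) is hypothetical. The real fix in PD may be structured differently.

```go
package main

import (
	"fmt"
	"sort"
)

// controller is a toy model, not PD's real controller. inFlight plays the
// role of the shared `c.run.currentRequests` state from the commit message.
type controller struct {
	keepLowRU bool            // false: old behavior, true: triggers are kept
	inFlight  bool            // one token request is awaiting its response
	pending   map[string]bool // groups whose low-RU trigger arrived meanwhile
	sent      [][]string      // groups covered by each token request, in order
}

func newController(keepLowRU bool) *controller {
	return &controller{keepLowRU: keepLowRU, pending: map[string]bool{}}
}

// onLowRU handles a "tokens are running low" trigger from one group.
func (c *controller) onLowRU(group string) {
	if !c.inFlight {
		c.send(group)
		return
	}
	if c.keepLowRU {
		c.pending[group] = true // remember it for the next request
	}
	// Otherwise the trigger is dropped and the group never asks for tokens.
}

// onResponse handles the response to the in-flight request and flushes
// any remembered triggers in a single follow-up request.
func (c *controller) onResponse() {
	c.inFlight = false
	if len(c.pending) == 0 {
		return
	}
	groups := make([]string, 0, len(c.pending))
	for g := range c.pending {
		groups = append(groups, g)
	}
	sort.Strings(groups)
	c.pending = map[string]bool{}
	c.send(groups...)
}

// send records a token request covering the given groups.
func (c *controller) send(groups ...string) {
	c.inFlight = true
	c.sent = append(c.sent, groups)
}

// replay feeds the same event sequence to a controller and returns what it sent.
func replay(keepLowRU bool) [][]string {
	c := newController(keepLowRU)
	c.onLowRU("group-a") // a request for group-a goes out
	c.onLowRU("group-b") // arrives while that request is in flight
	c.onResponse()
	return c.sent
}

func main() {
	fmt.Println("triggers dropped:", replay(false))
	fmt.Println("triggers kept:   ", replay(true))
}
```

With triggers dropped, only `group-a` ever requests tokens, so `group-b` would keep retrying against an empty local bucket until it times out. With triggers kept, a second request covering `group-b` follows the first response.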
ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this issue Jul 8, 2024
close tikv#8349

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this issue Jul 8, 2024
close #8349

controller: fix the low_ru request missed 

The problem is that `c.run.currentRequests` is shared by all groups.
If one group triggers a token request that isn't handled by the response, the other group's requests will be discarded.
Here, we do not discard the low_ru triggers.

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Signed-off-by: nolouch <nolouch@gmail.com>

Co-authored-by: ShuNing <nolouch@gmail.com>
Co-authored-by: nolouch <nolouch@gmail.com>
3 participants