Store heartbeat cannot be consumed, heartbeat storm in big cluster #15184
Labels
affects-6.1
affects-6.5
affects-7.1
severity/major
type/bug
Type: Issue - Confirmed a bug
type/enhancement
Type: Issue - Enhancement
Comments
nolouch added the type/bug, severity/minor, affects-6.5, affects-7.1, affects-6.1, and severity/major labels and removed the severity/minor label on Jul 25, 2023
nolouch changed the title from "Store heartbeat cannot be consumed" to "Store heartbeat cannot be consumed, heartbeat storm in big cluster" on Jul 25, 2023
ti-chi-bot bot added a commit that referenced this issue on Jul 28, 2023:
…15191) ref tikv/pd#6556, close #15184
The store heartbeat will report periodically, no need to do retires
- do not retry the store heartbeat
- change `remain_reconnect_count` as `remain_request_count`
- fix some metrics
Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot pushed three commits to ti-chi-bot/tikv that referenced this issue on Jul 28, 2023, each with the message:
ref tikv/pd#6556, close tikv#15184 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ti-chi-bot bot pushed a commit that referenced this issue on Aug 14, 2023:
…15191) (#15231) ref tikv/pd#6556, close #15184
The store heartbeat will report periodically, no need to do retires
- do not retry the store heartbeat
- change `remain_reconnect_count` as `remain_request_count`
- fix some metrics
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Signed-off-by: nolouch <nolouch@gmail.com> Co-authored-by: ShuNing <nolouch@gmail.com> Co-authored-by: nolouch <nolouch@gmail.com>
ti-chi-bot bot pushed a commit that referenced this issue on Aug 31, 2023:
…15191) (#15232) ref tikv/pd#6556, close #15184
The store heartbeat will report periodically, no need to do retires
- do not retry the store heartbeat
- change `remain_reconnect_count` as `remain_request_count`
- fix some metrics
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io> Signed-off-by: nolouch <nolouch@gmail.com> Co-authored-by: ShuNing <nolouch@gmail.com> Co-authored-by: nolouch <nolouch@gmail.com>
ti-chi-bot bot added a commit that referenced this issue on Nov 24, 2023:
…erval and reduce retry times (#15837) ref #15184
- The min-resolved-ts will report periodically, no need to do retires
- support dynamic change `min-resolved-ts` report interval
Signed-off-by: husharp <jinhao.hu@pingcap.com> Co-authored-by: ti-chi-bot[bot] <108142056+ti-chi-bot[bot]@users.noreply.github.com>
ti-chi-bot pushed four commits to ti-chi-bot/tikv that referenced this issue on Nov 24, 2023, each with the message:
ref tikv#15184 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
Development Task
When PD is under very heavy pressure, store heartbeat latency can become high because of heavy lock contention. The current retry mechanism is not reasonable: it retries 10 times, and every retry round may add to the pressure. Meanwhile, a new `store_heartbeat` is produced every 10s, which turns this into a vicious circle.
Reproduce
Add a fail point `sleep(4s)` in the PD server where it handles `store_heartbeat`, then run the cluster. We can see:
and the logs look like:
If there are many stores, this increases the pressure on the PD side and may cause an OOM issue.
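To make the amplification concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch under stated assumptions, not TiKV or PD code. It treats "retry 10 times" as up to 10 attempts per heartbeat, assumes every attempt fails and all attempts land within one 10s interval, and uses a hypothetical PD capacity (`served_per_interval`) that does not come from this issue. The function names are made up for illustration.

```python
def heartbeat_rpcs_per_interval(stores: int, max_attempts: int) -> int:
    """Worst-case store_heartbeat RPCs reaching PD in one heartbeat interval.

    Assumes every attempt fails and all attempts fit inside the interval.
    """
    return stores * max_attempts


def backlog_after(intervals: int, stores: int, max_attempts: int,
                  served_per_interval: int) -> int:
    """Unserved requests left after `intervals` heartbeat intervals.

    Assumes PD serves a fixed (hypothetical) number of requests per
    interval and that unserved requests stay queued.
    """
    backlog = 0
    for _ in range(intervals):
        backlog += heartbeat_rpcs_per_interval(stores, max_attempts)
        backlog = max(0, backlog - served_per_interval)
    return backlog


if __name__ == "__main__":
    stores = 200    # cluster size mentioned in this issue
    capacity = 500  # hypothetical: requests PD can serve per 10s interval

    print(heartbeat_rpcs_per_interval(stores, 10))  # 2000 with 10 attempts
    print(heartbeat_rpcs_per_interval(stores, 1))   # 200 without retries
    print(backlog_after(3, stores, 10, capacity))   # 4500 queued after 3 intervals
    print(backlog_after(3, stores, 1, capacity))    # 0 queued after 3 intervals
```

Under these assumptions the retrying client keeps a growing queue on the PD side, while the non-retrying client simply relies on the next periodic heartbeat, which is the direction the linked commits take.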
Others
A user has a large cluster with 200 stores, and we found many goroutines waiting on locks in the store heartbeat. The goroutine details look like this: