two tikv oom after inject tikv network-loss and recovery for some time #12255
Comments
/type bug
/remove-severity critical
/found automation
/assign 5kbpers
/remove-severity critical
/remove-severity major
Memory growth appears to track the number of active Raft entries. After a restart, the baseline memory usage (4.5 GB) reflects the memory used by Raft Engine. This happens because, during the disconnection, the leader cannot GC log entries, so the in-memory index inside Raft Engine keeps accumulating indefinitely. No fix at the moment.
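To illustrate the mechanism described above, here is a toy model, not Raft Engine's or TiKV's actual code. The names `ToyLogIndex` and `simulate` are made up for this sketch, and the GC rule (the leader only drops entries every replica already has) is a simplifying assumption taken from the comment above.

```python
from dataclasses import dataclass


@dataclass
class ToyLogIndex:
    """Hypothetical stand-in for an in-memory index: one slot per un-GC'ed log entry."""
    first_index: int = 1
    last_index: int = 0

    def append(self, n: int) -> None:
        # New log entries always extend the tail of the index.
        self.last_index += n

    def compact_to(self, index: int) -> None:
        # Drop every entry below `index`; never move backwards or past the tail.
        self.first_index = max(self.first_index, min(index, self.last_index + 1))

    def __len__(self) -> int:
        return self.last_index - self.first_index + 1


def simulate(ticks, writes_per_tick, follower_online):
    """Return the index size after each tick; follower_online(t) says if the peer is reachable."""
    index = ToyLogIndex()
    follower_match = 0  # highest entry the follower is known to have
    sizes = []
    for t in range(ticks):
        index.append(writes_per_tick)
        if follower_online(t):
            follower_match = index.last_index
        # Assumption: GC may only remove entries the follower already holds.
        index.compact_to(follower_match + 1)
        sizes.append(len(index))
    return sizes


healthy = simulate(10, 100, lambda t: True)
partitioned = simulate(10, 100, lambda t: t < 3)  # peer unreachable from tick 3 on
print(healthy[-1], partitioned[-1])
```

In this model the index stays empty while the peer is reachable and grows by `writes_per_tick` on every tick once it is not, which is the unbounded accumulation the comment describes.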
/remove-severity Moderate
/remove-severity major
/affects-6.0
/label affects-6.0
close #12255 Support setting memory limit for raft engine Signed-off-by: tabokie <xy.tao@outlook.com>
close tikv#12255 Support setting memory limit for raft engine Signed-off-by: tabokie <xy.tao@outlook.com> Signed-off-by: 3AceShowHand <jinl1037@hotmail.com>
Bug Report
What version of TiKV are you using?
[2022/03/24 04:13:50.851 +08:00] [INFO] [client.go:376] ["Cluster version information"] [type=pd] [version=6.1.0-nightly] [git_hash=1ac0ad691260dabb61a25f30359e996a968ed857]
[2022/03/24 04:13:50.851 +08:00] [INFO] [client.go:376] ["Cluster version information"] [type=tikv] [version=6.0.0-alpha] [git_hash=869b953e798cabf29872fd17d526a7061437aec2]
[2022/03/24 04:13:50.851 +08:00] [INFO] [client.go:376] ["Cluster version information"] [type=tidb] [version=6.1.0-nightly] [git_hash=b9bacad6dafabf5e2dfafee8e50ac66785e911b6]
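The three log lines above carry the component versions as `[key=value]` fields. A minimal sketch for pulling them out, assuming every field of interest has that bracketed `key=value` shape (the helper names `parse_fields` and `component_versions` are made up here):

```python
import re

# Matches bracketed key=value fields such as [type=pd] or [version=6.0.0-alpha].
FIELD_RE = re.compile(r"\[(\w+)=([^\]]*)\]")

LOG_LINES = [
    '[2022/03/24 04:13:50.851 +08:00] [INFO] [client.go:376] ["Cluster version information"] [type=pd] [version=6.1.0-nightly] [git_hash=1ac0ad691260dabb61a25f30359e996a968ed857]',
    '[2022/03/24 04:13:50.851 +08:00] [INFO] [client.go:376] ["Cluster version information"] [type=tikv] [version=6.0.0-alpha] [git_hash=869b953e798cabf29872fd17d526a7061437aec2]',
    '[2022/03/24 04:13:50.851 +08:00] [INFO] [client.go:376] ["Cluster version information"] [type=tidb] [version=6.1.0-nightly] [git_hash=b9bacad6dafabf5e2dfafee8e50ac66785e911b6]',
]


def parse_fields(line):
    """Return the key=value fields of one log line as a dict."""
    return dict(FIELD_RE.findall(line))


def component_versions(lines):
    """Map each component type to the version it reported."""
    versions = {}
    for line in lines:
        fields = parse_fields(line)
        if "type" in fields and "version" in fields:
            versions[fields["type"]] = fields["version"]
    return versions


print(component_versions(LOG_LINES))
```

This makes the version mix easy to see: TiKV reports 6.0.0-alpha while PD and TiDB report 6.1.0-nightly.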
What operating system and CPU are you using?
8 cores, 16 GB
2 TiDB, 3 PD, 5 TiKV (5 replicas)
Steps to reproduce
https://tcms.pingcap.net/dashboard/executions/plan/662849
test data:
{{[tpcc] []} {s3://benchmark/tpcc10000 tpcc10000 10000 64 2013,1213,1105,1205,8022,8027,8028,9004,9007,1062} {s3://benchmark/sysbench_64_7000w sysbench_64_7000w 64 70000000 64 2013,1213,1105,1205,8022,8027,8028,9004,9007,1062} {0} {[]} {false }}
1. Run the workload
[2022/03/24 04:13:51.083 +08:00] [INFO] [cmd.go:124] ["Start remote command"] [cmd="go-tpc tpcc run -D tpcc10000 --host tc-tidb.endless-oltp-tps-662849-1-968 -P4000 --warehouses 10000 -T 64 --time 36000m --ignore-error '2013,1213,1105,1205,8022,8027,8028,9004,9007,1062'"] [nodename=benchtoolset]
2. Inject the fault
[2022/03/24 04:24:51.173 +08:00] [INFO] [chaos.go:86] ["Run chaos"] [name=network-loss] [selectors="[endless-oltp-tps-662849-1-968/tc-tikv-1]"] [experiment="{"Duration":"","Scheduler":null,"Loss":"84","Correlation":"25"}"]
[2022/03/24 04:24:51.175 +08:00] [INFO] [chaos.go:86] ["Run chaos"] [name=network-loss] [selectors="[endless-oltp-tps-662849-1-968/tc-tikv-0]"] [experiment="{"Duration":"","Scheduler":null,"Loss":"84","Correlation":"25"}"]
3. Recover from the fault
[2022/03/24 05:06:51.203 +08:00] [INFO] [chaos.go:151] ["Clean chaos"] [name=network-loss] [chaosId="ns=endless-oltp-tps-662849-1-968,kind=network-loss,name=network-loss-pdhgfxcy,spec=&k8s.ChaosIdentifier{Namespace:"endless-oltp-tps-662849-1-968", Name:"network-loss-pdhgfxcy", Spec:NetworkLossSpec{Duration: "", Scheduler: , Loss: "84", Correlation: "25"}}"]
[2022/03/24 05:06:51.203 +08:00] [INFO] [chaos.go:151] ["Clean chaos"] [name=network-loss] [chaosId="ns=endless-oltp-tps-662849-1-968,kind=network-loss,name=network-loss-zfevalyq,spec=&k8s.ChaosIdentifier{Namespace:"endless-oltp-tps-662849-1-968", Name:"network-loss-zfevalyq", Spec:NetworkLossSpec{Duration: "", Scheduler: , Loss: "84", Correlation: "25"}}"]
What did you expect?
All TiKV instances stay healthy.
What happened?
Two TiKV instances OOMed at 2022/03/24 06:02 and 06:26.
![image](https://user-images.githubusercontent.com/84712107/159885457-2e80a947-519f-48d1-850d-71725cfa5701.png)
![image](https://user-images.githubusercontent.com/84712107/159885508-3873321a-03bf-4325-b524-a649eff1585b.png)
tikv0 memory started to rise at 2022/03/24 05:10 and the instance OOMed at 06:26
tikv1 memory started to rise at 2022/03/24 05:08 and the instance OOMed at 06:00
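For reference, the timeline can be worked out from the timestamps in this report. The sketch below uses the fault times from the chaos logs (truncated to seconds) and the per-instance times listed just above. Note that it takes 06:00 for tikv1, whereas the summary line says 06:02. The helper name `summarize` is made up for this example.

```python
from datetime import datetime

DAY = (2022, 3, 24)

# Fault window, taken from the "Run chaos" / "Clean chaos" log lines.
FAULT_INJECTED = datetime(*DAY, 4, 24, 51)
FAULT_CLEANED = datetime(*DAY, 5, 6, 51)

# (memory starts rising, OOM) per instance, as reported above.
EVENTS = {
    "tikv0": (datetime(*DAY, 5, 10), datetime(*DAY, 6, 26)),
    "tikv1": (datetime(*DAY, 5, 8), datetime(*DAY, 6, 0)),
}


def minutes(delta):
    """Convert a timedelta to minutes."""
    return delta.total_seconds() / 60


def summarize():
    """Return, per instance, the delay after recovery and the rise-to-OOM time in minutes."""
    return {
        name: {
            "cleaned_to_rise_min": minutes(rise - FAULT_CLEANED),
            "rise_to_oom_min": minutes(oom - rise),
        }
        for name, (rise, oom) in EVENTS.items()
    }


print("fault window (min):", minutes(FAULT_CLEANED - FAULT_INJECTED))
for name, row in summarize().items():
    print(name, row)
```

So the network loss lasted 42 minutes, memory began climbing within a few minutes of recovery on both instances, and the OOMs followed 76 minutes (tikv0) and 52 minutes (tikv1) later.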