-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Built-in profiling can cause a segmentation fault #9765
Comments
Probably due to incorrect DWARF info during stack probing. pprof-rs issue: tikv/pprof-rs#57 |
/severity critical |
Maybe we need to add unit tests for this? @YangKeao |
It's hard. It only happens when a signal hits the stack probing part in the prologue of a function :( I think an integration test may be more practical. |
I have prepared an integration test for |
The related PR has been merged rust-lang/rust#83412 14 hours ago, so I can test it today or tomorrow. |
I experienced yet another seg fault, which is in master branch:
|
@breeswish Can you provide some more information? What is your Linux distribution and what is your libgcc version? And do you have an estimated frequency of segfault? |
I experienced this crash under:
The crash may happen when I applied some payloads for several hours with Conprof enabled. Conprof fetches profiles for 10s every 1 minute. The last reproduction causes 5 hours for one TiKV, and 3 hours for another TiKV. @mornyx is currently working on this crash. There will be a detailed material shared to you later this week. Update: There is no crash in CentOS 7.9. Update2: There are some harmless errors when checking dwarf for Ubuntu 18.04 (pass for CentOS 7.9), so I'm still not sure what's wrong with the vdso.so in Ubuntu 18:
|
close #9765 Signed-off-by: YangKeao <yangkeao@chunibyo.icu>
Bug Report
What version of TiKV are you using?
Latest nightly: dc8ce2c
What operating system and CPU are you using?
Linux. It happens on both CentOS 7 (3.10.0-957.27.2.el7.x86_64) and Arch Linux (5.11.2-arch1-1).
Steps to reproduce
Run YCSB workloada on a 3-node TiKV cluster. Meanwhile, get
http://{TiKV_IP}:20180/debug/pprof/profile?seconds=30
.What did you expect?
Get a CPU profiling flamegraph.
What did happened?
A segmentation fault happens. Backtrace of the core:
Frame 5 has many possibilities, it may not be
crossbeam_deque::Stealer<T>::steal_batch_and_pop
.The text was updated successfully, but these errors were encountered: