Use breakpad + symbolic to generate and interpret minidump-format core dumps #4202
Comments
The OP mentions panics, but this is really about hard crashes. That said, if we distributed builds with panic = abort, we could treat crashes and panics the same way, with minidumps. Using breakpad + symbolic should work fine for TiKV. Stripping debuginfo for breakpad would mostly fix #4107 and help with #4150. This seems promising.
Do you think it is fine to let contributors help us? Of course, we can mentor them.
@siddontang yes, I think this could be an interesting issue for a contributor. It needs some more description of the next steps though - I don't think the original description is quite enough to get started. Feel free to write more about specifically what we need to do next, or else I'll come back and think about it later.
I think we can try to use breakpad or symbolic directly. I browsed the breakpad source code and found that it already registers a signal handler to generate a minidump directly, but we should verify whether it works in Rust. So first we should try these in a Rust project, then introduce them to TiKV and make sure they work there too. After that, we can use breakpad to extract the debug symbols and save them to a separate file (we don't need to include the symbol file in the release tar), which reduces the binary size. When users hit a crash, they only need to send us the minidump files, and we can debug them on our local machine with the symbol file.
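To make the "save the symbols to another file" step a bit more concrete, here is a minimal sketch of how an extracted symbol file could be filed away for later use. It assumes Breakpad's documented text symbol format, whose first line is `MODULE <os> <arch> <id> <name>`, and the `<root>/<name>/<id>/<name>.sym` directory layout that `minidump_stackwalk` looks symbols up in. The helper names (`ModuleRecord`, `parse_module_line`, `symbol_store_path`) are hypothetical and not part of breakpad or symbolic.

```rust
use std::path::{Path, PathBuf};

/// The fields of the `MODULE` header line of a Breakpad symbol file.
#[derive(Debug, PartialEq, Eq)]
pub struct ModuleRecord {
    pub os: String,
    pub arch: String,
    pub id: String,
    pub name: String,
}

/// Parses `MODULE <os> <arch> <id> <name>`; returns None for any other line.
/// The module name is the rest of the line, so it may contain spaces.
pub fn parse_module_line(line: &str) -> Option<ModuleRecord> {
    let mut parts = line.trim_end().splitn(5, ' ');
    if parts.next()? != "MODULE" {
        return None;
    }
    let os = parts.next()?.to_string();
    let arch = parts.next()?.to_string();
    let id = parts.next()?.to_string();
    let name = parts.next()?.to_string();
    if name.is_empty() {
        return None;
    }
    Some(ModuleRecord { os, arch, id, name })
}

/// Computes where the symbol file belongs inside a symbol store rooted at `root`:
/// `<root>/<name>/<id>/<name>.sym`.
pub fn symbol_store_path(root: &Path, module: &ModuleRecord) -> PathBuf {
    root.join(&module.name)
        .join(&module.id)
        .join(format!("{}.sym", module.name))
}
```

With a store laid out like this, the symbol files never have to ship in the release tar: each build's symbols are kept under its own id, and a minidump sent by a user can be matched against the right build.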
Thanks @siddontang. It sounds like the next step is to prototype integrating breakpad and symbolic into a toy Rust project. @siddontang do you have time to mentor? I understand if not; we can look for somebody else.
Another thing I found is that the binary on Linux is huge, nearly 300MB. I use boaty
It seems most of the space is occupied by debug info, so I use But this has a problem: we lose the backtrace, so it is not a good idea. Maybe we can separate the debug info instead. Most of the time we don't need it, and when we do, we can download it directly.
I'm interested in this issue. Is there a mentor? Here is my rough idea. To create a minidump, we create a binding to breakpad using bindgen, then install the exception handler on startup of TiKV. After the I have some concerns. If we use breakpad to generate a minidump, is it safe to just remove the panic hook or even use I am also afraid that it will increase compile time: it introduces a new C++ library we have to compile, and extracting symbols from the binary takes some time too. The advantage I can see is distributing a smaller binary and creating a smaller dump file, but that does not seem helpful in debug mode. Cargo has no profile-based dependency support, so compile time for dev builds will increase while we can hardly benefit from it during development.
Edit: We're going to try to integrate breakpad + symbolic to generate compact "minidumps" (via breakpad), and interpret them offline (via symbolic). Next step is to prototype breakpad and symbolic on a toy project to learn how to use them.
Feature Request
Is your feature request related to a problem? Please describe:
Sometimes TiKV hits a problem such as a segmentation fault and crashes directly. Unfortunately, our official deployment through Ansible doesn't enable core dumps, because we worry that generating too many core dump files may exhaust disk space.
Even if we enable core dumps, the generated core files may be too large to send over the network, so we would have to debug them directly on the user's machine (which, of course, is not allowed in most users' environments).
Describe the feature you'd like:
Mostly we only want to know the panic backtrace. Instead of the core file, we can use minidump or just output the panic backtrace.
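As a rough illustration of the "just output the panic backtrace" option, the sketch below installs a panic hook that appends the panic message, its location, and a captured backtrace to a report file, then calls the previously installed hook. It uses only the Rust standard library. The function name `install_backtrace_hook` and the idea of a single report file are assumptions made for illustration, not how TiKV does it. Note that this only covers Rust panics; a hard crash such as a segmentation fault never reaches the panic hook, and that is the case a minidump would cover.

```rust
use std::backtrace::Backtrace;
use std::fs::OpenOptions;
use std::io::Write;
use std::panic;
use std::path::PathBuf;

/// Installs a panic hook that appends a small crash report to `report_path`.
/// (Hypothetical helper; the name and report format are illustrative only.)
pub fn install_backtrace_hook(report_path: PathBuf) {
    // Keep the existing hook so its stderr output is not lost.
    let previous = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // force_capture ignores RUST_BACKTRACE, so a trace is always recorded.
        let backtrace = Backtrace::force_capture();
        // Displaying the panic info yields the message and the source location.
        let report = format!("{info}\nstack backtrace:\n{backtrace}\n");
        // Best effort: failing to write the report must not cause a second panic.
        if let Ok(mut file) = OpenOptions::new()
            .create(true)
            .append(true)
            .open(&report_path)
        {
            let _ = file.write_all(report.as_bytes());
        }
        previous(info);
    }));
}
```

Calling `install_backtrace_hook` once at the start of `main` would be enough. The resulting report is a few kilobytes of text, which is easy for a user to send, but the frames are only readable if the binary still carries its symbols.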
Teachability, Documentation, Adoption, Migration Strategy:
For minidumps, we can use https://github.com/google/breakpad; in Rust, we may try https://github.com/getsentry/symbolic.
Another way is to output the backtrace directly; see https://github.com/gby/libcrash and https://www.scribd.com/doc/3726406/Crash-N-Burn-Writing-Linux-application-fault-handlers.
/cc @ethercflow