Use breakpad + symbolic to generate and interpret minidump-format core dumps #4202

siddontang · 2019-02-13T08:44:38Z

Edit: We're going to try to integrate breakpad + symbolic to generate compact "minidumps" (via breakpad) and interpret them offline (via symbolic). The next step is to prototype breakpad and symbolic in a toy project to learn how to use them.

Feature Request

Is your feature request related to a problem? Please describe:

Sometimes TiKV hits problems such as segmentation faults and crashes outright. Unfortunately, our official Ansible deployment doesn't enable core dumps, because we worry that too many core dump files could exhaust disk space.

Even when core dumps are enabled, the core files may be too large to send over the network, so we would have to debug directly on the user's machine (which, of course, most users' environments don't allow).

Describe the feature you'd like:

Usually we only want the panic backtrace. Instead of a full core file, we could generate a minidump or simply print the panic backtrace.
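For the "just print the panic backtrace" option, a minimal sketch using only the Rust standard library: a panic hook that prints the panic message plus a backtrace. It assumes a Rust toolchain newer than this thread, since `std::backtrace::Backtrace` was stabilized in Rust 1.65. Note it only covers Rust panics; signals such as SIGSEGV never reach a panic hook, which is why minidumps are still needed for hard crashes.

```rust
use std::backtrace::Backtrace;
use std::panic;

/// Install a panic hook that writes the panic message and a backtrace to stderr.
/// `force_capture` collects the backtrace regardless of RUST_BACKTRACE.
fn install_backtrace_hook() {
    panic::set_hook(Box::new(|info| {
        let bt = Backtrace::force_capture();
        eprintln!("panic: {info}\nbacktrace:\n{bt}");
    }));
}

fn main() {
    install_backtrace_hook();
    // Trigger an index-out-of-bounds panic; catch_unwind is used only so the
    // demo exits cleanly after the hook has printed its report.
    let result = panic::catch_unwind(|| {
        let v: Vec<u32> = Vec::new();
        v[0]
    });
    assert!(result.is_err());
}
```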

Teachability, Documentation, Adoption, Migration Strategy:

For minidumps, we can use https://github.com/google/breakpad; in Rust, we may try https://github.com/getsentry/symbolic.
Another option is to print the backtrace directly; see https://github.com/gby/libcrash and https://www.scribd.com/doc/3726406/Crash-N-Burn-Writing-Linux-application-fault-handlers.

/cc @ethercflow

brson · 2019-02-14T02:16:28Z

The OP mentions panics, but this is really about hard crashes. That said, if we distributed builds with panic = abort, we could handle crashes and panics the same way, with a minidump.

It seems like using breakpad + symbolic should work well for TiKV.

Stripping debuginfo for breakpad would mostly fix #4107, and help with #4150.

This seems promising.

siddontang · 2019-02-14T02:22:28Z

@brson

Do you think it is fine to let contributors help with this? Of course, we can mentor them.

brson · 2019-02-16T05:02:38Z

@siddontang yes I think this could be an interesting issue for a contributor. It needs some more description of the next steps though - I don't think the original description is quite enough to get started. Feel free to write more about specifically what we need to do next, or else I'll come back and think about it later.

siddontang · 2019-02-17T06:09:16Z

@brson

I think we can try to use breakpad or symbolic directly. I browsed the breakpad source code and found that it already registers a signal handler that generates a minidump, but we should verify whether it works from Rust.

So I think we should first try to use them in Rust, then introduce them into TiKV and make sure they work there too.

After that, we can use breakpad to extract the symbol/debug info into a separate file (we don't need to include the symbol file in the release tarball), which reduces the binary size. When users hit a crash, they only need to send us the minidump files, and we can debug them on our local machines with the symbol file.

brson · 2019-02-22T21:56:12Z

Thanks @siddontang. It sounds like the next step is to prototype integrating breakpad and symbolic into a toy rust project.

@siddontang do you have time to mentor? I understand if not; we can look for somebody else.

siddontang · 2019-04-09T02:00:17Z

Another thing I found is that the binary on Linux is huge, nearly 300MB. I used bloaty:

bloaty/bloaty bin/tikv-server
     VM SIZE                         FILE SIZE
 --------------                   --------------
   0.0%       0 .debug_info         123Mi  41.4%
   0.0%       0 .debug_loc         67.0Mi  22.5%
   0.0%       0 .debug_str         27.5Mi   9.3%
   0.0%       0 .debug_ranges      22.2Mi   7.5%
  63.6%  17.8Mi .text              17.8Mi   6.0%
   0.0%       0 .debug_line        11.2Mi   3.8%
   0.0%       0 .debug_pubnames    7.56Mi   2.5%
   0.0%       0 .debug_pubtypes    7.32Mi   2.5%
   0.0%       0 .strtab            2.62Mi   0.9%
   8.3%  2.31Mi .data.rel.ro       2.31Mi   0.8%
   7.8%  2.19Mi .rela.dyn          2.19Mi   0.7%
   7.5%  2.11Mi .bss                    0   0.0%
   0.0%       0 .debug_abbrev      1.61Mi   0.5%
   5.0%  1.40Mi .rodata            1.40Mi   0.5%
   5.0%  1.38Mi .eh_frame          1.38Mi   0.5%
   0.0%       0 .symtab             971Ki   0.3%
   1.8%   501Ki .gcc_except_table   501Ki   0.2%
   0.0%       0 .debug_aranges      326Ki   0.1%
   0.8%   220Ki .eh_frame_hdr       220Ki   0.1%
   0.3%  89.6Ki [30 Others]         101Ki   0.0%
   0.0%       0 .debug_macro       94.1Ki   0.0%
 100.0%  28.0Mi TOTAL               297Mi 100.0%

It seems most of the space is occupied by debug info, so I ran strip --strip-debug tikv-server and the size dropped to 30MB.
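To double-check that claim, a small worked calculation over the FILE SIZE column of the bloaty output (values copied from the table; Ki entries divided by 1024 to get MiB). It simply sums the `.debug_*` sections against the total; it does not model exactly what `strip --strip-debug` removes.

```rust
/// FILE SIZE column from the bloaty output, in MiB (Ki values divided by 1024).
const SECTIONS: &[(&str, f64)] = &[
    (".debug_info", 123.0),
    (".debug_loc", 67.0),
    (".debug_str", 27.5),
    (".debug_ranges", 22.2),
    (".text", 17.8),
    (".debug_line", 11.2),
    (".debug_pubnames", 7.56),
    (".debug_pubtypes", 7.32),
    (".strtab", 2.62),
    (".data.rel.ro", 2.31),
    (".rela.dyn", 2.19),
    (".bss", 0.0), // occupies VM only, no file bytes
    (".debug_abbrev", 1.61),
    (".rodata", 1.40),
    (".eh_frame", 1.38),
    (".symtab", 971.0 / 1024.0),
    (".gcc_except_table", 501.0 / 1024.0),
    (".debug_aranges", 326.0 / 1024.0),
    (".eh_frame_hdr", 220.0 / 1024.0),
    ("[30 Others]", 101.0 / 1024.0),
    (".debug_macro", 94.1 / 1024.0),
];

/// Total size of all DWARF `.debug_*` sections.
fn debug_mib() -> f64 {
    SECTIONS
        .iter()
        .filter(|(name, _)| name.starts_with(".debug_"))
        .map(|(_, size)| size)
        .sum()
}

/// Total file size across every row of the table.
fn total_mib() -> f64 {
    SECTIONS.iter().map(|(_, size)| size).sum()
}

fn main() {
    let (debug, total) = (debug_mib(), total_mib());
    println!("debug sections: {debug:.1} MiB of {total:.1} MiB ({:.1}%)", 100.0 * debug / total);
    println!("everything else: {:.1} MiB", total - debug);
}
```

The debug sections come to roughly 268 MiB of about 297 MiB (about 90%), leaving roughly 29.5 MiB, which is consistent with the ~30MB binary left after stripping.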

But then we lose the backtrace information, so stripping alone is not a good idea. Maybe we can separate the debug info instead: we rarely need it, and when we do, we can download it.

sticnarf · 2019-04-18T10:33:10Z

I'm interested in this issue. Is there a mentor available?

Here is my rough idea. To create a minidump, we generate a breakpad binding with bindgen, then install the exception handler when tikv-server starts. After the tikv-server binary is built, we use breakpad's dump_syms to create a symbol file. We keep the symbol file and can then distribute the stripped binary.
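"We keep the symbol file" needs a layout that breakpad's `minidump_stackwalk` can find later. Breakpad's Linux starter guide reads the first line of the `.sym` file (`MODULE <os> <arch> <id> <name>`) and stores it as `<symbols-root>/<name>/<id>/<name>.sym`. A minimal sketch of computing that path, assuming this convention; the module ID below is a made-up placeholder, not a real tikv-server build ID.

```rust
use std::path::{Path, PathBuf};

/// Fields of the first line of a breakpad symbol file.
#[derive(Debug, PartialEq)]
struct ModuleLine<'a> {
    os: &'a str,
    arch: &'a str,
    id: &'a str,
    name: &'a str,
}

/// Parse `MODULE <os> <arch> <id> <name>`; returns None for any other line.
fn parse_module_line(line: &str) -> Option<ModuleLine<'_>> {
    // splitn(5, ..) keeps a module name containing spaces in one piece.
    let mut parts = line.trim_end().splitn(5, ' ');
    if parts.next()? != "MODULE" {
        return None;
    }
    let os = parts.next()?;
    let arch = parts.next()?;
    let id = parts.next()?;
    let name = parts.next()?;
    Some(ModuleLine { os, arch, id, name })
}

/// Where the .sym file should live so a symbol-store lookup by name + ID finds it.
fn symbol_store_path(root: &Path, m: &ModuleLine<'_>) -> PathBuf {
    root.join(m.name).join(m.id).join(format!("{}.sym", m.name))
}

fn main() {
    // Placeholder ID for illustration only.
    let first_line = "MODULE Linux x86_64 0123456789ABCDEF0123456789ABCDEF0 tikv-server";
    let m = parse_module_line(first_line).expect("not a MODULE line");
    println!("{} {} -> {}", m.os, m.arch, symbol_store_path(Path::new("symbols"), &m).display());
}
```

Keying the store by the module ID means symbol files from different builds can sit side by side, so a minidump from any released build can be matched to the right symbols.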

I have some concerns.

If we use breakpad to generate a minidump, is it safe to just remove the panic hook, or even use panic = abort in release builds? Some logs may still be sitting in a buffer, not yet flushed to disk. If the process aborts, does that mean we lose those logs?
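For the panic case at least, keeping a panic hook addresses the log-loss concern: with `panic = "abort"` the hook still runs before the process aborts, so it can flush buffered logs first. A minimal sketch, assuming a hypothetical global `BufWriter` log sink (a `Vec<u8>` stands in for the log file). It does nothing for signals like SIGSEGV, which bypass the panic machinery entirely.

```rust
use std::io::{BufWriter, Write};
use std::panic;
use std::sync::Mutex;

// Hypothetical global log sink; a Vec<u8> stands in for the log file so the
// effect of flushing is observable.
static LOG: Mutex<Option<BufWriter<Vec<u8>>>> = Mutex::new(None);

/// Append a line to the buffered log (stays in memory until flushed).
fn log(msg: &str) {
    if let Some(w) = LOG.lock().unwrap().as_mut() {
        let _ = writeln!(w, "{msg}");
    }
}

/// Chain a hook that flushes buffered logs, then runs the default panic output.
fn install_flush_hook() {
    let default_hook = panic::take_hook();
    panic::set_hook(Box::new(move |info| {
        // try_lock: if the panic happened while the log lock was held, skip
        // flushing rather than deadlock inside the hook.
        if let Ok(mut guard) = LOG.try_lock() {
            if let Some(w) = guard.as_mut() {
                let _ = writeln!(w, "panic: {info}");
                let _ = w.flush();
            }
        }
        default_hook(info);
    }));
}

fn main() {
    *LOG.lock().unwrap() = Some(BufWriter::new(Vec::new()));
    install_flush_hook();
    log("last log line before the crash"); // buffered, not yet in the Vec
    // catch_unwind only makes the demo observable; under panic = "abort" the
    // process would terminate right after the hook returns.
    let result: Result<(), _> = panic::catch_unwind(|| panic!("boom"));
    assert!(result.is_err());
    let guard = LOG.lock().unwrap();
    let written = String::from_utf8_lossy(guard.as_ref().unwrap().get_ref());
    assert!(written.contains("last log line before the crash"));
}
```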

I am also afraid it will increase compile time: it introduces a new C++ library we have to compile, and extracting symbols from the binary takes time too.

The advantage I can see is distributing a smaller binary and producing smaller dump files. But it doesn't seem helpful for debug builds, and Cargo doesn't support profile-based dependencies. That means compile time for dev builds will increase while we hardly benefit from it during development.
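One possible workaround for the missing profile-based dependencies is an opt-in Cargo feature: release builds enable it, dev builds skip it and never compile the C++ code. A minimal sketch, assuming a hypothetical feature named `minidump` that would pull in an optional breakpad binding crate; the body of the enabled branch is a placeholder, not a real breakpad API.

```rust
/// Crash reporting compiled in only when the hypothetical `minidump` feature is on.
#[cfg(feature = "minidump")]
fn init_crash_reporting() -> bool {
    // Placeholder: real code would install the breakpad exception handler
    // through the (hypothetical) bindgen-generated binding here.
    true
}

/// Dev builds: no breakpad dependency is compiled; the default panic output is used.
#[cfg(not(feature = "minidump"))]
fn init_crash_reporting() -> bool {
    false
}

fn main() {
    let enabled = init_crash_reporting();
    println!("minidump crash reporting enabled: {enabled}");
}
```

The trade-off is that release CI has to remember to pass the feature flag, but developers iterating on debug builds pay nothing for it.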

siddontang added the type/enhancement Type: Issue - Enhancement label Feb 13, 2019

brson added this to To do in Improve compile times via automation Feb 14, 2019

brson added help wanted Help wanted. Contributions are very welcome! component/build Component: Build, Deployment, etc. difficulty/medium Difficulty: Medium. You need some kind of understanding of several components to work on this component/server Component: Server labels Feb 22, 2019

brson changed the title ~~consider generating minidump or outputting backtrace when receives abort signal~~ Use breakpad + symbolic to generate and interpret minidump-format core dumps Feb 22, 2019

brson mentioned this issue Mar 15, 2019

makefiles: add targets for experimenting with cargo profiles #4324

Closed

sticnarf mentioned this issue Apr 20, 2019

util: Prevent double panic in MustConsumeVec dtor #4548

Closed
