Skip to content

Conversation

@pfalcon
Copy link
Contributor

@pfalcon pfalcon commented Mar 7, 2019

It's relatively hard to figure out what thread a crash happens in
from the crash dump. E.g, it's usually not immediately possible to
find it out from linker map due to the fact that static symbols are
not there (https://sourceware.org/bugzilla/show_bug.cgi?id=16566).

So, try to do it as easy if possible, by just printing thread name
in a dump, if thread names are enabled at all.

Signed-off-by: Paul Sokolovsky paul.sokolovsky@linaro.org

@pfalcon
Copy link
Contributor Author

pfalcon commented Mar 7, 2019

This comes from looking into issues like #13897 - there should be a way to get a crash and immediately know what's wrong and have idea where to look for resolving it.

E.g., with issue in that ticket, I get:

[00:00:00.120,000] <inf> net_config: IPv6 address: fe80::200:5eff:fe00:536f
uart:~$ ***** Stack Check Fail! *****
Current thread ID = 0x004013a0 (net_mgmt)
eax: 0x0040a42c, ebx: 0x00000247, ecx: 0x00000000, edx: 0x004021c4
esi: 0x00000247, edi: 0x00402140, ebp: 0x0040a574, esp: 0x0040a570
eflags: 0x00000007 cs: 0x0008

The issue is that I get that only if I enable additional options. But again, there should be such a way.

@pfalcon
Copy link
Contributor Author

pfalcon commented Mar 7, 2019

Risks: well, printing a thread name can lead to a fault itself, if things are really bad in memory. Well, it's controlled by CONFIG_THREAD_NAME, and it's not even the default (but I may imagine later there may be talk about enabling it for a good share of samples for example).

@pfalcon
Copy link
Contributor Author

pfalcon commented Mar 7, 2019

Alternatives: I personally would rather prefer to un-static thread data definitions, to be able to look up thread id's in a map file. These 2 things are of course not mutually exclusive, but if I were to choose only one, I'd choose to un-static things to workaround stupid gnu linker. Obviously, I don't propose that. Yet.

Copy link
Member

@ioannisg ioannisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be done for all architectures?

@pfalcon
Copy link
Contributor Author

pfalcon commented Mar 7, 2019

Could this be done for all architectures?

Not at once. Let's see if that can be done for one arch. Otherwise, yeah, arm waits for its hero to implement stack trace printing. And I don't even talk about less loved archs where e.g. newlib doesn't work too well.

@pfalcon
Copy link
Contributor Author

pfalcon commented Mar 7, 2019

Could this be done for all architectures?

Maybe that code should be factored out to be arch-independent for example?

@codecov-io
Copy link

codecov-io commented Mar 7, 2019

Codecov Report

Merging #14155 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #14155   +/-   ##
=======================================
  Coverage   51.99%   51.99%           
=======================================
  Files         308      308           
  Lines       45572    45572           
  Branches    10553    10553           
=======================================
  Hits        23695    23695           
  Misses      17069    17069           
  Partials     4808     4808

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update bbe6fa0...c7edc49. Read the comment docs.

@pfalcon pfalcon force-pushed the crash-print-thread-name branch from ae63b24 to 2d010ef Compare March 7, 2019 15:43
Copy link
Contributor

@andrewboie andrewboie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patch looks fine to me.

@andrewboie
Copy link
Contributor

recheck

Copy link
Member

@ioannisg ioannisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we select THREAD_NAME in the test suites, @andrewboie ?
Then this would all work for the existing target tests.

@andrewboie
Copy link
Contributor

Could we select THREAD_NAME in the test suites, @andrewboie ?
Then this would all work for the existing target tests.

Yes, I think it's reasonable for CONFIG_TEST to select this.

@ioannisg
Copy link
Member

ioannisg commented Mar 8, 2019

Could we select THREAD_NAME in the test suites, @andrewboie ?
Then this would all work for the existing target tests.

Yes, I think it's reasonable for CONFIG_TEST to select this.

Could @pfalcon do this in the context of this PR?

@pfalcon pfalcon force-pushed the crash-print-thread-name branch from 2d010ef to c7edc49 Compare March 13, 2019 19:58
@pfalcon
Copy link
Contributor Author

pfalcon commented Mar 13, 2019

Could @pfalcon do this in the context of this PR?

@ioannisg, Done.

@pfalcon pfalcon requested a review from galak March 13, 2019 19:59
@pfalcon
Copy link
Contributor Author

pfalcon commented Mar 14, 2019

@andyross: FYI, there's regular flakiness in qemu_x86_64 tests. I had 2 cases in row with #14490 , and this failed too: https://app.shippable.com/github/zephyrproject-rtos/zephyr/runs/37140/5/tests (IIRC, #14490 cases also involved tests/kernel/fifo/fifo_usage).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really storing the thread names, more like it allows storing the names. Still, the user needs to do this, AFAICS.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then we would need to backtrack and consider why you asked me to add this unrelated change in this PR. For example if "Still, the user needs to do this", then they can also enable CONFIG_THREAD_NAME themselves - if they need it. And not enable, if they don't need it. While you for some reason asked me to force it upon them unsuspecting users. To make them less unsuspecting, I added a comment for a reason why such option would be force-enabled, but you don't even agree with that reason.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey,so with THREAD_NAME, the test developers can simply call k_thread_name_set, to set names to the threads they wish. This patch saves us from the need of having CONFIG_THREAD_NAME=y in the prj.conf of each test.

So there is some value in adding the selection here, IMHO :) I just wanted to stress that selecting THREAD_NAME does not automatically add names to the threads :) It's just a style thing, anyways.

Copy link
Contributor Author

@pfalcon pfalcon Mar 14, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey,so with THREAD_NAME, the test developers can simply call k_thread_name_set

They should call k_thread_name_set() regardless of CONFIG_THREAD_NAME. Or not call, if they choose so. (E.g. if they want to complicate diagnostics.)

I just wanted to stress that selecting THREAD_NAME does not automatically add names to the threads :)

Yep, but we can't include whole API reference/development primer in one comment. So, the comment emphasizes that for the standard system threads, if crash happens in them, the crash dump will include the thread name (because system threads of course call k_thread_name_set(), as that's the best practice). I'd assume that some 10% of tests would create a new user thread, while most will just use existing main thread and/or sys workq. And the idea behind the original patch was to allow to quickly diagnose stack overflow in one of system threads, because that's what happens regularly.

But again, we can't include all those details, if's, and but's in one comment. Hopefully, it'll give basic info and motivate interested folks to read help for CONFIG_THREAD_NAME.

Copy link
Member

@ioannisg ioannisg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine to me. Left a minor comment regarding the added help text

@pfalcon
Copy link
Contributor Author

pfalcon commented Mar 15, 2019

@galak, this is ready to merge.

@andrewboie andrewboie added this to the v1.15.0 milestone Mar 15, 2019
@andrewboie
Copy link
Contributor

This patch is a useful enhancement, but the tree is locked down for bug fixes and documentation only. I think this can go in once the merge window opens again.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please put the comment the select on top of the help, this looks confusing and the comment seem to be part of the help.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please put the comment the select on top of the help

Done.

@pfalcon pfalcon force-pushed the crash-print-thread-name branch from c7edc49 to 78822a6 Compare April 17, 2019 15:08
pfalcon added 2 commits April 24, 2019 10:22
It's relatively hard to figure out what thread a crash happens in
from the crash dump. E.g, it's usually not immediately possible to
find it out from linker map due to the fact that static symbols are
not there (https://sourceware.org/bugzilla/show_bug.cgi?id=16566).

So, try to do it as easy if possible, by just printing thread name
in a dump, if thread names are enabled at all.

Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
By selecting CONFIG_THREAD_NAME. It makes sense to have this enabled
by default for tests, to easy debugging in a case of crash.

Signed-off-by: Paul Sokolovsky <paul.sokolovsky@linaro.org>
@pfalcon pfalcon force-pushed the crash-print-thread-name branch from 78822a6 to 40a1f7e Compare April 24, 2019 07:22
@pfalcon
Copy link
Contributor Author

pfalcon commented Apr 24, 2019

@nashif, @carlescufi, @andrewboie: There was CI timeout on this PR, now cleared. Please merged.

@andrewboie andrewboie merged commit 2a798bb into zephyrproject-rtos:master Apr 24, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants