tests: kernel: poll: timeout with FPU enabled #31472

Closed
ABOSTM opened this issue Jan 21, 2021 · 6 comments · Fixed by #31772
Labels: area: ARM, area: Kernel, area: Tests, bug, priority: medium

Comments

@ABOSTM
Collaborator

ABOSTM commented Jan 21, 2021

Describe the bug
When running tests/kernel/poll on stm32f3_disco, the test never ends and the twister timeout expires.
This happens specifically on stm32f3_disco, which has CONFIG_FPU enabled by default.

To Reproduce
Steps to reproduce the behavior:

  1. twister -N --device-testing --hardware-map ../map.yaml -c -p stm32f3_disco -T tests/kernel/poll/
  2. The test never ends and the twister timeout expires

Expected behavior
All tests pass.

Logs and console output
*** Booting Zephyr OS build zephyr-v2.4.0-3063-g33028963c808 ***

Running test suite poll_api
===================================================================
START - test_poll_no_wait
E: syscall z_vrfy_k_poll failed check: num_events too large
E: syscall z_vrfy_k_poll failed check: e->mode == K_POLL_MODE_NOTIFY_ONLY
PASS - test_poll_no_wait
===================================================================
START - test_poll_wait
PASS - test_poll_wait
===================================================================
START - test_poll_zero_events
PASS - test_poll_zero_events
===================================================================
START - test_poll_cancel_main_low_prio
PASS - test_poll_cancel_main_low_prio
===================================================================
START - test_poll_cancel_main_high_prio
PASS - test_poll_cancel_main_high_prio
===================================================================
START - test_poll_multi
PASS - test_poll_multi
===================================================================
START - test_poll_threadstate

Environment (please complete the following information):

  • OS: Linux
  • Toolchain: Zephyr SDK
  • Commit SHA: 3302896
@ABOSTM added the bug, area: Kernel, area: ARM, and area: Tests labels Jan 21, 2021
@ABOSTM
Collaborator Author

ABOSTM commented Jan 21, 2021

Reproducibility:
The issue is not reproducible on Windows (probably due to a different toolchain).
It is only reproducible when compiler size optimization is enabled (CONFIG_SIZE_OPTIMIZATIONS).
It shows up specifically on stm32f3_disco because it and stm32373c_eval are the only stm32 boards for which the FPU is enabled by default.

Analysis:
During the test, the test framework creates a thread in run_test().
Although the C code explicitly requests creation with K_NO_WAIT as the 'delay' parameter (i.e. 0x0),
this parameter at some point takes the value 0xFFFFFFFF, which explains why the test never ends and the twister timeout expires.
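
To illustrate why a corrupted delay can hang the test, here is a minimal sketch in plain C. It assumes a 32-bit tick count whose all-ones value acts as a "wait forever" sentinel; the type and helper names (demo_timeout_t, demo_starts_immediately, demo_never_starts) are hypothetical stand-ins and not Zephyr's actual definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernel timeout; not Zephyr's k_timeout_t. */
typedef struct {
    uint32_t ticks;
} demo_timeout_t;

#define DEMO_TICKS_NO_WAIT 0u
#define DEMO_TICKS_FOREVER UINT32_MAX /* assumed sentinel: all bits set */

/* True when a thread created with this start delay is runnable right away. */
static bool demo_starts_immediately(demo_timeout_t delay)
{
    return delay.ticks == DEMO_TICKS_NO_WAIT;
}

/* True when the start delay never expires, so the thread is never scheduled. */
static bool demo_never_starts(demo_timeout_t delay)
{
    return delay.ticks == DEMO_TICKS_FOREVER;
}
```

Under that assumption, a delay of 0x0 starts the test thread at once, while 0xFFFFFFFF leaves it waiting indefinitely, which would match a subtest that starts but never finishes.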

Analyzing further and looking at the assembly code,
I found that the K_NO_WAIT value is stored in the d8 FPU register at the beginning of z_ztest_run_test_suite()
and, because of the optimization, it is supposed to remain in d8 for all further subtests (for each call to run_test()).
But in fact, when reaching the 'test_poll_threadstate' subtest, d8 has been modified, resulting in delay=0xFFFFFFFF.

Debugging step by step, I could not find which code modified d8.
I guess that it happens during an interrupt or a Zephyr thread switch,
and I wonder whether Zephyr guarantees preservation of FPU registers (with lazy stacking for interrupts, and during thread switches).
CONFIG_FPU_SHARING seems to do that (it enables FPU lazy stacking for interrupts),
but this option is not enabled for stm32f3_disco.
When I enabled CONFIG_FPU_SHARING, the test passed (provided CONFIG_MAIN_STACK_SIZE is also increased).
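
The suspected mechanism can be sketched with a toy model in plain C. This is only an illustration under stated assumptions, not Zephyr's context-switch code: toy_cpu, toy_thread, toy_switch and toy_run are hypothetical names, a single variable stands in for d8, and the fp_sharing flag stands in for whatever CONFIG_FPU_SHARING enables.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy CPU with a single callee-saved FP register standing in for d8. */
struct toy_cpu {
    uint64_t d8;
};

/* Per-thread save area for the FP register. */
struct toy_thread {
    uint64_t saved_d8;
};

/* Switch threads; FP state is saved and restored only when fp_sharing is set. */
static void toy_switch(struct toy_cpu *cpu, struct toy_thread *from,
                       struct toy_thread *to, bool fp_sharing)
{
    if (fp_sharing) {
        from->saved_d8 = cpu->d8;
        cpu->d8 = to->saved_d8;
    }
    /* Otherwise d8 is left untouched and leaks between threads. */
}

/* Returns the value the first thread reads from d8 after being preempted. */
static uint64_t toy_run(bool fp_sharing)
{
    struct toy_cpu cpu = { 0 };
    struct toy_thread first = { 0 };
    struct toy_thread second = { 0 };

    cpu.d8 = 0; /* first thread caches a zero delay in d8 */
    toy_switch(&cpu, &first, &second, fp_sharing);
    cpu.d8 = UINT32_MAX; /* second thread uses d8 as scratch */
    toy_switch(&cpu, &second, &first, fp_sharing);
    return cpu.d8;
}
```

In this model, toy_run(false) hands the first thread the other thread's scratch value, while toy_run(true) returns the cached zero. Real hardware adds lazy stacking and interrupt entry on top of this, which the sketch leaves out.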

Questions:
Since the compiler is able, in any thread, to use FPU registers like general purpose registers (even when no floating point computation is performed),
should FPU_SHARING be enabled automatically (Kconfig) when the FPU is enabled? (And maybe under other conditions, such as multithreading?)

Note: enabling FPU_SHARING requires increasing some (all?) stacks in order to avoid stack overflow.
So in that case, should we also increase the stacks (MAIN_STACK_SIZE, ZTEST_STACKSIZE, ...) automatically (Kconfig) when CONFIG_FPU_SHARING is enabled?
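
For a rough sense of why stacks grow, the following sketch computes the architectural exception frame sizes on an ARMv7-M core with the FP extension: 8 words for the basic frame, plus 18 words (s0-s15, FPSCR, one reserved word) when FP state is stacked. The helper names are hypothetical, and how much extra stack Zephyr actually needs per thread is not stated in this thread, so treat the result as a lower-bound estimate only.

```c
#include <stdbool.h>
#include <stddef.h>

enum {
    DEMO_WORD_BYTES = 4,
    DEMO_BASIC_FRAME_WORDS = 8, /* r0-r3, r12, lr, pc, xPSR */
    DEMO_FP_FRAME_WORDS = 18    /* s0-s15, FPSCR, one reserved word */
};

/* Size in bytes of one exception stack frame, with or without FP state. */
static size_t demo_exception_frame_bytes(bool with_fp)
{
    size_t words = DEMO_BASIC_FRAME_WORDS;

    if (with_fp) {
        words += DEMO_FP_FRAME_WORDS;
    }
    return words * DEMO_WORD_BYTES;
}

/* Extra stack bytes consumed when `frames` stacked frames carry FP state. */
static size_t demo_extra_stack_bytes(size_t frames)
{
    return frames * (demo_exception_frame_bytes(true) -
                     demo_exception_frame_bytes(false));
}
```

Each frame that carries FP state costs 72 bytes more than a basic frame, so a stack that was sized tightly without FP context can overflow once that space is reserved.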

@ABOSTM
Collaborator Author

ABOSTM commented Jan 21, 2021

^^ @ioannisg @erwango

@ABOSTM
Collaborator Author

ABOSTM commented Jan 21, 2021

Enabling CONFIG_FPU_SHARING (and increasing CONFIG_MAIN_STACK_SIZE)
also resolves failures in other tests on stm32f3_disco:

  • tests/kernel/semaphore/semaphore
  • tests/kernel/pipe/pipe_api
  • tests/kernel/sched/preempt
  • tests/benchmarks/app_kernel/

@mniestroj
Member

mniestroj commented Jan 21, 2021

This might be related to #29590 (given the need for CONFIG_FPU_SHARING).

@ioannisg self-assigned this Jan 21, 2021
@ioannisg added the priority: medium label Jan 23, 2021
@ioannisg
Member

@ABOSTM thanks for the detailed analysis, I am going to take a look.

Enabling FPU_SHARING would be a straightforward fix, but it comes at the cost of extra RAM, code footprint, and context-switch overhead, so I'd rather avoid it if possible.

@ioannisg
Member

Closing this ticket as it concerns the same issue as #29590.
