Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

k_busy_wait exits early on Nordic chips #11626

Closed
pabigot opened this issue Nov 23, 2018 · 4 comments

Comments

Projects
None yet
4 participants
@pabigot
Copy link
Collaborator

commented Nov 23, 2018

A k_busy_wait(20) in driver code to wait for a reset to complete was proceeding in 13 us, causing a hardware failure. While investigating I discovered that for Nordic NRF51 k_busy_wait delays:

  • 12.38 us for an argument of 1, 2, 5, 10, 15, or 20 us
  • 26 to 51 us for an argument of 50 us
  • 54 to 57 us for an argument of 75 us
  • 83 to 86 us for an argument of 100 us

Further, for NRF52840 k_busy_wait delays:

  • 3.9 to 4.58 us for an argument of 1, 2, 5, 10, 15, or 20 us
  • 18 to 31 us for an argument of 50 us
  • 61 us for an argument of 75 us
  • 89 to 91 us for an argument of 100 us

Timings derived from a custom application with a GPIO toggle before and after the delay, duration measured with Saleae Logic Pro-8.

That really short delays take much longer than they should is unfortunate. That longer delays take shorter than they should is simply unacceptable. We must have a reliable usleep functionality to satisfy hardware timing requirements.

I did find issue #6498 which was enlightening but did not provide a solution.

Some improvement might be accomplished by fixing k_busy_wait to account for cycles lost to integer truncation, though a 10% error in a 100 us delay is difficult to understand. However, I'm fixing it another way in a PR to be submitted next.

pabigot added a commit to pabigot/zephyr that referenced this issue Nov 23, 2018

soc: arm: nordic: provide custom busy_wait implementations
Implementation taken from Nordic nrfx version 1.3.1 soc/nrfx_coredep.h,
modified to remove material from other series and to conform to Zephyr
coding standards.

Note that unlike standard k_busy_wait this is susceptible to
longer-than-intended delays due to the impact of interrupt handling.

Fixes zephyrproject-rtos#11626

Signed-off-by: Peter A. Bigot <pab@pabigot.com>

pabigot added a commit to pabigot/zephyr that referenced this issue Nov 23, 2018

soc: arm: nordic: provide custom busy_wait implementations
Implementation taken from Nordic nrfx version 1.3.1 soc/nrfx_coredep.h,
modified to remove material from other series and to conform to Zephyr
coding standards.

Note that unlike standard k_busy_wait this is susceptible to
longer-than-intended delays due to the impact of interrupt handling.

Fixes zephyrproject-rtos#11626

Signed-off-by: Peter A. Bigot <pab@pabigot.com>

pabigot added a commit to pabigot/zephyr that referenced this issue Nov 23, 2018

soc: arm: nordic: provide custom busy_wait implementations
Implementation taken from Nordic nrfx version 1.3.1 soc/nrfx_coredep.h,
modified to remove material from other series and to conform to Zephyr
coding standards.

Note that unlike standard k_busy_wait this is susceptible to
longer-than-intended delays due to the impact of interrupt handling.

Fixes zephyrproject-rtos#11626

Signed-off-by: Peter A. Bigot <pab@pabigot.com>

pabigot added a commit to pabigot/zephyr that referenced this issue Nov 26, 2018

soc: arm: nordic: provide custom busy_wait implementations
Implementation taken from Nordic nrfx version 1.3.1 soc/nrfx_coredep.h,
modified to remove material from other series and to conform to Zephyr
coding standards.

Note that unlike standard k_busy_wait this is susceptible to
longer-than-intended delays due to the impact of interrupt handling.

Fixes zephyrproject-rtos#11626

Signed-off-by: Peter A. Bigot <pab@pabigot.com>

carlescufi added a commit that referenced this issue Nov 26, 2018

soc: arm: nordic: provide custom busy_wait implementations
Implementation taken from Nordic nrfx version 1.3.1 soc/nrfx_coredep.h,
modified to remove material from other series and to conform to Zephyr
coding standards.

Note that unlike standard k_busy_wait this is susceptible to
longer-than-intended delays due to the impact of interrupt handling.

Fixes #11626

Signed-off-by: Peter A. Bigot <pab@pabigot.com>

diegosueiro added a commit to diegosueiro/zephyr that referenced this issue Dec 7, 2018

soc: arm: nordic: provide custom busy_wait implementations
Implementation taken from Nordic nrfx version 1.3.1 soc/nrfx_coredep.h,
modified to remove material from other series and to conform to Zephyr
coding standards.

Note that unlike standard k_busy_wait this is susceptible to
longer-than-intended delays due to the impact of interrupt handling.

Fixes zephyrproject-rtos#11626

Signed-off-by: Peter A. Bigot <pab@pabigot.com>
@pabigot

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 26, 2019

@carlescufi As a consequence of #14991 reverting this change the misbehavior is again present in Zephyr 1.14 and current master. Re-testing on nRF52840 shows 50 us delays complete in 24 to 32 us, 75 us delays in 55 us, and 100 us delays in 86 us tested on 1.14.0.

@pizi-nordic FWIW the alternative timer infrastructure in my fork has no effect on the behavior of k_busy_wait.

@pabigot

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 27, 2019

The diagnosis supporting #14991 is not correct. The error between the system clock (counting 10 ms ticks) and the cycle clock (counting 32 KiHz ticks) is not due to RC vs LFXT variation, but the fact that using CONFIG_SYS_CLOCK_TICKS_PER_SEC=100 results in a system clock 1 s that corresponds to 32700 cycles of a 32768 Hz clock.

If CONFIG_SYS_CLOCK_TICKS_PER_SEC=128 is used for this test, then 1.14 passes when bd24b31 is reverted, restoring the precise k_busy_wait implementation.

I've added #15707 which upstreams my patch to changes the default ticks-per-second on all Nordic platforms, and also restores the original fix for this problem.

@pabigot

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 27, 2019

The diagnosis supporting #14991 is not correct.

I'm going to retract that: it's not so simple.

Here's instrumentation on the test that failed in #14186 when CONFIG_SYS_CLOCK_TICKS=100:

starting test - test_kernel_systick
33670 - 920 = 32750 ticks = 999450 ns

    Assertion failed at ../src/kernel_apis.c:55: test_kernel_systick: diff >= WAIT_TIME_US is false

When CONFIG_SYS_CLOCK_TICKS=128 on the same hardware I get:

starting test - test_kernel_systick
33705 - 919 = 32786 ticks = 1000549 ns
PASS - test_kernel_systick

Both of these taken from the branch used in #15707, with prj.conf updated to override the default, running on a Particle Xenon (nRF52840).

I don't understand why this succeeds and fails the way it does, since in both cases the cycle timer---which should not be affected by the tick truncation---is used to measure the duration of the delay.

What I have observed is two things:

  • Runs of the test immediately after flashing a new image have a shorter measured wait time in cycles than runs initiated by resetting a programmed device. The difference is generally about 20 cycles (from 32718 to 32739), or 610 us. This is consistent with an unstable RC clock starting up over the first couple seconds, assuming something about a reset speeds stabilization.
  • Runs of the test with 100 ticks per second have a shorter measured wait time in cycles than runs with 128 ticks per second (32675 for 100 t/s versus 32718 for 128 t/s). The same first-versus-reset increase applies (32675 to 32696).

The result is that the first run after flashing with 100 t/s almost always fails, while subsequent runs after a reset usually pass. The first run after flashing with 128 t/s usually succeeds, and subsequent runs after a reset almost always succeed.

So there's just something really weird about the way the timing infrastructure works.

@pizi-nordic

This comment has been minimized.

Copy link
Collaborator

commented Apr 29, 2019

The diagnosis supporting #14991 is not correct. The error between the system clock (counting 10 ms ticks) and the cycle clock (counting 32 KiHz ticks) is not due to RC vs LFXT variation, but the fact that using CONFIG_SYS_CLOCK_TICKS_PER_SEC=100 results in a system clock 1 s that corresponds to 32700 cycles of a 32768 Hz clock.

I missed this error source... The fact that on some board test passed and on others not as well as measurement results fixated my mind. Thank you @pabigot for pointing this out!

pabigot added a commit to pabigot/zephyr that referenced this issue May 3, 2019

soc: arm: nordic_nrf: unrevert provide custom busy_wait implementations
This reverts commit bd24b31.

While the test case failure described in zephyrproject-rtos#14186 is associated with the
cycle-based busy-wait implementation, that test is fragile, and fails
less frequently once the incongruence between ticks-per-second and the
32 KiHz RTC clock are resolved.

The gross inaccuracies in the standard busy-wait on Nordic described in

See: zephyrproject-rtos#11626 (comment)

Signed-off-by: Peter A. Bigot <pab@pabigot.com>

pabigot added a commit to pabigot/zephyr that referenced this issue May 3, 2019

soc: arm: nordic_nrf: unrevert provide custom busy_wait implementations
This reverts commit bd24b31.

While the test case failure described in zephyrproject-rtos#14186 is associated with the
cycle-based busy-wait implementation, that test is fragile, and fails
less frequently once the incongruence between ticks-per-second and the
32 KiHz RTC clock are resolved.

The gross inaccuracies in the standard busy-wait on Nordic described in

See: zephyrproject-rtos#11626 (comment)

Signed-off-by: Peter A. Bigot <pab@pabigot.com>

pabigot added a commit to pabigot/zephyr that referenced this issue May 7, 2019

soc: arm: nordic_nrf: unrevert provide custom busy_wait implementations
This reverts commit bd24b31.

While the test case failure described in zephyrproject-rtos#14186 is associated with the
cycle-based busy-wait implementation, that test is fragile, and fails
less frequently once the incongruence between ticks-per-second and the
32 KiHz RTC clock are resolved.  It also assumes that the system clock
is more stable than the infrastructure underlying the the busy-wait
implementation, which is not necessarily true.

The gross inaccuracies in the standard busy-wait on Nordic described in
issue zephyrproject-rtos#11626 justify restoring the custom solution.

As this applies to all Nordic devices, move the setting to the top-level
Kconfig.defconfig.

See: zephyrproject-rtos#11626 (comment)

Signed-off-by: Peter A. Bigot <pab@pabigot.com>

carlescufi added a commit that referenced this issue May 7, 2019

soc: arm: nordic_nrf: unrevert provide custom busy_wait implementations
This reverts commit bd24b31.

While the test case failure described in #14186 is associated with the
cycle-based busy-wait implementation, that test is fragile, and fails
less frequently once the incongruence between ticks-per-second and the
32 KiHz RTC clock are resolved.  It also assumes that the system clock
is more stable than the infrastructure underlying the the busy-wait
implementation, which is not necessarily true.

The gross inaccuracies in the standard busy-wait on Nordic described in
issue #11626 justify restoring the custom solution.

As this applies to all Nordic devices, move the setting to the top-level
Kconfig.defconfig.

See: #11626 (comment)

Signed-off-by: Peter A. Bigot <pab@pabigot.com>

daor-oti added a commit to daor-oti/zephyr that referenced this issue May 10, 2019

soc: arm: nordic_nrf: unrevert provide custom busy_wait implementations
This reverts commit bd24b31.

While the test case failure described in zephyrproject-rtos#14186 is associated with the
cycle-based busy-wait implementation, that test is fragile, and fails
less frequently once the incongruence between ticks-per-second and the
32 KiHz RTC clock are resolved.  It also assumes that the system clock
is more stable than the infrastructure underlying the the busy-wait
implementation, which is not necessarily true.

The gross inaccuracies in the standard busy-wait on Nordic described in
issue zephyrproject-rtos#11626 justify restoring the custom solution.

As this applies to all Nordic devices, move the setting to the top-level
Kconfig.defconfig.

See: zephyrproject-rtos#11626 (comment)

Signed-off-by: Peter A. Bigot <pab@pabigot.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.