Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: timer: nrf_rtc_timer: fix handling for 24-bit counter #11747

Merged
merged 1 commit into from Nov 30, 2018

Conversation

@pabigot
Copy link
Collaborator

@pabigot pabigot commented Nov 29, 2018

Two subtractions failed to account for the possibility that a calculated
time exceeded the counter resolution, allowing a comparison to
improperly indicate that a minimum delay was satisfied.

Use the subtraction helper to avoid the problem.

(The subtraction in z_clock_set_timeout was the cause of issue #11694;
the one in rtc1_nrf5_isr was replaced based on inspection rather than
testing.)

Closes #11694
Fixes #11744

Signed-off-by: Peter A. Bigot pab@pabigot.com

Two subtractions failed to account for the possibility that a calculated
time exceeded the counter resolution, allowing a comparison to
improperly indicate that a minimum delay was satisfied.

Use the subtraction helper to avoid the problem.

(The subtraction in z_clock_set_timeout was the cause of issue zephyrproject-rtos#11694;
the one in rtc1_nrf5_isr was replaced based on inspection rather than
testing.)

Closes zephyrproject-rtos#11694

Signed-off-by: Peter A. Bigot <pab@pabigot.com>
@carlescufi
Copy link
Member

@carlescufi carlescufi commented Nov 29, 2018

@pabigot thanks! could this also fix #11744?

@pabigot
Copy link
Collaborator Author

@pabigot pabigot commented Nov 29, 2018

@carlescufi entirely possible.

@mike-scott
Copy link
Contributor

@mike-scott mike-scott commented Nov 29, 2018

I'll test in a moment.

Copy link
Contributor

@andyross andyross left a comment

Beautiful, thanks!

Yeah, this was a cut and paste goof on my part. I recognized the need for a special subtraction routine on this particular timer when I wrote the first code, but wasn't rigorous about making sure it was always used as the timer drivers (which are all very similar) evolved. Sorry!

@mike-scott
Copy link
Contributor

@mike-scott mike-scott commented Nov 29, 2018

Tested this commit on top of current master and it does indeed fix the nRF52840 hang at 8 min 20 seconds.

@mike-scott mike-scott self-requested a review Nov 29, 2018
@carlescufi
Copy link
Member

@carlescufi carlescufi commented Nov 29, 2018

@pabigot thank you very much for this.

@codecov-io
Copy link

@codecov-io codecov-io commented Nov 29, 2018

Codecov Report

Merging #11747 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #11747   +/-   ##
=======================================
  Coverage    48.2%    48.2%           
=======================================
  Files         280      280           
  Lines       43323    43323           
  Branches    10374    10374           
=======================================
  Hits        20886    20886           
  Misses      18289    18289           
  Partials     4148     4148
Impacted Files Coverage Δ
drivers/timer/nrf_rtc_timer.c 94.73% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3c4bb10...a4daa8a. Read the comment docs.

Copy link
Collaborator

@aurel32 aurel32 left a comment

@pabigot thanks for the fix. I confirm it fixes the OpenThread issue I reported in #11694

Copy link
Collaborator

@pdunaj pdunaj left a comment

Clarification needed

@@ -63,7 +63,7 @@ void rtc1_nrf5_isr(void *arg)
if (!IS_ENABLED(CONFIG_TICKLESS_KERNEL)) {
u32_t next = last_count + CYC_PER_TICK;

if ((s32_t)(next - t) < MIN_DELAY) {
Copy link
Collaborator

@pdunaj pdunaj Nov 30, 2018

I have a bad feeling about this...

The original code will add CYC_PER_TICK when t is past the next (negative number will be below the MIN_DELAY). So far I had a belief that this was the intention.
The new code will return high unsigned value and CYC_PER_TICK won't be added. If t is past next won't the next tick be triggered with high delay?

If I understand the change in this patch you try to fix overflow that may happen when t wrapped around COUNTER_MAX. Then indeed next will be high compared to t even if it should not be. But in this case shouldn't we check the overflow at the addition (last_count + CYC_PER_TICK)?

Copy link
Collaborator Author

@pabigot pabigot Nov 30, 2018

Actually the original code would fail to add CYC_PER_TICK because next - t was a positive value greater than the 24-bit maximum counter value. When it's truncated by set_comparator the effective counter (which produces a very small delay) might get stored too late and the wakeup be missed.

It doesn't matter if any value like last_count + CYC_PER_TICK (or last_count += dticks * CYC_PER_TICK) produces a value greater than 24-bits: the unsigned arithmetic operations "just work" for everything except for duration comparisons where we need to account for potential overflow/underflow. (If the cast had been to s24_t it'd be fine, but that type doesn't exist.)

Copy link
Collaborator

@pdunaj pdunaj Nov 30, 2018

@pabigot It can be that I miss something here... Please correct me if I am wrong.
The code calls set_comparator with next (not next - t) so whatever will be trunkated will be trunkated from next.
Now let's see the check. We are subtracting two unsigned values that should be with close proximity to each other. Result (indeed a unsigned) is then converted to integer and will be something around from -n*CYC_PER_TICK to m*CYC_PER_TICK (n,m small positive values as the values t and next are with close proximity to each other).

What the new code will do will be exactly what you described - it will subtract t from next then trunkate. If t was greater it would lead to comparison fail as you will have large positive value. So if t is past next code will not add CYC_PER_TICK to next. If t is past next set_comparator will be set with something that counter already passed.

It can be that I am wrong here but it does not look right.

Copy link
Collaborator

@pizi-nordic pizi-nordic Nov 30, 2018

Indeed @pdunaj spotted a silly place in the code. I agree that this should be fixed.

High level explanation for these who are not following discussion:
In some cases (when interrupts was locked for too long) we can setup next RTC interrupt way too far into future (like 8 minutes) instead of the next tick.

Copy link
Collaborator

@pdunaj pdunaj Nov 30, 2018

to clarify ... I am not talking about the situation when we are close to the counter max but a situation when t is past next. i.e.

   last_count     next  t
-------------------------------------->time

however I noticed that last_count is updated from t in the irq so next cannot be behind t.

Copy link
Collaborator

@pdunaj pdunaj Nov 30, 2018

Code is fine. I see no new issue here.

Copy link
Collaborator Author

@pabigot pabigot Nov 30, 2018

@pizi-nordic I think you're concerned about the situation where the system clock ISR has been blocked for a long enough time that it gets serviced late, having "missed" ticks.

@pdunaj yes, exactly, next cannot be "behind" t because it's essentially t with a positive value added to it. So the "missed" ticks are already accounted for in the adjustment to last_count.

It is possible there's something wrong. But I'd want to see the specific situation clearly outlined with example values of last_count, t, CYC_PER_TICK, etc. to show what's going wrong. I still think the code is correct as presented in the PR.

Copy link
Collaborator

@pdunaj pdunaj Nov 30, 2018

@pabigot , @pizi-nordic , I overlooked last_count being updated few lines above. Code is fine and can be merged.

drivers/timer/nrf_rtc_timer.c Show resolved Hide resolved
@pizi-nordic pizi-nordic dismissed their stale review Nov 30, 2018

Issue spotted by @pdunaj my introduce worse error than the one we are going to fix.

pdunaj
pdunaj approved these changes Nov 30, 2018
@@ -63,7 +63,7 @@ void rtc1_nrf5_isr(void *arg)
if (!IS_ENABLED(CONFIG_TICKLESS_KERNEL)) {
u32_t next = last_count + CYC_PER_TICK;

if ((s32_t)(next - t) < MIN_DELAY) {
Copy link
Collaborator

@pdunaj pdunaj Nov 30, 2018

Code is fine. I see no new issue here.

@carlescufi carlescufi merged commit d2f5078 into zephyrproject-rtos:master Nov 30, 2018
7 checks passed
@pabigot pabigot deleted the issue/11694 branch Nov 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

8 participants