Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

drivers: timer: nrf_rtc_timer: fix handling for 24-bit counter #11747

Merged
merged 1 commit into from Nov 30, 2018

Conversation

Projects
None yet
8 participants
@pabigot
Copy link
Collaborator

commented Nov 29, 2018

Two subtractions failed to account for the possibility that a calculated
time exceeded the counter resolution, allowing a comparison to
improperly indicate that a minimum delay was satisfied.

Use the subtraction helper to avoid the problem.

(The subtraction in z_clock_set_timeout was the cause of issue #11694;
the one in rtc1_nrf5_isr was replaced based on inspection rather than
testing.)

Closes #11694
Fixes #11744

Signed-off-by: Peter A. Bigot pab@pabigot.com

drivers: timer: nrf_rtc_timer: fix handling for 24-bit counter
Two subtractions failed to account for the possibility that a calculated
time exceeded the counter resolution, allowing a comparison to
improperly indicate that a minimum delay was satisfied.

Use the subtraction helper to avoid the problem.

(The subtraction in z_clock_set_timeout was the cause of issue #11694;
the one in rtc1_nrf5_isr was replaced based on inspection rather than
testing.)

Closes #11694

Signed-off-by: Peter A. Bigot <pab@pabigot.com>

@carlescufi carlescufi requested review from pizi-nordic, andyross and pdunaj Nov 29, 2018

@carlescufi

This comment has been minimized.

Copy link
Member

commented Nov 29, 2018

@pabigot thanks! could this also fix #11744?

@pabigot

This comment has been minimized.

Copy link
Collaborator Author

commented Nov 29, 2018

@carlescufi entirely possible.

@mike-scott

This comment has been minimized.

Copy link
Collaborator

commented Nov 29, 2018

I'll test in a moment.

@andyross
Copy link
Collaborator

left a comment

Beautiful, thanks!

Yeah, this was a cut and paste goof on my part. I recognized the need for a special subtraction routine on this particular timer when I wrote the first code, but wasn't rigorous about making sure it was always used as the timer drivers (which are all very similar) evolved. Sorry!

@mike-scott

This comment has been minimized.

Copy link
Collaborator

commented Nov 29, 2018

Tested this commit on top of current master and it does indeed fix the nRF52840 hang at 8 min 20 seconds.

@mike-scott mike-scott self-requested a review Nov 29, 2018

@carlescufi

This comment has been minimized.

Copy link
Member

commented Nov 29, 2018

@pabigot thank you very much for this.

@codecov-io

This comment has been minimized.

Copy link

commented Nov 29, 2018

Codecov Report

Merging #11747 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #11747   +/-   ##
=======================================
  Coverage    48.2%    48.2%           
=======================================
  Files         280      280           
  Lines       43323    43323           
  Branches    10374    10374           
=======================================
  Hits        20886    20886           
  Misses      18289    18289           
  Partials     4148     4148
Impacted Files Coverage Δ
drivers/timer/nrf_rtc_timer.c 94.73% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3c4bb10...a4daa8a. Read the comment docs.

@aurel32
Copy link
Collaborator

left a comment

@pabigot thanks for the fix. I confirm it fixes the OpenThread issue I reported in #11694

@pdunaj
Copy link
Collaborator

left a comment

Clarification needed

@@ -63,7 +63,7 @@ void rtc1_nrf5_isr(void *arg)
if (!IS_ENABLED(CONFIG_TICKLESS_KERNEL)) {
u32_t next = last_count + CYC_PER_TICK;

if ((s32_t)(next - t) < MIN_DELAY) {

This comment has been minimized.

Copy link
@pdunaj

pdunaj Nov 30, 2018

Collaborator

I have a bad feeling about this...

The original code will add CYC_PER_TICK when t is past the next (negative number will be below the MIN_DELAY). So far I had a belief that this was the intention.
The new code will return high unsigned value and CYC_PER_TICK won't be added. If t is past next won't the next tick be triggered with high delay?

If I understand the change in this patch you try to fix overflow that may happen when t wrapped around COUNTER_MAX. Then indeed next will be high compared to t even if it should not be. But in this case shouldn't we check the overflow at the addition (last_count + CYC_PER_TICK)?

This comment has been minimized.

Copy link
@pabigot

pabigot Nov 30, 2018

Author Collaborator

Actually the original code would fail to add CYC_PER_TICK because next - t was a positive value greater than the 24-bit maximum counter value. When it's truncated by set_comparator the effective counter (which produces a very small delay) might get stored too late and the wakeup be missed.

It doesn't matter if any value like last_count + CYC_PER_TICK (or last_count += dticks * CYC_PER_TICK) produces a value greater than 24-bits: the unsigned arithmetic operations "just work" for everything except for duration comparisons where we need to account for potential overflow/underflow. (If the cast had been to s24_t it'd be fine, but that type doesn't exist.)

This comment has been minimized.

Copy link
@pdunaj

pdunaj Nov 30, 2018

Collaborator

@pabigot It can be that I miss something here... Please correct me if I am wrong.
The code calls set_comparator with next (not next - t) so whatever will be trunkated will be trunkated from next.
Now let's see the check. We are subtracting two unsigned values that should be with close proximity to each other. Result (indeed a unsigned) is then converted to integer and will be something around from -n*CYC_PER_TICK to m*CYC_PER_TICK (n,m small positive values as the values t and next are with close proximity to each other).

What the new code will do will be exactly what you described - it will subtract t from next then trunkate. If t was greater it would lead to comparison fail as you will have large positive value. So if t is past next code will not add CYC_PER_TICK to next. If t is past next set_comparator will be set with something that counter already passed.

It can be that I am wrong here but it does not look right.

This comment has been minimized.

Copy link
@pizi-nordic

pizi-nordic Nov 30, 2018

Collaborator

Indeed @pdunaj spotted a silly place in the code. I agree that this should be fixed.

High level explanation for these who are not following discussion:
In some cases (when interrupts was locked for too long) we can setup next RTC interrupt way too far into future (like 8 minutes) instead of the next tick.

This comment has been minimized.

Copy link
@pdunaj

pdunaj Nov 30, 2018

Collaborator

to clarify ... I am not talking about the situation when we are close to the counter max but a situation when t is past next. i.e.

   last_count     next  t
-------------------------------------->time

however I noticed that last_count is updated from t in the irq so next cannot be behind t.

This comment has been minimized.

Copy link
@pdunaj

pdunaj Nov 30, 2018

Collaborator

Code is fine. I see no new issue here.

This comment has been minimized.

Copy link
@pabigot

pabigot Nov 30, 2018

Author Collaborator

@pizi-nordic I think you're concerned about the situation where the system clock ISR has been blocked for a long enough time that it gets serviced late, having "missed" ticks.

@pdunaj yes, exactly, next cannot be "behind" t because it's essentially t with a positive value added to it. So the "missed" ticks are already accounted for in the adjustment to last_count.

It is possible there's something wrong. But I'd want to see the specific situation clearly outlined with example values of last_count, t, CYC_PER_TICK, etc. to show what's going wrong. I still think the code is correct as presented in the PR.

This comment has been minimized.

Copy link
@pdunaj

pdunaj Nov 30, 2018

Collaborator

@pabigot , @pizi-nordic , I overlooked last_count being updated few lines above. Code is fine and can be merged.

Show resolved Hide resolved drivers/timer/nrf_rtc_timer.c

Issue spotted by @pdunaj my introduce worse error than the one we are going to fix.

@pdunaj

pdunaj approved these changes Nov 30, 2018

@@ -63,7 +63,7 @@ void rtc1_nrf5_isr(void *arg)
if (!IS_ENABLED(CONFIG_TICKLESS_KERNEL)) {
u32_t next = last_count + CYC_PER_TICK;

if ((s32_t)(next - t) < MIN_DELAY) {

This comment has been minimized.

Copy link
@pdunaj

pdunaj Nov 30, 2018

Collaborator

Code is fine. I see no new issue here.

@carlescufi carlescufi merged commit d2f5078 into zephyrproject-rtos:master Nov 30, 2018

7 checks passed

Documentation Verifications passed
Details
Gitlint Verifications passed
Details
Identity/Emails Verifications passed
Details
Kconfig Verifications passed
Details
License Verifications passed
Details
Shippable Run 28369 status is SUCCESS.
Details
checkpatch Verifications passed
Details

@pabigot pabigot deleted the pabigot:issue/11694 branch Nov 30, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.