drivers: timer: nrf_rtc_timer: fix handling for 24-bit counter#11747
drivers: timer: nrf_rtc_timer: fix handling for 24-bit counter#11747carlescufi merged 1 commit intozephyrproject-rtos:masterfrom
Conversation
Two subtractions failed to account for the possibility that a calculated time exceeded the counter resolution, allowing a comparison to improperly indicate that a minimum delay was satisfied. Use the subtraction helper to avoid the problem. (The subtraction in z_clock_set_timeout was the cause of issue zephyrproject-rtos#11694; the one in rtc1_nrf5_isr was replaced based on inspection rather than testing.) Closes zephyrproject-rtos#11694 Signed-off-by: Peter A. Bigot <pab@pabigot.com>
|
@carlescufi entirely possible. |
|
I'll test in a moment. |
andyross
left a comment
There was a problem hiding this comment.
Beautiful, thanks!
Yeah, this was a cut and paste goof on my part. I recognized the need for a special subtraction routine on this particular timer when I wrote the first code, but wasn't rigorous about making sure it was always used as the timer drivers (which are all very similar) evolved. Sorry!
|
Tested this commit on top of current master and it does indeed fix the nRF52840 hang at 8 min 20 seconds. |
|
@pabigot thank you very much for this. |
Codecov Report
@@ Coverage Diff @@
## master #11747 +/- ##
=======================================
Coverage 48.2% 48.2%
=======================================
Files 280 280
Lines 43323 43323
Branches 10374 10374
=======================================
Hits 20886 20886
Misses 18289 18289
Partials 4148 4148
Continue to review full report at Codecov.
|
| if (!IS_ENABLED(CONFIG_TICKLESS_KERNEL)) { | ||
| u32_t next = last_count + CYC_PER_TICK; | ||
|
|
||
| if ((s32_t)(next - t) < MIN_DELAY) { |
There was a problem hiding this comment.
I have a bad feeling about this...
The original code will add CYC_PER_TICK when t is past the next (negative number will be below the MIN_DELAY). So far I had a belief that this was the intention.
The new code will return high unsigned value and CYC_PER_TICK won't be added. If t is past next won't the next tick be triggered with high delay?
If I understand the change in this patch you try to fix overflow that may happen when t wrapped around COUNTER_MAX. Then indeed next will be high compared to t even if it should not be. But in this case shouldn't we check the overflow at the addition (last_count + CYC_PER_TICK)?
There was a problem hiding this comment.
Actually the original code would fail to add CYC_PER_TICK because next - t was a positive value greater than the 24-bit maximum counter value. When it's truncated by set_comparator the effective counter (which produces a very small delay) might get stored too late and the wakeup be missed.
It doesn't matter if any value like last_count + CYC_PER_TICK (or last_count += dticks * CYC_PER_TICK) produces a value greater than 24-bits: the unsigned arithmetic operations "just work" for everything except for duration comparisons where we need to account for potential overflow/underflow. (If the cast had been to s24_t it'd be fine, but that type doesn't exist.)
There was a problem hiding this comment.
@pabigot It can be that I miss something here... Please correct me if I am wrong.
The code calls set_comparator with next (not next - t) so whatever will be trunkated will be trunkated from next.
Now let's see the check. We are subtracting two unsigned values that should be with close proximity to each other. Result (indeed a unsigned) is then converted to integer and will be something around from -n*CYC_PER_TICK to m*CYC_PER_TICK (n,m small positive values as the values t and next are with close proximity to each other).
What the new code will do will be exactly what you described - it will subtract t from next then trunkate. If t was greater it would lead to comparison fail as you will have large positive value. So if t is past next code will not add CYC_PER_TICK to next. If t is past next set_comparator will be set with something that counter already passed.
It can be that I am wrong here but it does not look right.
There was a problem hiding this comment.
Indeed @pdunaj spotted a silly place in the code. I agree that this should be fixed.
High level explanation for these who are not following discussion:
In some cases (when interrupts was locked for too long) we can setup next RTC interrupt way too far into future (like 8 minutes) instead of the next tick.
There was a problem hiding this comment.
to clarify ... I am not talking about the situation when we are close to the counter max but a situation when t is past next. i.e.
last_count next t
-------------------------------------->time
however I noticed that last_count is updated from t in the irq so next cannot be behind t.
There was a problem hiding this comment.
Code is fine. I see no new issue here.
There was a problem hiding this comment.
@pizi-nordic I think you're concerned about the situation where the system clock ISR has been blocked for a long enough time that it gets serviced late, having "missed" ticks.
@pdunaj yes, exactly, next cannot be "behind" t because it's essentially t with a positive value added to it. So the "missed" ticks are already accounted for in the adjustment to last_count.
It is possible there's something wrong. But I'd want to see the specific situation clearly outlined with example values of last_count, t, CYC_PER_TICK, etc. to show what's going wrong. I still think the code is correct as presented in the PR.
There was a problem hiding this comment.
@pabigot , @pizi-nordic , I overlooked last_count being updated few lines above. Code is fine and can be merged.
Issue spotted by @pdunaj my introduce worse error than the one we are going to fix.
| if (!IS_ENABLED(CONFIG_TICKLESS_KERNEL)) { | ||
| u32_t next = last_count + CYC_PER_TICK; | ||
|
|
||
| if ((s32_t)(next - t) < MIN_DELAY) { |
There was a problem hiding this comment.
Code is fine. I see no new issue here.
Two subtractions failed to account for the possibility that a calculated
time exceeded the counter resolution, allowing a comparison to
improperly indicate that a minimum delay was satisfied.
Use the subtraction helper to avoid the problem.
(The subtraction in z_clock_set_timeout was the cause of issue #11694;
the one in rtc1_nrf5_isr was replaced based on inspection rather than
testing.)
Closes #11694
Fixes #11744
Signed-off-by: Peter A. Bigot pab@pabigot.com