Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tests : kernel: context test_kernel_cpu_idle fails on nucleo_f091 board #46641

Closed
FRASTM opened this issue Jun 17, 2022 · 12 comments · Fixed by #47112
Closed

tests : kernel: context test_kernel_cpu_idle fails on nucleo_f091 board #46641

FRASTM opened this issue Jun 17, 2022 · 12 comments · Fixed by #47112
Assignees
Labels
area: Kernel area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug platform: STM32 ST Micro STM32 priority: low Low impact/importance bug

Comments

@FRASTM
Copy link
Collaborator

FRASTM commented Jun 17, 2022

Describe the bug
When executing the tests/kernel/context on the nucleo_f091rc board, the test_kernel_cpu_idle fails because the interrupt comes sooner than 1 millsecond which is the timer delay to wakeUp.

Please also mention any information which could help others to understand
the problem you're facing:

  • nucleo_f091rc hardware

Similar to #41347 but the associated PR makes the test fails, in other words, the PR #41348 looks like a regression for this testcase on this board

To Reproduce
Steps to reproduce the behavior:

  1. west build -p auto -b nucleo_f091rc tests/kernel/context
  2. west flash
  3. See error

Expected behavior
Test PASSED

Logs and console output

*** Booting Zephyr OS build zephyr-v3.1.0-470-g9aea94e97345  ***                                                
Running TESTSUITE context                                                                                       
===================================================================                                             
START - test_kernel_interrupts                                                                                  
 SKIP - test_kernel_interrupts in 0.1 seconds                                                                   
===================================================================                                             
START - test_kernel_ctx_thread                                                                                  
Testing k_current_get() from an ISR and thread                                                                  
Testing k_is_in_isr() from an ISR                                                                               
Testing k_is_in_isr() from a preemptible thread                                                                 
 PASS - test_kernel_ctx_thread in 0.12 seconds                                                                  
===================================================================                                             
START - test_busy_wait                                                                                          
Thread busy waiting for 20000 usecs                                                                             
Thread busy waiting completed                                                                                   
Thread busy waiting for 20000 usecs (irqs locked)                                                               
Thread busy waiting completed (irqs locked)                                                                     
 PASS - test_busy_wait in 0.54 seconds                                                                          
===================================================================                                             
START - test_k_sleep                                                                                            
...                                                  
 PASS - test_k_sleep in 5.336 seconds                                                                           
===================================================================                                             
START - test_kernel_cpu_idle_atomic                                                                             
 SKIP - test_kernel_cpu_idle_atomic in 0.1 seconds                                                              
===================================================================                                             
START - test_kernel_cpu_idle                                                                                    
                                                                                                                
    Assertion failed at WEST_TOPDIR/zephyr/tests/kernel/context/src/main.c:355: _test_kernel_cpu_idle: (tms2 < )
Bad ms value computed,got 5481 which is less than 5482                                                          
                                                                                                                
 FAIL - test_kernel_cpu_idle in 0.19 seconds   

Environment (please complete the following information):

  • OS: ubuntu
  • Toolchain: zephyr-sdk-0.14.0
  • Commit SHA 92bc3e4

Additional context

  • during the test case execution, the idle_timer is stopped before it expires (idle_timer_expiry_function)
  • It should be noted that f091rc nucleo has a 48MHz clock from PLL sourced by the HSE (bypassed at 8MHz from stlink)
@FRASTM FRASTM added the bug The issue is a bug, or the PR is fixing a bug label Jun 17, 2022
@FRASTM
Copy link
Collaborator Author

FRASTM commented Jun 17, 2022

--> In this config, the test always pass when CONFIG_SYS_CLOCK_TICKS_PER_SEC<10000 (default value) (for example the test is successful with 9999 !)
The idle_timer correctly expires. (5 times calling idle_timer_expiry_function).

Else the idle_timer_expiry_function is called 2 times and test reports a failure.
So that, starting the idle_timer with more than 1 ms will also pass the test :
--> k_timer_start(&idle_timer, K_USEC(1001), K_NO_WAIT);

Or starting the idle_timer with period of 1 ms will also pass the test :
--> k_timer_start(&idle_timer, K_MSEC(1), K_MSEC(1));

In the _test_kernel_cpu_idle function, variable tms, tms2 are declared as int, but k_uptime_get_32() is uint32_t so better declare uint32_t tms, tms2; It does not fix the pb though.

@FRASTM FRASTM added area: Kernel platform: STM32 ST Micro STM32 area: Tests Issues related to a particular existing or missing test labels Jun 17, 2022
@FRASTM
Copy link
Collaborator Author

FRASTM commented Jun 17, 2022

Instead of synchronizing on each loop, the tick sync comes only once after the timer init.

diff --git a/tests/kernel/context/src/main.c b/tests/kernel/context/src/main.c
index cc6559fdea..3af1d7e07c 100644
--- a/tests/kernel/context/src/main.c
+++ b/tests/kernel/context/src/main.c
@@ -330,16 +330,16 @@ static void _test_kernel_cpu_idle(int atomic)
 	/* Set up a time to trigger events to exit idle mode */
 	k_timer_init(&idle_timer, idle_timer_expiry_function, NULL);
 
+	/* Align to ticks before starting the timer.
+	 * (k_timer_start() rounds its duration argument down, not up,
+	 * to a tick boundary)
+	 * This timer operates under the assumption that the interrupt set
+	 * to wake the cpu from idle will be no sooner than 1 millsecond in
+	 * the future. Ensure we are a tick boundary each time, so that the
+	 * system timer does not choose to fire an interrupt sooner.
+	 */
+	k_usleep(1);
 	for (i = 0; i < 5; i++) { /* Repeat the test five times */
-		/* Align to ticks before starting the timer.
-		 * (k_timer_start() rounds its duration argument down, not up,
-		 * to a tick boundary)
-		 * This timer operates under the assumption that the interrupt set
-		 * to wake the cpu from idle will be no sooner than 1 millsecond in
-		 * the future. Ensure we are a tick boundary each time, so that the
-		 * system timer does not choose to fire an interrupt sooner.
-		 */
-		k_usleep(1);
 		k_timer_start(&idle_timer, K_MSEC(1), K_NO_WAIT);
 		tms = k_uptime_get_32();
 		if (atomic) {

@FRASTM
Copy link
Collaborator Author

FRASTM commented Jun 20, 2022

@danieldegrasse could please have a look

@mmahadevan108 mmahadevan108 added the priority: low Low impact/importance bug label Jun 21, 2022
@danieldegrasse
Copy link
Collaborator

@FRASTM based off what you have said here, the issue looks like it might be a rounding problem, since changing the wait period or CONFIG_SYS_CLOCK_TICKS_PER_SEC slightly fixes the test. Could you determine what timeout values are being set in the call to sys_clock_set_timeout for the LPTIM timer driver prior to the issue occurring? For example, I maintain this fork of the GPT driver implementation that implements a basic tracing framework for the GPT driver code, which I use to debug test case failures like this one. Based off what you have said about the call to idle_timer_expiry_function, it is possible the LPTIM driver is announcing progress to the kernel via sys_clock_announce twice in a period where it should do so once.

While #41348 likely did cause a regression in the LPTIM driver, given the documentation for sys_clock_set_timeout, which states that the timer driver may make spurious calls to announce progress to the kernel, I believe the test case changes are valid. My best guess based off of what you have said here is that the LPTIM driver is making multiple calls to sys_clock_announce with a nonzero tick value, causing the kernel to assume more ticks worth of progress have occurred than actually have.

@FRASTM
Copy link
Collaborator Author

FRASTM commented Jun 27, 2022

when running this test on the nucleo_f091rc , there is NO lptim but kernel timer.

The sys_clock_set_timeout with _ticks = 10 is kernel function from drivers/timer/cortex_m_systick.c and is called at the k_timer_start(&idle_timer, K_MSEC(1), K_NO_WAIT);
which corresponds to the 10000 ticks per sec.
Always at the second loop (i=1) the tms2 = 5481 and tms = 5482
The CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC=48000000

@danieldegrasse
Copy link
Collaborator

@FRASTM thanks for the clarification, I was not aware this issue was occurring with the systick driver. I've tried to reproduce this issue on an RT1050 and STM32 433RC evaluation kit, and so far have not had any luck. It may be necessary to add tracing code to the systick timer to narrow down what causes this issue.

Are there other ST platforms with the same value for CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC, and does this test fail there as well? In addition, if the systick frequency is changed from 48MHz to another speed, and CONFIG_SYS_CLOCK_HW_CYCLES_PER_SEC is updated, does this issue still occur?

@FRASTM
Copy link
Collaborator Author

FRASTM commented Jun 28, 2022

When the system clock is reduced to 44MHz (or lower) the test is passed.
Testing on a nucleo_wl55jc where the sysclock is 48Mz, the test is Passed. However during the
z_impl_k_timer_start() function call, the timer->timeout differs

> 	(void)z_abort_timeout(&timer->timeout);
> 	timer->period = period;
> 	timer->status = 0U;
> 
> 	z_add_timeout(&timer->timeout, z_timer_expiration_handler,
> 		     duration);

with nucleo_wl55jc --> timer->timeout = 0 and duration.ticks = 536871272
with nucleo_f091rc --> timer->timeout = 536871560 and duration.ticks = 536871468

@FRASTM
Copy link
Collaborator Author

FRASTM commented Jun 28, 2022

This pb was already reported in the old issue #43663

@FRASTM
Copy link
Collaborator Author

FRASTM commented Jun 28, 2022

One proposal is to change the next_timeout() function of the kernel/timeout.c to be as close to the elapsed time as possible

diff --git a/kernel/timeout.c b/kernel/timeout.c
index d5b19ba0f8..7a424e7683 100644
--- a/kernel/timeout.c
+++ b/kernel/timeout.c
@@ -67,14 +67,13 @@ static int32_t elapsed(void)
 static int32_t next_timeout(void)
 {
 	struct _timeout *to = first();
-	int32_t ticks_elapsed = elapsed();
 	int32_t ret;
 
 	if ((to == NULL) ||
-	    ((int64_t)(to->dticks - ticks_elapsed) > (int64_t)INT_MAX)) {
+	    ((int64_t)(to->dticks - elapsed()) > (int64_t)INT_MAX)) {
 		ret = MAX_WAIT;
 	} else {
-		ret = MAX(0, to->dticks - ticks_elapsed);
+		ret = MAX(0, to->dticks - elapsed());
 	}
 
 #ifdef CONFIG_TIMESLICING

@danieldegrasse
Copy link
Collaborator

One proposal is to change the next_timeout() function of the kernel/timeout.c to be as close to the elapsed time as possible

Does this change fix the issue? At least from the diff you have here, this seems like a logical change

@FRASTM
Copy link
Collaborator Author

FRASTM commented Jun 29, 2022

Yes,
this fixes the pb, PR to come and testing that no regression is coming

andyross added a commit to andyross/zephyr that referenced this issue Jun 30, 2022
This test was written to idle for exactly 1ms and wake up with zero
error, which is just too tight for some platforms (and worked on
emulators where the tick rate is 10x coarser only because 0 == 0!).

And it's not clear that it's testing anything we promise in
documentation, regardless.  Early wakeups are not an error and
absolutely not disallowed, yet the test is treating the wakeup like a
sleep.

Clean it up a bit and relax the tolerance to what we can compute
reliably: do all the math in ticks, idle for 10ms (i.e. longer than a
host quantum for emulators), and allow 1 tick of slop on either side to
permit slightly early wakeups while still verifying that "yes, the idle
did idle".

Fixes zephyrproject-rtos#46641

Signed-off-by: Andy Ross <andyross@google.com>
@andyross
Copy link
Contributor

See #47112 for an alternative that will hopefully fix the issue by fixing some over-tight assumptions in the test

carlescufi pushed a commit that referenced this issue Jul 1, 2022
This test was written to idle for exactly 1ms and wake up with zero
error, which is just too tight for some platforms (and worked on
emulators where the tick rate is 10x coarser only because 0 == 0!).

And it's not clear that it's testing anything we promise in
documentation, regardless.  Early wakeups are not an error and
absolutely not disallowed, yet the test is treating the wakeup like a
sleep.

Clean it up a bit and relax the tolerance to what we can compute
reliably: do all the math in ticks, idle for 10ms (i.e. longer than a
host quantum for emulators), and allow 1 tick of slop on either side to
permit slightly early wakeups while still verifying that "yes, the idle
did idle".

Fixes #46641

Signed-off-by: Andy Ross <andyross@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Kernel area: Tests Issues related to a particular existing or missing test bug The issue is a bug, or the PR is fixing a bug platform: STM32 ST Micro STM32 priority: low Low impact/importance bug
Projects
None yet
4 participants