Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

riscv: context switch issue in interrupt handler #72123

Open
zxtxin opened this issue Apr 30, 2024 · 7 comments
Open

riscv: context switch issue in interrupt handler #72123

zxtxin opened this issue Apr 30, 2024 · 7 comments
Assignees
Labels
area: RISCV RISCV Architecture (32-bit & 64-bit) bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@zxtxin
Copy link

zxtxin commented Apr 30, 2024

In the current implementation of riscv _isr_wrapper(), z_riscv_switch will probably be called, then the context will switch to a normal thread, while the mret will not be excecuted to return from the interrupt mode. For MCUs with CLIC interrupt controller such as gd32vf103, mintstatus.mil will not be restored if mret is not executed. As a result, no other interrupts with the same level will be executed.

@zxtxin zxtxin added the bug The issue is a bug, or the PR is fixing a bug label Apr 30, 2024
Copy link

Hi @zxtxin! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙

@aescolar aescolar added the area: RISCV RISCV Architecture (32-bit & 64-bit) label Apr 30, 2024
@aescolar aescolar added the priority: medium Medium impact/importance bug label Apr 30, 2024
@zxtxin
Copy link
Author

zxtxin commented May 13, 2024

any update?

@fkokosinski
Copy link
Member

Hi @zxtxin, thanks for submitting this bug report.

In the current implementation of riscv _isr_wrapper(), z_riscv_switch will probably be called, then the context will switch to a normal thread, while the mret will not be excecuted to return from the interrupt mode

I'm not sure I follow. If z_riscv_switch is called (when we jump to no_reschedule), then we still still return to _isr_wrapper, which eventually ends with mret.

Could you perhaps provide a more detailed description using the template, and e.g. include the trace of the interrupt handling routine that you're describing? That would be much appreciated. Thanks

@zxtxin
Copy link
Author

zxtxin commented May 14, 2024

check_reschedule:

#ifdef CONFIG_MULTITHREADING

	/* Get pointer to current thread on this CPU */
	lr a1, ___cpu_t_current_OFFSET(s0)

	/*
	 * Get next thread to schedule with z_get_next_switch_handle().
	 * We pass it a NULL as we didn't save the whole thread context yet.
	 * If no scheduling is necessary then NULL will be returned.
	 */
	addi sp, sp, -16
	sr a1, 0(sp)
	mv a0, zero
	call z_get_next_switch_handle
	lr a1, 0(sp)
	addi sp, sp, 16
	beqz a0, no_reschedule

reschedule:

	/*
	 * Perform context switch:
	 * a0 = new thread
	 * a1 = old thread
	 */
	call z_riscv_switch

z_riscv_thread_start:
might_have_rescheduled:
	/* reload s0 with &_current_cpu as it might have changed or be unset */
	get_current_cpu s0

#endif /* CONFIG_MULTITHREADING */

no_reschedule:

Hi @fkokosinski ,
z_riscv_switch() is called if z_get_next_switch_handle() returns non-zero.
the register sp will directly move to the new thread and the register pc will be changed to the new thread after z_riscv_switch() executed.

If z_riscv_switch is called (when we jump to no_reschedule), then we still still return to _isr_wrapper
when we directly jumo to no_reschedule, z_riscv_switch will not be called.
z_riscv_switch is called when we jump to reschedule.

@fkokosinski
Copy link
Member

and the register pc will be changed to the new thread

Yes, with mret in question setting it with the value of the mepc CSR

z_riscv_switch will probably be called, then the context will switch to a normal thread, while the mret will not be excecuted to return from the interrupt mode

This doesn't seem to be the case, though. In the case of reschedule, stepping with gdb from call z_riscv_switch shows that we do the following:

  1. call z_riscv_switch
  2. ret from z_riscv_switch
  3. encounter the z_riscv_thread_start label
  4. encounter the no_reschedule label
  5. return from the interrupt context with mret

I still can't reproduce a scenario you're describing. Could you please provide more detailed instructions using the standard bug report template that can be found here? Thanks

@zxtxin
Copy link
Author

zxtxin commented May 21, 2024

080012d4 <z_riscv_switch>:
 80012d4:	0415a623          	sw	ra,76(a1)
 80012d8:	c9a0                	sw	s0,80(a1)
 80012da:	c9e4                	sw	s1,84(a1)
 80012dc:	0525ac23          	sw	s2,88(a1)
 80012e0:	0535ae23          	sw	s3,92(a1)
 80012e4:	0745a023          	sw	s4,96(a1)
 80012e8:	0755a223          	sw	s5,100(a1)
 80012ec:	0765a423          	sw	s6,104(a1)
 80012f0:	0775a623          	sw	s7,108(a1)
 80012f4:	0785a823          	sw	s8,112(a1)
 80012f8:	0795aa23          	sw	s9,116(a1)
 80012fc:	07a5ac23          	sw	s10,120(a1)
 8001300:	07b5ae23          	sw	s11,124(a1)
 8001304:	0425a423          	sw	sp,72(a1)
 8001308:	0cb5ac23          	sw	a1,216(a1)
 800130c:	04852103          	lw	sp,72(a0)
 8001310:	04c52083          	lw	ra,76(a0)
 8001314:	4920                	lw	s0,80(a0)
 8001316:	4964                	lw	s1,84(a0)
 8001318:	05852903          	lw	s2,88(a0)
 800131c:	05c52983          	lw	s3,92(a0)
 8001320:	06052a03          	lw	s4,96(a0)
 8001324:	06452a83          	lw	s5,100(a0)
 8001328:	06852b03          	lw	s6,104(a0)
 800132c:	06c52b83          	lw	s7,108(a0)
 8001330:	07052c03          	lw	s8,112(a0)
 8001334:	07452c83          	lw	s9,116(a0)
 8001338:	07852d03          	lw	s10,120(a0)
 800133c:	07c52d83          	lw	s11,124(a0)
 8001340:	8082                	ret

Hi @fkokosinski ,
This is the assembly of z_riscv_switch. reg a0 points to the new thread control block. At 0x8001310, reg ra is updated to the address inside arch_switch() where thread a0 was switched out. So the ret would return to arch_switch instead of z_riscv_thread_start

@zxtxin
Copy link
Author

zxtxin commented May 21, 2024

Here is a way to reproduce this issue:
1.cd to the hello_world sample folder
2.modify the src/main.c as follows

#include <stdio.h>
#include <zephyr/kernel.h>
#include <zephyr/device.h>
int main(void)
{
	while(1)
	{
		printf("Hello World! %s\n", CONFIG_BOARD);
		k_sleep(K_MSEC(1000));
	}

	return 0;
}

3.build the image with the following command:
west build -p auto -b gd32vf103c_starter
4.set a breakpoint at printf(...)
5.start debuging
6.at the first time encountering the breakpoint, mintstatus.mil is 0.
7.countinue the program.
8.at the second time encountering the breakpoint, mintstatus.mil is non-zero.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: RISCV RISCV Architecture (32-bit & 64-bit) bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

No branches or pull requests

5 participants