RFC: Fix cortex-m context switch bug with a custom flag for the SVC handler #3305

bradjc · 2022-10-24T18:02:06Z

Pull Request Overview

This is related to #3109.

This PR adds a flag that tells the SVC handler if we are switching from userspace to the kernel or from the kernel to userspace. The flag is a global variable and called KERNEL_TO_USERSPACE.

The hypothesis underlying this PR is that the issue in #3109 is based on the idea that this check in the SVC handler:

tock/arch/cortex-m/src/lib.rs

Lines 162 to 169 in 47664d8

    
               // First check to see which direction we are going in. If the link register 
        
               // is something other than 0xfffffff9, then we are coming from an app which 
        
               // has called a syscall. 
        
               cmp lr, #0xfffffff9 
        
               bne 100f // to_kernel 
        
               // If we get here, then this is a context switch from the kernel to the 
        
               // application. Set thread mode to unprivileged to run the application.

does not work in every situation. Namely, there is some execution where SVC handler runs with LR==0xfffffff9, even though the svc instruction was called by userspace. (Note, 0xfffffff9 is the cortex-m special EXC_RETURN value meaning privileged mode with the main stack). This could be from nested exceptions or tail-chained exceptions, but I don't know for sure.

So, instead of trying to use the LR and the built-in architectural support for what should happen when the SVC handler executes, this PR adds our flag that we set to determine the "direction" of the context switch.

Testing Strategy

using apps on hail

TODO or Help Wanted

This PR isn't necessarily what I think we should merge, but hopefully it helps us make progress on this issue. It might not even fix the issue since we don't have a way to reproduce the issue while making modifications to the kernel. Since this bug relies on a particular execution order with interrupts it is difficult to reproduce. We also don't know if it only happens on the SAM4L for some reason.

Documentation Updated

Updated the relevant files in /docs, or no updates are required.

Formatting

Ran make prepush.

This flag tells the SVC handler if we are switching from userspace to the kernel or from the kernel to userspace.

hudson-ayers · 2022-10-25T16:34:03Z

Based on the recent discussion in slack, I still don't think we have a good enough understanding of the issue in #3109 to be confident that this fix will address it, and I suspect this is not a direction we would choose to take if not needed (one reason being that I think this approach of using a raw global variable and volatile accesses for "inter-thread" communication between the exception handler and the main thread is not technically sound, although we do it with two other variables already)

jrvanwhy · 2022-10-25T17:01:12Z

I think this approach of using a raw global variable and volatile accesses for "inter-thread" communication between the exception handler and the main thread is not technically sound

Why wouldn't it be sound? All the accesses are happening in a single hardware thread of execution.

hudson-ayers · 2022-10-25T18:17:26Z

Why wouldn't it be sound? All the accesses are happening in a single hardware thread of execution.

Ok, I actually didn't look at the code changes in this PR (shame on me), in which all accesses are actually made directly in asm!() rather than in Rust. I assumed the approach was instead similar to what we do for APP_HARD_FAULT or SYSCALL_FIRED, which make me uncomfortable because we mix volatile and nonvolatile accesses of these variables (e.g. we have volatile accceses in Rust code, plus direct pointer reads/writes in inline assembly), and I am not sure there is a definitive statement that doing so is safe just because it is the same hardware thread, or because the "nonvolatile" accesses are contained within an inline asm block.

https://internals.rust-lang.org/t/volatile-and-sensitive-memory/3188/65 This comment, for example, suggests that LLVM might make volatile reads/write normal reads/writes when it observes that the address pointed to by those reads/writes is also accessed as a normal pointer. Not sure whether the manipulation in our inline assembly (we take the address of the global, store it in a register, then write to the memory address stored in that register) would be treated as a "normal access" in the sense described by that comment.

jrvanwhy · 2022-10-25T19:00:30Z

I am not concerned about mixed volatile and asm!() accesses, but I am no longer sure this is sound. See my explanations below:

About mixed `volatile` and `asm!()` accesses

volatile versus non-volatile accesses are a language construct; the distinction does not exist in assembly code. Ultimately, whether those accesses are acceptable is a property of the architecture the code is run on. While I'm not an expert on all details of the ARM architecture, I strongly suspect that interrupts are allowed to access global variables, as they would have almost zero capabilities otherwise. Further, such access must be allowed to be non-atomic, because many single-core ARM systems lack atomic instructions entirely.

The exact meaning of volatile in Rust has not been determined yet. In particular, it's unclear what it means to do a volatile read or write that is larger (or in some cases, smaller) than a single word. However, we are reading/writing a single word, and I think there is consensus on what that means: read_volatile::<usize> lowers to a single assembly read instruction, and write_volatile::<usize> lowers to a single assembly write instruction.

https://internals.rust-lang.org/t/volatile-and-sensitive-memory/3188/65 This comment, for example, suggests that LLVM might make volatile reads/write normal reads/writes when it observes that the address pointed to by those reads/writes is also accessed as a normal pointer. Not sure whether the manipulation in our inline assembly (we take the address of the global, store it in a register, then write to the memory address stored in that register) would be treated as a "normal access" in the sense described by that comment.

Normal (non-volatile, non-asm!()) reads and writes in Rust make all sorts of not-yet-specified aliasing assumptions. This allows the compiler to reorder, duplicate, and remove them. This means that most mixes of "normal" and volatile accesses is UB.

Personally, I would be comfortable with mixing read_volatile, write_volatile, and asm!() accesses to a single usize variable, as long as no "normal" Rust accesses are made to that variable

However... can `asm!()` access the global?

It's not clear to me that the asm!() block in svc_handler_arm_v7m is allowed to write to KERNEL_TO_USERSPACE. The rules for inline assembly refer to the unsafe code guidelines WG's rules on FFI, but the unsafe code guidelines WG hasn't written those rules yet.

I think it's okay because we don't specify readonly or nonmem, but I'm not sure.

I would be comfortable if we passed in KERNEL_TO_USERSPACE as a sym operand, rather than referring to it by name.

@kupiakos Do you have any thoughts on this?

hudson-ayers · 2022-10-25T19:30:42Z

Personally, I would be comfortable with mixing read_volatile, write_volatile, and asm!() accesses to a single usize variable, as long as no "normal" Rust accesses are made to that variable

This makes sense to me, I think I have just become a little jaded just from reading variations of "if you are using volatile for anything other than MMIO hardware access you are using the wrong tool" across various rust discussions about static mut and Rust atomics.

I would be comfortable if we passed in KERNEL_TO_USERSPACE as a sym operand, rather than referring to it by name.

asm_sym was just stabilized, so that approach is probably preferable now anyway -- it also saves us from having to specify #[no_mangle] and #[used].

bradjc · 2023-04-18T21:17:38Z

Closing this as it doesn't necessarily fix the issue, and we shouldn't just scattershot changes like this that might not actually do anything.

cortex-m: use custom flag for SVC handler

2a39caf

This flag tells the SVC handler if we are switching from userspace to the kernel or from the kernel to userspace.

bradjc mentioned this pull request Oct 25, 2022

Curious Timing Causes Failed Syscall on Cortex-M4 MCU #3109

Open

bradjc closed this Apr 18, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC: Fix cortex-m context switch bug with a custom flag for the SVC handler #3305

RFC: Fix cortex-m context switch bug with a custom flag for the SVC handler #3305

bradjc commented Oct 24, 2022

hudson-ayers commented Oct 25, 2022

jrvanwhy commented Oct 25, 2022

hudson-ayers commented Oct 25, 2022

jrvanwhy commented Oct 25, 2022 •

edited

hudson-ayers commented Oct 25, 2022

bradjc commented Apr 18, 2023

	// First check to see which direction we are going in. If the link register
	// is something other than 0xfffffff9, then we are coming from an app which
	// has called a syscall.
	cmp lr, #0xfffffff9
	bne 100f // to_kernel

	// If we get here, then this is a context switch from the kernel to the
	// application. Set thread mode to unprivileged to run the application.

RFC: Fix cortex-m context switch bug with a custom flag for the SVC handler #3305

RFC: Fix cortex-m context switch bug with a custom flag for the SVC handler #3305

Conversation

bradjc commented Oct 24, 2022

Pull Request Overview

Testing Strategy

TODO or Help Wanted

Documentation Updated

Formatting

hudson-ayers commented Oct 25, 2022

jrvanwhy commented Oct 25, 2022

hudson-ayers commented Oct 25, 2022

jrvanwhy commented Oct 25, 2022 • edited

About mixed volatile and asm!() accesses

However... can asm!() access the global?

hudson-ayers commented Oct 25, 2022

bradjc commented Apr 18, 2023

jrvanwhy commented Oct 25, 2022 •

edited

About mixed `volatile` and `asm!()` accesses

However... can `asm!()` access the global?