Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Fix cortex-m context switch bug with a custom flag for the SVC handler #3305

Closed
wants to merge 1 commit into from

Conversation

bradjc
Copy link
Contributor

@bradjc bradjc commented Oct 24, 2022

Pull Request Overview

This is related to #3109.

This PR adds a flag that tells the SVC handler if we are switching from userspace to the kernel or from the kernel to userspace. The flag is a global variable and called KERNEL_TO_USERSPACE.

The hypothesis underlying this PR is that the issue in #3109 is based on the idea that this check in the SVC handler:

// First check to see which direction we are going in. If the link register
// is something other than 0xfffffff9, then we are coming from an app which
// has called a syscall.
cmp lr, #0xfffffff9
bne 100f // to_kernel
// If we get here, then this is a context switch from the kernel to the
// application. Set thread mode to unprivileged to run the application.

does not work in every situation. Namely, there is some execution where SVC handler runs with LR==0xfffffff9, even though the svc instruction was called by userspace. (Note, 0xfffffff9 is the cortex-m special EXC_RETURN value meaning privileged mode with the main stack). This could be from nested exceptions or tail-chained exceptions, but I don't know for sure.

So, instead of trying to use the LR and the built-in architectural support for what should happen when the SVC handler executes, this PR adds our flag that we set to determine the "direction" of the context switch.

Testing Strategy

using apps on hail

TODO or Help Wanted

This PR isn't necessarily what I think we should merge, but hopefully it helps us make progress on this issue. It might not even fix the issue since we don't have a way to reproduce the issue while making modifications to the kernel. Since this bug relies on a particular execution order with interrupts it is difficult to reproduce. We also don't know if it only happens on the SAM4L for some reason.

Documentation Updated

  • Updated the relevant files in /docs, or no updates are required.

Formatting

  • Ran make prepush.

This flag tells the SVC handler if we are switching from userspace to
the kernel or from the kernel to userspace.
@hudson-ayers
Copy link
Contributor

Based on the recent discussion in slack, I still don't think we have a good enough understanding of the issue in #3109 to be confident that this fix will address it, and I suspect this is not a direction we would choose to take if not needed (one reason being that I think this approach of using a raw global variable and volatile accesses for "inter-thread" communication between the exception handler and the main thread is not technically sound, although we do it with two other variables already)

@jrvanwhy
Copy link
Contributor

I think this approach of using a raw global variable and volatile accesses for "inter-thread" communication between the exception handler and the main thread is not technically sound

Why wouldn't it be sound? All the accesses are happening in a single hardware thread of execution.

@hudson-ayers
Copy link
Contributor

Why wouldn't it be sound? All the accesses are happening in a single hardware thread of execution.

Ok, I actually didn't look at the code changes in this PR (shame on me), in which all accesses are actually made directly in asm!() rather than in Rust. I assumed the approach was instead similar to what we do for APP_HARD_FAULT or SYSCALL_FIRED, which make me uncomfortable because we mix volatile and nonvolatile accesses of these variables (e.g. we have volatile accceses in Rust code, plus direct pointer reads/writes in inline assembly), and I am not sure there is a definitive statement that doing so is safe just because it is the same hardware thread, or because the "nonvolatile" accesses are contained within an inline asm block.

https://internals.rust-lang.org/t/volatile-and-sensitive-memory/3188/65 This comment, for example, suggests that LLVM might make volatile reads/write normal reads/writes when it observes that the address pointed to by those reads/writes is also accessed as a normal pointer. Not sure whether the manipulation in our inline assembly (we take the address of the global, store it in a register, then write to the memory address stored in that register) would be treated as a "normal access" in the sense described by that comment.

@jrvanwhy
Copy link
Contributor

jrvanwhy commented Oct 25, 2022

I am not concerned about mixed volatile and asm!() accesses, but I am no longer sure this is sound. See my explanations below:

About mixed volatile and asm!() accesses

volatile versus non-volatile accesses are a language construct; the distinction does not exist in assembly code. Ultimately, whether those accesses are acceptable is a property of the architecture the code is run on. While I'm not an expert on all details of the ARM architecture, I strongly suspect that interrupts are allowed to access global variables, as they would have almost zero capabilities otherwise. Further, such access must be allowed to be non-atomic, because many single-core ARM systems lack atomic instructions entirely.

The exact meaning of volatile in Rust has not been determined yet. In particular, it's unclear what it means to do a volatile read or write that is larger (or in some cases, smaller) than a single word. However, we are reading/writing a single word, and I think there is consensus on what that means: read_volatile::<usize> lowers to a single assembly read instruction, and write_volatile::<usize> lowers to a single assembly write instruction.

https://internals.rust-lang.org/t/volatile-and-sensitive-memory/3188/65 This comment, for example, suggests that LLVM might make volatile reads/write normal reads/writes when it observes that the address pointed to by those reads/writes is also accessed as a normal pointer. Not sure whether the manipulation in our inline assembly (we take the address of the global, store it in a register, then write to the memory address stored in that register) would be treated as a "normal access" in the sense described by that comment.

Normal (non-volatile, non-asm!()) reads and writes in Rust make all sorts of not-yet-specified aliasing assumptions. This allows the compiler to reorder, duplicate, and remove them. This means that most mixes of "normal" and volatile accesses is UB.

Personally, I would be comfortable with mixing read_volatile, write_volatile, and asm!() accesses to a single usize variable, as long as no "normal" Rust accesses are made to that variable

However... can asm!() access the global?

It's not clear to me that the asm!() block in svc_handler_arm_v7m is allowed to write to KERNEL_TO_USERSPACE. The rules for inline assembly refer to the unsafe code guidelines WG's rules on FFI, but the unsafe code guidelines WG hasn't written those rules yet.

I think it's okay because we don't specify readonly or nonmem, but I'm not sure.

I would be comfortable if we passed in KERNEL_TO_USERSPACE as a sym operand, rather than referring to it by name.

@kupiakos Do you have any thoughts on this?

@hudson-ayers
Copy link
Contributor

Personally, I would be comfortable with mixing read_volatile, write_volatile, and asm!() accesses to a single usize variable, as long as no "normal" Rust accesses are made to that variable

This makes sense to me, I think I have just become a little jaded just from reading variations of "if you are using volatile for anything other than MMIO hardware access you are using the wrong tool" across various rust discussions about static mut and Rust atomics.

I would be comfortable if we passed in KERNEL_TO_USERSPACE as a sym operand, rather than referring to it by name.

asm_sym was just stabilized, so that approach is probably preferable now anyway -- it also saves us from having to specify #[no_mangle] and #[used].

@bradjc
Copy link
Contributor Author

bradjc commented Apr 18, 2023

Closing this as it doesn't necessarily fix the issue, and we shouldn't just scattershot changes like this that might not actually do anything.

@bradjc bradjc closed this Apr 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants