New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xtensa: add MPU support #67938
xtensa: add MPU support #67938
Conversation
76e71a8
to
f66b5a5
Compare
@@ -255,6 +292,7 @@ xtensa_userspace_enter: | |||
l32i a6, a1, 24 | |||
call4 xtensa_swap_update_page_tables | |||
|
|||
#if XCHAL_HAVE_THREADPTR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't CONFIG_THREAD_LOCAL_STORAGE
checking for this condition ? I mean, there is no TLS without THREADPTR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is needed as the #else
part inside #ifdef CONFIG_THREAD_LOCAL_STORAGE
still needs THREADPTR
.
" syscall\n" | ||
" mov %0, a2\n" | ||
: "=r"(ret) : : "a2"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Starting to think that we should always use syscall
for this. It would simplify a lot the code having only one way to do this. With this change we have three different ways:
- Using
THREADPTR
directly - Using TLS
- With syscall
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The obvious downside is speed. So I guess we can have syscall and TLS, and removing THREADPTR entirely. Though we can do it in future commits if decided to do so.
|
||
movi a0, PS_RING_MASK | ||
rsr.ps a2 | ||
and a2, a2, a0 | ||
|
||
/* Need to set return to 1 if RING != 0, | ||
* so we won't be leaking which ring we are in | ||
* right now. | ||
*/ | ||
beqz a2, _is_user_context_return | ||
|
||
movi a2, 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could simplify this logic having this logic having this exception handled in User/Kernel vectors and setting UM bit properly. I have a prototype doing it but without the EPC1
trick. This is very clever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it... but then all the existing code will need to be audited to make sure they work on both non-UM/UM modes. So for now, doing this here to keep the PR a little bit simpler.
arch/xtensa/core/mpu.c
Outdated
* | ||
* This makes a hole in the MPU entry array so that a new entry can be inserted | ||
* into the one indexed at @a start_idx. The entries [@a start_idx, @a end_idx - 1] | ||
* will be moved to [@a start_idx + 1, @a end_idx]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that correct ? Isn't it going down copying end_idx
to end_idx - 1
... resulting in [start_idx, end_idx - 1]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concept is kind of like stack where lower index is on top. So moving down means copying end_idx -1
to end_idx
(overwriting end_idx
). start_idx
will be a duplicate of start_idx + 1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Will need to come back by to do a full review.
One big gotcha that bit me when dealing with the same hardware for mt8195 (which doesn't do full memory_domain support, but does need the MPU to enable caching):
The monotonicity requirement of MPU entries on real hardware is strict. If you have entries that "go backwards" for any reason at an address below that of an active memory access, the hardware will trap. This is true even if none of the overlapping region is relevant to the fetch at all. So the only way to do MPU reprogramming is to walk backwards from the end of the array. It strikes me that it's possible that qemu is less strict, which might hide this. Something to check for sure; took a long time to figure out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some quick notes. Definitely a little confused by the amount of work being spent on the background map feature, seems sorta needless? Is there hardware coming that has complicated background mappings we want to exploit for efficiency?
arch/xtensa/core/mpu.c
Outdated
* - Background map entries are copied to foreground map to | ||
* ensure correct alignment for hardware... which means it may | ||
* waste some foreground entries if there are no custom regions | ||
* between background entries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW: the "background" feature in the hardware doesn't seem very useful to me. Existing hardware I can see core-isa.h files for never does anything more complicated than e.g. "all of memory is mapped uncached". I'd suggest dropping this code and just using the pre-existing SOC-provided memory map we already have. At most, we should just be validating the map provided by the background vs. what we know from dts/kconfig to be "The Truth" as a check on SOC config.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK, if you cross a background map "boundary", the address needs to be appear in the foreground map. In other words, any foreground regions cannot cross the boundaries of any background regions. And also that the entry[0], if enabled, must be the same as one of the background entries. So I just take the easy and simpler route here of copying the BG entries to FG as starting point. Having to account for BG entries when add new regions is going to complicate the code quite a bit... as the logic now is already overly complicated, I simply do not want to add to the confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear: my suggestion is "ignore the background mapping" and just have a fully-populated mapping in the foreground (which takes no more entries). The requirement of not crossing background boundaries regions is AFAICT incorrect, and if it's not a requirement there's little to no value in trying to support the background stuff.
The only reason we'd even want it would be if you had a device with genuinely useful fine-grained background mappings that would allow us to save foreground entries for register blocks or whatever. But that seems not to be the case on the hardware I've seen where "background" just means "something that works OK at boot" (e.g. on mt8195 it's all uncached, which is basically useless).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you happen to have simulator or hardware setup to test this quickly? I can try it on QEMU and see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
QEMU seems to be happy about it. So I removed the copying of background mappings.
* XCHAL_MPU_ALIGN_BITS provided by the toolchain indicates | ||
* that only bits of and left of this value are valid. This | ||
* corresponds to the minimum segment size (MINSEGMENTSIZE) | ||
* definied in the processor configuration. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This also seems like needless complication? No hardware exists AFAIK with a MPU page size other than 4k, and IIRC there's language in the databook that implies strongly this is the only possible setup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am mostly getting the info from the ISA Reference Manual where segment size can be as small as 32 bytes.
Which databook is it? From what I have:
- LX6 one does not have a MPU chapter.
- The NX one mentioned that the configurable address granularity is minimally 4KB.
- On LX7, it depends if MPU Small Page option is enabled on hardware. If not, that it is 4KB. If enabled, then the segment size depends on fetch width, cache line size, etc.
arch/xtensa/core/mpu.c
Outdated
* | ||
* According to manual, newer processor configurations require that | ||
* the foreground entries must be aligned to the background map, | ||
* in addition to the foreground entry requirements: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is? That's surprising to me, because I'm all but certain I had experimented successfully with things like "single MPU region covering all of memory". I can't find language like that in the ISA manual after a quick glance, but I do note that Figure 28 in section 5.5.3 clearly shows a foreground mapping labeled "jj" that spans the boundary of two background maps ("zz" and "yy"). Pretty sure you don't need this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is coming from the comments in the HAL. If the HAL implements it in such a way, I guess it is better for us to do it also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe @jcmvbkbc has input? My gut says exactly the opposite: we should implement it per hardware and spec, as the HAL is known to be sort of a mess.
* - Each MPU region is described by TWO entries: | ||
* [entry_a_address, entry_b_address). For contiguous memory regions, | ||
* this should not much of an issue. However, disjoint memory regions | ||
* "waste" another entry to describe the end of those regions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more: this waste is significant, and seems straightforward to fix? Just pre-compile your array of boundary entries into a "big enough" array on the stack, then pass over it slurping up duplicate boundaries that define zero-length segments. Cheap and easy, and only has to be done once per change to the domain. Obviously it needs some kind of handling for situations where we try to enable too many segments, but that's part of the userspace API already (and a K_OOPS() of the thread would be satisfactory handling anyway).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite understand what you mean here. For regions of 0x1000 - 0x3000
and 0x3000 - 0x4000
, it needs 3 entries. But if you have 0x1000 - 0x2000
and 0x3000 - 0x4000
, you will need 4 entries to describe them.
(Well... for other arch's MPUs where you program both start and end addresses in the same entry, both examples only need 2 entries to describe.)
Could you describe more in details on how to pre-compile the array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your initial pass might emit an array like:
StartAddr Access
========= ======
0x1000 rw-
0x3000 ---
0x3000 r-x
0x4000 ---
Then you just go down the initial array, and just skip any entry where the following entry has the same address (i.e. it's a definition of a zero-length segment), producing:
StartAddr Access
========= ======
0x1000 rw-
0x3000 r-x
0x4000 ---
...which is the optimized form you want. The same process on the disjoint second example wouldn't change the array. The first array can be constructed cheaply on the stack with a maximally-sized (e.g. 64 entry for a 32-register MPU). Now we can have bigger and more complicated mappings as long as lots of them are adjacent (which is reasonably common, .text is next to .rodata, kernel stacks next to user stacks, etc...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can amend consolidate_entries
to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have amended consolidate_entries
so if two entries have a the same address, remove the lower indexed one.
QEMU also requires monotonically increasing addresses. So to erase entries, you have to go forward from start, while you need program the entries backwards from the end. |
6fae680
to
f7afaf0
Compare
@@ -20,6 +22,37 @@ | |||
.global xtensa_do_syscall | |||
.align 4 | |||
xtensa_do_syscall: | |||
#if XCHAL_HAVE_THREADPTR == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dumb question: but which hardware has an MPU but no THREADPTR? This seems a little academic? If it's just the qemu device, maybe we can reconfigure it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now it is QEMU only. Changing that will require a new configuration which will require Cadence to do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More notes after about an 80% readthrough. None fatal, though I will say that the ad-hoc sorting stuff creeps me out (that kind of code is subtle and really hard to get right and maintain).
Also the definition of a Really Big C API to expose all the hardware details of an MPU entry seems like a lot of typing for no value, since all this code actually needs to do in practice is translate to/from the much simpler Zephyr userspace requirements.
#include <xtensa/config/core-isa.h> | ||
#include <xtensa/config/core.h> | ||
|
||
bool xtensa_is_user_context(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this be just a regular __syscall with an impl function that does the right thing? Not really seeing why there's value in hand-coding assembly here except a few cycles of performance, and this is AFAICT just a fallback for weird emulation environments. No one (famous last words) would instantiate hardware with an MPU and no THREADPTR, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is used frequent enough to warrant special treatment. If this is implemented as a normal syscall, it will need to go through all the level 1 exception code (e.g. cross stack, calling in C function, marshaling of args, etc.).
TBH, here it is to cover our basis just in case such configuration exists...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, but the counterargument is that a regular __syscall is portable and doesn't require special handling in the call0 stuff I'm finally banging on again. :)
Not a big deal, but I reserve the right to include a patch removing the entry code for this with a justification of "it's only for qemu anyway".
arch/xtensa/Kconfig
Outdated
|
||
if XTENSA_MPU | ||
|
||
menuconfig XTENSA_MPU_CUSTOM_MEMFLAGS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a general whine that this seems like way too much configuration. Most apps aren't going to be touching the defaults here. And where you want something other than default, it will be because of SOC-level memory map details that are better configured in DTS.
Basically: this is 100+ lines of kconfig definitions that almost no one is going to understand or know how to tune except for the original SOC integrators. And they'd be better served by devicetree or C APIs and not kconfig.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed these and replaced with just a hex kconfig for the default memory type value.
* current Zephyr image. This information must be available and | ||
* need to be processed upon MPU initialization. | ||
*/ | ||
static const struct xtensa_mpu_range mpu_zephyr_ranges[] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason we couldn't use a k_mem_partition here? I understand this isn't the userspace part of the PR, but there's no good reason we couldn't repurpose the structs, which are almost 1:1. Having a separate type here means that we'll actually have three distinct representations for every region (the MPU entry itself, the xtensa arch thing, and the userspace partition).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k_mem_partition
doesn't have enough flags/attributes to cover the possible attributes of MPU entries. One example is if someone really want a execute-only region which cannot be describe with k_mem_partition
.
arch/xtensa/core/mpu.c
Outdated
.end = (uintptr_t)__text_region_start, | ||
.access_rights = XTENSA_MPU_ACCESS_P_RX_U_RX, | ||
.memory_flags = XTENSA_MPU_MEMFLAGS_DEFAULT, | ||
.name = "vecbase", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a big waste in the grand scheme of things, but production apps don't need a "name" field for their memory regions and so shouldn't be responsible for carrying around the memory. Debugging features should generally be optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will amend it so it appears when debugging is enabled. It helps quite a bit when debugging the MPU init code though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed.
arch/xtensa/core/mpu.c
Outdated
* @param end_idx Index of end of region to be moved. | ||
*/ | ||
static void mpu_entries_shift_one_up(struct xtensa_mpu_entry *entries, | ||
uint8_t start_idx, uint8_t end_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Design nit: not loving the ad-hoc sorting logic here, which seems kinda inefficient and needless. Would IMHO be cleaner and smaller to just sort the array once using qsort(), or keep them in a rbtree, etc... Even doing a one-time O(N^2) extraction sort pass over the ~32 elements seems simpler than this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me think about it a bit. Most of this code is really just to a find place to insert a new entry, and making a hole in the array to put the entry there. Doing in this way also makes the map valid (almost) all of the time, even between operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have replaced it with qsort instead.
arch/xtensa/core/mpu.c
Outdated
struct xtensa_mpu_entry *check_addr_in_mpu_entries(const struct xtensa_mpu_entry *entries, | ||
size_t num_entries, | ||
uintptr_t addr, bool *exact, | ||
uint8_t *entry_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another function I don't understand the need for? You can probe the active MPU config with a hardware instruction, we shouldn't need to be reading the table ourselves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
??? What if we are manipulating a memory domain on another thread not currently running? We cannot probe MPU config for the MPU map on a thread that is not the current running thread.
arch/xtensa/core/mpu.c
Outdated
} | ||
|
||
if (remove) { | ||
mpu_entries_shift_one_down(entries, new_first, idx); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clumsy O(N^2) way to do the compaction, IMHO. Just keep a "write pointer" and a "read pointer" into the array, which will diverge if you skip a write.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to keep the map valid (almost) all of the time. So if any operation fails, the map can still be programmed to hardware. This is probably my paranoia here. Any future modifications to the code can leave the map invalid (or rather unknowingly invalid) which will cause exception in hardware if programmed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought about it a bit... but doing it this way is less error prone as we will need to replace old entries at top of map if needed.
static int mpu_map_region_add(struct xtensa_mpu_map *map, | ||
uintptr_t start_addr, uintptr_t end_addr, | ||
uint32_t access_rights, uint32_t memory_type, | ||
uint8_t *first_idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This again is looking way too complicated to me. Don't keep the MPU config in sorted order at all, just keep an unordered[1] set of regions/partitions and sort/validate/optimize only once at the end of the process, before you turn it into the nice/small/tight/compact array of MPU entry registers.
[1] Or implicitly ordered in e.g. an rbtree
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function still looks complicated, though I do see that it's no longer doing any sorting internally. But it's still doing a lot of work trying to keep the mpu entries in a correctly formatted form at all times, every time you want to modify even a single one of them. Seems like it would be simpler to just keep a list of regions to "compile" to a single array and do it all at once? That would require a failure path in the case where you try to compile an invalid map or whatever. But AFAICT userspace is tolerant of that.
arch/xtensa/core/mpu.c
Outdated
* variable to guard against this (... if done correctly). | ||
* Besides, this will almost be overridden by the SoC layer. | ||
* So tell GCC to ignore this. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I remember this comment from somewhere else too. While this works, dealing with compiler pragmas seems needless. Just declare the symbol extern here and put the __weak definition in another C file where it doesn't get accessed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I turned the whole thing into "required" so no more weak symbol and this is no longer needed.
arch/xtensa/core/mpu.c
Outdated
extern char __data_end[]; | ||
|
||
__weak const struct xtensa_mpu_range xtensa_soc_mpu_ranges[0]; | ||
__weak int xtensa_soc_mpu_ranges_num; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really my fight to pick, but for sure the Official Way To Manage Memory Maps in Zephyr is now in Devicetree, with a whole schema and everything. This doesn't bug me personally but I guarantee someone (@carlocaione @stephanosio seem like good candidates) is going to hate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For now, it is simpler to do it this way without cluttering the code with devicetree macros. Using devicetree memory regions is a future enhancement.
I think only ARM MPU is using devicetree to describe memory regions. ARM MMU does not seem to do it. ARC MPU does not do it either. I think all MMU code are also not using devicetree region.
Calling z_mrsh_* functions require 7 arguments where the 7th is the stack frame. Only the first 6 arguments are passed by registers where the 7th must be done via stack. However, this is not being done and an incorrect argument was being passed to the z_mrsh_* functions as stack frame pointer. An obvious issue would be dumping of stack during kernel oops, as incorrect data was being printed or crashes due to inaccessible memory. So fix it by properly populating the stack with correct stack frame pointer as outgoing argument for the caller of z_mrsh_* functions. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
For CPU without THREADPTR, we need an alternative way to figure out if we are in user context. This extends the user context check to do that via a brief syscall. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Both CONFIG_XTENSA_SYSCALL_USE_HELPER and CONFIG_XTENSA_INSECURE_USERSPACE are also applicable to MPU. So move them out of the CPU_HAS_MMU block. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This enables support for MPU on Xtensa. Currently this is for kernel mode only. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Add support to test for Xtensa MPU. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
With memory domain enabled, all threads within the same domain have access to each other threads' stacks, especially with CONFIG_ARCH_MEM_DOMAIN_SYNCHRONOUS_API enabled (as it is expected behavior). So update the conditions to skip both tests to read and write to other threads' stacks. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This extends the Xtensa MPU to support userspace. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This allows the SoC to have total control on what MPU ranges to be programmed at boot. This overrides the generic ranges in the architecture core code. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This is exactly what I was freaked out about though, and why I thought we needed to do this discussion in a separate bug. But to do it here:
That just doesn't sound acceptable. Some architectures will be secure but others won't? I get this isn't related to this PR. But it's more important and for sure has to be fixed first, right? What's the root cause here that is messing with domain assignments for thread stacks? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I'll +1 if someone files a bug on the security issue separately (or explains to me why it's not a security issue).
Could you file the bug? Since you will be the one who can articulate on what is wrong in details? |
But I can't. I don't actually understand this, I honestly thought you did. :) I wrote that list above about how it's a security problem in the expectation that it was at least 60% likely that one of the steps would be a mistake on my part. But sure, I'll take a look and write something up. Basically the question here is "Why are thread stacks not isolated on architectures with synchronous mem_domain implemented?". |
My intention is more for the community, where you file an issue if you see something wrong. So that we can have a discussion there, and possibly decisions if they need to be made. So that it can be searchable, and be referenced in the future. |
FYI ! |
OK, so here's how I think this mess happened: It's not a framework security bug, it's a test weirdness and a historical wart of the ARM MPU implementation. The ztest_test_skip() in question arrived in commit eeab568 where it explains "MMU threads within the same memory domain have access to each other's stacks." Which is unambiguously true about mem_domains as documented: all the threads running in the domain get the same view of memory, ergo they can read each others stacks. QED. But I guess the early MPU implementations (realistically probably just ARM -- x86 MMU was the second userspace platform, wasn't it?) had some code to isolate stack memory by default, presumably dating from the days before the mem_domain API finalized. So these tests would pass (i.e. fail to access cross-thread stack memory) on the early MPU implementation. But that behavior was never promised or documented AFAICT. And of course now you've produced an MPU driver that honors the documented behavior and not "whatever ARM did", so it fails. Basically: I think this is a historical wart. These tests are exercising undocumented behavior from an early platform variant that isn't portable to modern Zephyr. If there's a bug, it's the converse: "legacy MPUs" disallow access to thread stacks that the documentation requires. I think the correct fix is to remove the tests entirely. They aren't testing anything but a failure condition. I'll submit a patch. My only remaining fear is the dependence on CONFIG_ARCH_MEM_DOMAIN_SYNCHRONOUS_API. As far as I can tell, that kconfig has absolutely nothing to do with thread stacks or mem_domain. I think (?) you added it as a proxy for "MPU that doesn't do the weird stuff ARM did", am I right? The actual synchronous API is a driver/platform-level thing. |
See #70461 for an IMHO cleaner fix that removes the test cases that are "testing" undocumented behavior. |
Yes, that kconfig is the closest I could find to differentiate the behavior. |
Yeah, let's not play games like that. My vote is to remove the tests entirely, obviously. But if you want to leave them in place and kludge around the skip logic, it's probably best to actually code to the specific platforms rather than pretend that an unrelated feature is important. There's also an alternative where we "bless" the current madness and invent a kconfig like CONFIG_MEM_DOMAIN_THREAD_PROTECTION that the old MPU's would select. But that sounds like a terrible idea to me personally: the worst place for API variance like that is in memory protection schemes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switch to a +1 (and feel free to assume refresh). If this merges first with the existing patch still present I can always fix up #70461 later.
let's get this in and deal with tests and documentation as a followup. |
This enables support for MPU on Xtensa.
Note that a SoC config of sample_controller + MPU - s32c1i was used to develop the code, and tested on QEMU.