kernel: support memory mapped stacks #70810
Conversation
Pylint complains so we fix. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
x86_64 does not currently support demand paging so don't advertise it. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Most places use CONFIG_X86_STACK_PROTECTION, but there are some places using CONFIG_HW_STACK_PROTECTION. So synchronize all to use CONFIG_X86_STACK_PROTECTION instead. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Previous commit changed the privileged stack size to be using kconfig CONFIG_PRIVILEGED_STACK_SIZE instead of simply CONFIG_MMU_PAGE_SIZE. However, the stack bound check function was still using the MMU page size, so fix that. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This adds the mechanism to do cleanup after k_thread_abort() is called with the current thread. This is mainly used for cleaning up things when the thread cannot be running, e.g., cleaning up the thread stack. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This is similar to k_mem_map()/_unmap(), but instead of using anonymous memory, the provided physical region is mapped into the virtual address space. In addition to simply mapping physical to virtual addresses, the mapping also adds two guard pages, before and after the virtual region, to catch buffer under-/over-flow. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This introduces support for memory mapped thread stacks, where each thread stack is mapped into virtual memory address space with two guard pages to catch under-/over-flowing the stack. This is just on the kernel side. Additional architecture code is required to fully support this feature. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This extends the test for memory mapped stacks, as the address of the memory mapped stack object differs from that of the actual stack object. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This extends the thread abort hook to support memory mapped stacks. The calculation to find out which thread pool instance the outgoing thread belongs to requires a physical address, so find the physical address via the memory mapped stack. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This extends the mem_protect/stackprot tests to support testing with memory mapped stack. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
The current memory mapped stack code requires actual physical addresses for stacks, and cannot deal with stacks already using virtual addresses. So disable mapped stack via defconfig. Note that this is done before enabling memory mapped stacks on x86 so test won't fail when the support in the architecture code is introduced. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This adds the necessary bits to enable memory mapping thread stacks on both x86 and x86_64. Note that currently these do not support multi level mappings (e.g. demand paging and running in virtual address space: qemu_x86/atom/virt board) as the mapped stacks require actual physical addresses. Signed-off-by: Daniel Leung <daniel.leung@intel.com>
Well done! It's not every day a 2xxxx series issue gets some attention 🪁
	 */
	thread->stack_info.mapped.addr = NULL;
	thread->stack_info.mapped.sz = 0;
#endif /* CONFIG_THREAD_STACK_MEM_MAPPED */
How does this work with SMP? What happens if two threads running in parallel do z_thread_abort(self)?
The spinlock in k_thread_abort_cleanup() will serialize them. Thread 1 will set up the data for deferred unmapping. Thread 2 will see that, do the unmapping for thread 1, and then set up its own deferred unmapping data.
aaah ok, there is a cleanup first for any pending thread.
yay! overflow detection!
I'm gonna be that guy again. This is a huge patch and a ton of complexity... do we really want to do it this way? The feature has value, but is this the right API? I mean, look at Linux. Linux has all this stuff too, and it doesn't have any support in the kernel to do it. You create a new thread stack with mmap() and just point clone() to it, and the kernel uses it right where you told it to. Can't we do this with a library layer where you say "give me a mmap'd thread stack", you get one, and then pass that to k_thread_create()? Why bother making a complicated thread creation API even more complicated?
This could also be used for some other interesting things aside from thread stacks 😏
 *
 * @param thread Pointer to thread to be cleaned up.
 */
void k_thread_abort_cleanup(struct k_thread *thread);
Good example here: if your stack creation is a library level thing, you can just provide a top level wrapper or whatever and clean it up on exit. No need to muck with thread lifecycle stuff in the kernel
This could also be used for some other interesting things aside from thread stacks 😏
In fact pthreads would be the obvious choice for the "wrapper" API that does the appropriate stack allocation/mapping magic on behalf of the app, because that's where users expect that abstraction to live as that's where they find it on Linux.
	 * cleared below.
	 */
	void *stack_mapped = k_mem_phys_map((uintptr_t)stack, stack_obj_size,
					    K_MEM_PERM_RW | K_MEM_CACHE_WB | K_MEM_MAP_UNINIT);
Also just to be pedantic, this isn't an optional thing in this scheme: every stack in every thread gets remapped automatically whether it wants it or not. But control over stack memory is something a lot of apps want. Consider a runtime for a Go-like language with lots of I/O threads flying around using tiny bounded stacks, etc...
Obviously most apps will just want defaults, but at the same time most apps would be just as happy with a recommendation like "use pthreads to get automatic MMU stack bounds, k_thread_create() is the low level API".
This is done this way to be as transparent to the API user as possible. As long as the kconfig is enabled, the feature is enabled without any code changes in the application. With your idea of having a library call to produce memory mapped stacks (whether coming from malloc or statically declared stacks), the user would need to change application code to use it. If you can come up with another solution that does not require application code changes, please feel free to submit it. I am very interested in seeing such a solution.
But... why do we not want application code changes, though? New features get new APIs, that's not so weird to me? And I repeat, it's not as simple as "just turn it on": some apps won't want this, or will want it for only certain threads. k_thread_create() is an extremely complicated API already.

The philosophy here is basically that toolkits are better than frameworks. There's no Right Answer for how to lay out memory spaces; it's just a complicated area. Every runtime on Linux tends to have its own ideas about how to do this, for example (Python threads don't get initialized like bare pthreads, which are different from Rust, etc...), and it won't be any different for us. If we land this API like it is, we're stuck with it forever, and I really don't think this is likely to evolve cleanly.
Just a suggestion: couldn't mapping be a flag added to k_thread_stack_alloc()? I assume it should be possible to detect whether a stack is mapped in k_thread_stack_free().
If an app doesn't want it, they don't have to turn on the kconfig. I can add a thread creation flag to use the stack as-is. The fundamental question is whether we want to be safe by default, or only opt into safety if needed (or rather, if someone remembers to be safe). I would argue that in most cases, safe by default should be the choice, and that being unsafe needs to be a conscious choice (or rather, MUST be a conscious choice). Here, having guards around the stack is important to prevent corrupting data when running in kernel mode, whether with kernel-mode-only threads or when serving syscalls with the privileged stack. Overflowing a thread stack in user mode will touch the privileged stack in the current arch MMU code and generate a page fault, so that is covered. However, kernel-mode-only threads and syscall handling run in kernel mode, and most stacks are declared statically, so without guard pages an overflow there would silently corrupt whatever data is adjacent.
This only works if stacks are allocated at runtime. Statically declared stacks (which the majority are) would still need to be pre-processed first if mapping is not handled automatically.
This adds support to memory map thread stacks during thread creation, and subsequent (delayed) un-mapping when the thread is terminated. With memory mapped stacks, guard pages can be created enclosing the stacks to catch stack overflow and underflow. This requires MMU.
(See individual commits for details.)
Relates: #28899