Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dynamic threads don't work on x86 in some configurations #17893

Closed
andrewboie opened this issue Jul 30, 2019 · 9 comments · Fixed by #30762
Closed

dynamic threads don't work on x86 in some configurations #17893

andrewboie opened this issue Jul 30, 2019 · 9 comments · Fixed by #30762
Labels
area: Memory Protection area: X86 x86 Architecture (32-bit) bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug

Comments

@andrewboie
Copy link
Contributor

Dynamic threads are an optional userspace feature; if CONFIG_DYNAMIC_OBJECTS is enabled, a thread may call k_object_alloc(K_OBJ_THREAD) which returns a usable struct k_thread pointer. This pointer is no more than 4-byte aligned; Zephyr does not have a heap which allows returning aligned pointers.

On x86, thread->arch may have members which require alignment. Currently this happens if CONFIG_EAGER_FP_SHARING or CONFIG_LAZY_FP_SHARING is turned on, as thread->arch will then contain structs requiring alignment because they get passed to x86 CPU instructions to save/restore fp context. Threads that are instantiated statically will be aligned properly, but likely not if allocated dynamically, resulting in a CPU exception when context is attempted to be saved/restored.

A quick survey of arches reveals that only x86 threads have this problem.

We may need to find an alternate storage area. Floating point register stuff could possibly be stored in thread-local area, next to where we put errno; its contents are only significant when the thread is swapped out, and hence a user thread being able to touch this is not of concern.

@andrewboie andrewboie added bug The issue is a bug, or the PR is fixing a bug area: X86 x86 Architecture (32-bit) area: Memory Protection priority: low Low impact/importance bug labels Jul 30, 2019
@jenmwms jenmwms added this to the v2.1.0 milestone Aug 20, 2019
@jenmwms jenmwms assigned andrewboie and unassigned wentongwu Aug 20, 2019
@andrewboie andrewboie modified the milestones: v2.1.0, v2.2.0 Oct 24, 2019
@andrewboie
Copy link
Contributor Author

This is also a problem on x86_64, and happens quite frequently now that I have userspace working on that, blocking the test case..
I'm disabling tests/kernel/threads/dynamic_thread on x86 until this is fixed.

andrewboie pushed a commit to andrewboie/zephyr that referenced this issue Jan 13, 2020
This doesn't work properly on x86 unless the dynamic thread
struct allocated gets lucky and is aligned to 16 bytes.
Disabling for now until zephyrproject-rtos#17893 is fixed.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
nashif pushed a commit that referenced this issue Jan 13, 2020
This doesn't work properly on x86 unless the dynamic thread
struct allocated gets lucky and is aligned to 16 bytes.
Disabling for now until #17893 is fixed.

Signed-off-by: Andrew Boie <andrew.p.boie@intel.com>
@jhedberg jhedberg modified the milestones: v2.2.0, v2.3.0 Mar 10, 2020
@mp-jakob
Copy link

Is this an issue needing to be addressed in Zephyr? The gcc documentation does not specify whether alignment applies to stack variables. However, from reading this issue it seems to me that stack variable alignment smaller than given in the attribute was deemed a bug which was supposedly fixed in gcc 4.6.

@andrewboie
Copy link
Contributor Author

Is this an issue needing to be addressed in Zephyr? The gcc documentation does not specify whether alignment applies to stack variables. However, from reading this issue it seems to me that stack variable alignment smaller than given in the attribute was deemed a bug which was supposedly fixed in gcc 4.6.

@wobak-j this doesn't have anything to do with the compiler, it's the alignment requirements of memory addresses passed to fxsave/fxrstor.

https://www.felixcloutier.com/x86/fxsave

The destination operand contains the first byte of the memory image, and it must be aligned on a 16-byte boundary. A misaligned destination operand will result in a general-protection (#GP) exception being generated (or in some cases, an alignment check exception [#AC]).

@andrewboie andrewboie removed this from the v2.3.0 milestone Apr 30, 2020
@mp-jakob
Copy link

mp-jakob commented Apr 30, 2020

Yes, I understand that the exception is triggered by an fxsave to a misaligned address.
My point was that that the tFpRegSetEx struct, to whose address the fxsave operation is performed, already has the correct alignment attribute. According to the the gcc documentation, this alignment should be propagated transitively through the parent structs. From my understanding of the linked gcc bug report, the compiler should therefore align variables of the k_thread struct type automatically to the alignment of the contained tFpRegSetEx, even if they are located on the stack.

@andrewboie
Copy link
Contributor Author

andrewboie commented Apr 30, 2020

Yes, I understand that the exception is triggered by an fxsave to a misaligned address.
My point was that that the tFpRegSetEx struct, to whose address the fxsave operation is performed, already has the correct alignment attribute. According to the the gcc documentation, this alignment should be propagated transitively through the parent structs. From my understanding of the linked gcc bug report, the compiler should therefore align variables of the k_thread struct type automatically to the alignment of the contained tFpRegSetEx, even if they are located on the stack.

@wobak-j for build-time defined k_thread, yes, the alignment is correct. But that's not what this bug is about.

If you call malloc() or get memory from a memory pool, and cast the returned void * pointer to a k_thread, the alignment is likely not correct. We have no heap in Zephyr where you can specify the alignment in the allocation function.

This has nothing to do with GCC or the alignment of stack variables because we are not getting the memory for the k_thread from the stack (and it would be insane to do so, the kernel would explode as soon as the stack memory falls out of scope).

Please review the original issue statement:

Dynamic threads are an optional userspace feature; if CONFIG_DYNAMIC_OBJECTS is enabled, a thread may call k_object_alloc(K_OBJ_THREAD) which returns a usable struct k_thread pointer. This pointer is no more than 4-byte aligned; Zephyr does not have a heap which allows returning aligned pointers.

@nashif
Copy link
Member

nashif commented May 1, 2020

btw, the test currently passes on qemu_x86 and still fails on qemu_x86_64, was looking for a different reason at this test.

DEBUG   - QEMU: )
DEBUG   - QEMU: Booting from ROM..Optimal CONFIG_X86_MMU_PAGE_POOL_PAGES 11
DEBUG   - QEMU: *** Booting Zephyr OS build zephyr-v2.2.0-2181-ga4ad2e0d9fd6  ***
DEBUG   - QEMU: Running test suite thread_dynamic
DEBUG   - QEMU: ===================================================================
DEBUG   - QEMU: starting test - test_kernel_create_dyn_user_thread
DEBUG   - QEMU: PASS - test_kernel_create_dyn_user_thread
DEBUG   - QEMU: ===================================================================
DEBUG   - QEMU: starting test - test_user_create_dyn_user_thread
DEBUG   - QEMU: PASS - test_user_create_dyn_user_thread
DEBUG   - QEMU: ===================================================================
DEBUG   - QEMU: starting test - test_dyn_thread_perms
DEBUG   - QEMU: E: thread 0x001214a0 (5) does not have permission on k_sem 0x001490b0
DEBUG   - QEMU: E: permission bitmap
DEBUG   - QEMU: E: 03 00                   |..
DEBUG   - QEMU: E: syscall z_vrfy_k_sem_give failed check: access denied
DEBUG   - QEMU: E: EAX: 0x00000000, EBX: 0x00000000, ECX: 0x00000000, EDX: 0x00000000
DEBUG   - QEMU: E: ESI: 0x00000000, EDI: 0x00000000, EBP: 0x00000000, ESP: 0x001290c8
DEBUG   - QEMU: E: EFLAGS: 0x00000246 CS: 0x002b CR3: 0x00121060
DEBUG   - QEMU: E: call trace:
DEBUG   - QEMU: E: EIP: 0x0010021f
DEBUG   - QEMU: E: NULL base ptr
DEBUG   - QEMU: E: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
DEBUG   - QEMU: E: Current thread: 0x001214a0 (unknown)
DEBUG   - QEMU: ===== must have access denied on k_sem 0x001490b0
DEBUG   - QEMU: PASS - test_dyn_thread_perms
DEBUG   - QEMU: ===================================================================
DEBUG   - QEMU: starting test - test_thread_index_management
DEBUG   - QEMU: created 10 thread objects
DEBUG   - QEMU: PASS - test_thread_index_management
DEBUG   - QEMU: ===================================================================
DEBUG   - QEMU: Test suite thread_dynamic succeeded
DEBUG   - QEMU: ===================================================================
DEBUG   - QEMU: PROJECT EXECUTION SUCCESSFUL
DEBUG   - QEMU complete (passed) after 3.098289 seconds

@andrewboie
Copy link
Contributor Author

@nashif are you sure the quoted failure is from qemu_x86_64? note the register names, that is from 32-bit

About the original issue: storing in the stack object doesn't work well. It would need to be in memory not accessible by a user thread, and not possible to overwrite in a supervisor stack overflow. Which means it would need to be located before the page-aligned stack buffer, and it would consume a page all by itself, wasting 3584 bytes per thread which is a dealbreaker.

I think we need to support aligned thread objects in our dynamic allocations in order to fix this specific issue. Without an aligned allocator, even thread objects that don't require alignment are of limited value since since their corresponding stacks (which must be aligned) cannot be allocated.

@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

@github-actions
Copy link

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Memory Protection area: X86 x86 Architecture (32-bit) bug The issue is a bug, or the PR is fixing a bug priority: low Low impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants