Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

arm64: Add stack guard for v8r #62902

Merged

Conversation

povergoing
Copy link
Member

Add stack guard as an implementation of HW_STACK_PROTECTION. The original work is #55207, but the PR cannot be reopened, it might be too long, so I have to reopen a new one.

  • Enable MPU-based stack guard
  • Tested on fvp_baser_aemv8r and fvp_baser_aemv8r_smp board with HW_STACK_PROTECTION enabled.
  • Fix some testcases
    • it worth noting that the stack guard detects a stack overflow in samples/arch/smp/pi/

Example:

$ west build -p -b fvp_baser_aemv8r -t run zephyr/tests/kernel/fatal/exception/

or

$ west build -p -b fvp_baser_aemv8r_smp -t run zephyr/tests/kernel/fatal/exception/

The output will be something like:

...
test stack HW-based overflow - user 1
E: STACK OVERFLOW FROM USERSPACE, SP: 0x1fe20 OR FAR: 0x1fdf0 INVALID, SP LIMIT: 0x1fe00
E: ELR_ELn: 0x0000000000001234
E: ESR_ELn: 0x000000009200004c
E:   EC:  0x24 (Data Abort from a lower Exception level, that might be using AArch32 or AArch64)
E:   IL:  0x1
E:   ISS: 0x4c
E: FAR_ELn: 0x000000000001fdf0
E: TPIDRRO: 0x010000000001ba78
E: x0:  0x0000000000000000  x1:  0x0000000000000002
E: x2:  0x0000000000000000  x3:  0x0000000000000000
E: x4:  0x0000000000020de0  x5:  0x0000000000001e2c
E: x6:  0x000000000001fe00  x7:  0x0000000000000040
E: x8:  0x0000000000000089  x9:  0x000000000001a360
E: x10: 0x000000000001ee00  x11: 0x000000000000000f
E: x12: 0x0000000000000000  x13: 0x000000000001fe00
E: x14: 0x0000000000000008  x15: 0x000000000001bbd0
E: x16: 0x000000000001cbb0  x17: 0x000000000001a7f0
E: x18: 0x000000000001bc30  lr:  0x0000000000001250
E: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
E: Current thread: 0x1a200 (unknown)
Caught system error -- reason 2

...

test stack HW-based overflow - user priv stack 1
E: STACK OVERFLOW FROM KERNEL, SP: 0x1ed60 OR FAR: 0x1edf0 INVALID, SP LIMIT: 0x1ee00
E: ELR_ELn: 0x0000000000001234
E: ESR_ELn: 0x0000000096000044
E:   EC:  0x25 (Data Abort taken without a change in Exception level)
E:   IL:  0x1
E:   ISS: 0x44
E: FAR_ELn: 0x000000000001edf0
E: TPIDRRO: 0x020000000001ba78
E: x0:  0x0000000000000000  x1:  0x0000000000000002
E: x2:  0x0000000000000000  x3:  0x0000000000000000
E: x4:  0x0000000000020de0  x5:  0x0000000000001e2c
E: x6:  0x000000000001fd40  x7:  0x0000000000000040
E: x8:  0x0000000000000000  x9:  0x00000000000017ac
E: x10: 0x000000000001ee00  x11: 0x000000000000000f
E: x12: 0x0000000000000000  x13: 0x000000000001fe00
E: x14: 0x0000000000000008  x15: 0x000000000001bbd0
E: x16: 0x0000000000020da0  x17: 0x0000000000003480
E: x18: 0x000000000001bc30  lr:  0x0000000000001250
E: >>> ZEPHYR FATAL ERROR 2: Stack overflow on CPU 0
E: Current thread: 0x1a200 (unknown)
Caught system error -- reason 2
...

Limitation:

  • Still trying to find a better way to handle stack overflow detection during the printk or LOG
  • MMU based stack guard does NOT support, and might open another PR in the future.

Accessing mem before mmu or mpu init will cause a cache coherence issue.
To avoid such a problem, move the safe exception stack init function
after the mmu or mpu is initiated.

Also change the data section attribute from INNER_SHAREABLE to
OUTER_SHAREABLE. Otherwise there will be a cache coherence issue during
the memory regions switch. Because we are using background region to do
the regions switch, and the default background region is
OUTER_SHAREABLE, if we use INNER_SHAREABLE as the foreground region,
then we have to flush all cache regions to make sure the cached values
are right. However, flushing all regions is too heavy, so we set
OUTER_SHAREABLE to fix this issue.

Signed-off-by: Jaxson Han <jaxson.han@arm.com>
Clean the thread->arch during the arch_new_thread to avoid unexpected
behavior. If the thread struct is allocated from heap or in stack, the
data in thread->arch might be dirty.

Signed-off-by: Jaxson Han <jaxson.han@arm.com>
Introduce the ARM64_STACK_PROTECTION config. This option leverages the
MMU or MPU to cause a system fatal error if the bounds of the current
process stack are overflowed. This is done by preceding all stack areas
with a fixed guard region. The config depends on MPU for now since MMU
stack protection is not ready.

Signed-off-by: Jaxson Han <jaxson.han@arm.com>
Add the stack check function z_arm64_stack_corruption_check at
z_arm64_fatal_error to handle the stack overflow triggered by the
hardware region.

Signed-off-by: Jaxson Han <jaxson.han@arm.com>
Refactor the stack relevant macros to prepare to introduce the stack
guard. Also add comments about the changes related to stack layout.

Signed-off-by: Jaxson Han <jaxson.han@arm.com>
To make the stack guard works well, clean and refine the MPU code. To
save the MPU regions (the number of MPU regions are limited), we choose
to remove the guard region. Comparing to add an individual region to
guard the stack, removing the guard region can at least save 2 regions
per core.

Similarly with userspace, the stack guard will leverage the dynamic
regions switching mechanism which means we need a region switch during
the context switch. Otherwise, the other option is using stack guard
region, but this is very limited since the number of MPU regions is
limited.

Signed-off-by: Jaxson Han <jaxson.han@arm.com>
Enable stack guard for v8R which is backed by MPU.

Signed-off-by: Jaxson Han <jaxson.han@arm.com>
The test case allocate struct k_thread thread in the stack. This will
lead the random initial value of thread and thus cause the test cases
randomly hang. To fix such issue, move the declartion of struct k_thread
thread outside the function as a stacic variable.

Signed-off-by: Jaxson Han <jaxson.han@arm.com>
The heap size is not enough so that it will cause the testcase fail.
Increase to 32k to make sure it works for a long time in the future.

Signed-off-by: Jaxson Han <jaxson.han@arm.com>
The stack guard report this testcase has the stack overflow issue. To
fix the issue, slightly increse the stack size.

Signed-off-by: Jaxson Han <jaxson.han@arm.com>
The test_sem_take_timeout_isr depends on the thread's priority. But for
SMP platforms, the priority is different with no-SMP. High-priority
threads and low-priority threads might run simultaneously at different
cores. Set the test case run at 1cpu to fix such an issue.

Signed-off-by: Jaxson Han <jaxson.han@arm.com>
Copy link
Collaborator

@carlocaione carlocaione left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks an amazing work to me.

@carlescufi carlescufi merged commit 2a8548b into zephyrproject-rtos:main Sep 22, 2023
35 checks passed
@povergoing povergoing deleted the v8r64_add_mpu_stack_guard branch November 14, 2023 02:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: ARM64 ARM (64-bit) Architecture area: Kernel area: POSIX POSIX API Library
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants