Skip to content

Conversation

@peter-mitsis
Copy link
Contributor

On some platforms, address 0 is actually valid in the kernel, but we do not want it to be valid in userspace--so it is insufficient to only use mem_kernel_has_access(). However, arch_buffer_validate() does not always recover well, so it too is insufficient on its own. Thus, we use both.

@peter-mitsis peter-mitsis requested a review from dcpleung May 15, 2025 18:14
@github-actions github-actions bot added the area: Xtensa Xtensa Architecture label May 15, 2025
@github-actions github-actions bot requested review from andyross, ceolin and nashif May 15, 2025 18:15
@andyross
Copy link
Contributor

Can you clarify how the predicates differ? I guess I'm unclear on the bug being fixed here. The function validates user-provided strings vs. a userspace memory domain that clearly shouldn't ever have null mapped, was that broken somehow?

Note that the Intel audio DSPs tend to have MMIO registers at the bottom of memory, so the hardware does allow access to the null page and maybe the defaults have it mapped (incorrectly) for userspace threads too?

@peter-mitsis
Copy link
Contributor Author

Here's my understanding as to what is going on ...

Address 0x0 can be valid in kernel space on some Xtensa platforms; however, on those platforms test_null_dynamic_name() fails when called from userspace. The syscall's verification function calls k_usermode_string_copy() which relies on arch_user_string_nlen() to detect that when NULL is passed to detect an error. However, since NULL (or 0x0) is valid in the kernel space where this is called, the old code, which only called xtensa_mem_kernel_has_access() will initially conclude that this is perfectly fine. However, it is not.

Initially, I tried replacing xtensa_mem_kernel_has_access() with arch_buffer_validate(), and had some success. However, running twister showed that resulted in numerous errors on qemu_xtensa/dc233c/mmu. After some discussions with @dcpleung, I learned that xtensa_mem_kernel_has_access() was initially used as it allowed for better recovery from access exceptions. Some further discussions and experimentation indicated that first doing the xtensa_mem_kernel_has_access() and then arch_buffer_validate() results in something that seems to work everywhere. First we check for kernel access and ensure and where there are exceptions we can recover from them. If all that worked, then we check for userspace access and should there be an exception, it should be recoverable.

All that so that we can properly detect a NULL device name from userspace.

@dcpleung
Copy link
Member

The issue here is that we only check if kernel has access, which it has by default since it is mapped to hardware registers. However, the test relies on the assumption that the first page is not mapped to catch NULL access exception. The kernel access check thus would return true. Adding the user access check is to workaround this issue where the first page is mapped in kernel mode.

@peter-mitsis
Copy link
Contributor Author

@andyross - ping

@peter-mitsis peter-mitsis force-pushed the pmitsis-arch-buffer-validate branch from c5b231b to 813bad9 Compare June 13, 2025 22:16
@peter-mitsis
Copy link
Contributor Author

Re-pushed to resolve sonarqubecloud code quality issue.

dcpleung
dcpleung previously approved these changes Jun 16, 2025
Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'm sort of getting it. But can we update the commit message to clarify what's happening here? This is a security fix, right? The code as written was checking string arguments to syscalls vs. the kernel MMU/MPU state and not the user mode setup, and that's obviously very wrong.

Also one note about needlessly calling the underlying mem_buffer_validate() twice.

*/
if (!xtensa_mem_kernel_has_access((void *)s, maxsize, 0)) {
if (!xtensa_mem_kernel_has_access(s, maxsize, 0) ||
arch_buffer_validate(s, maxsize, 0)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why bother checking kernel_has_access() when you're going to do a proper/slow validation vs. the MMU configuration anyway? What are the circumstances where arch_buffer_validate() returns zero but xtensa_mem_kernel_has_access() returns non-zero?

For MMU systems, in fact (see ptables.c) these are exactly the same function, just with a different wrapper for return value convention.

Adds 'const' to address pointer as its memory contents
do not change.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
When calling device_get_binding(NULL) from userspace, this eventually
funnels down to a call to arch_user_string_nlen() where it tried to
verify that the kernel has access to this address (0x0).  But since
this originates from userspace, we really want to know if this is
accessible from userspace, so using arch_buffer_validate() instead
of xtensa_mem_kernel_has_access() is preferable.

Signed-off-by: Peter Mitsis <peter.mitsis@intel.com>
@sonarqubecloud
Copy link

@peter-mitsis
Copy link
Contributor Author

I could have sworn that I had previously encountered regression errors on an xtensa platform when using only arch_buffer_validate() instead of that and xtensa_mem_kernel_has_access(). However, that behavior has disappeared and I can no longer duplicate it even rewinding to previous commit points. As a result, I am going with just the arch_buffer_validate()--it seems to be doing the job.

@kartben kartben merged commit 2f2eaf7 into zephyrproject-rtos:main Jun 18, 2025
30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: Xtensa Xtensa Architecture

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants