Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incremental build with config changes can produce an invalid binary when userspace is enabled #42184

Closed
carlocaione opened this issue Jan 26, 2022 · 1 comment · Fixed by #42451
Assignees
Labels
area: Build System area: Kernel area: newlib Newlib C Standard Library area: Userspace Userspace bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug

Comments

@carlocaione
Copy link
Collaborator

This is reproducible on qemu_x86_64 and qemu_cortex_a53. Derived from #42106.

Compile samples/userspace/hello_world_user + CONFIG_NEWLIB_LIBC.

On qemu_x86_64:

Booting from ROM..*** Booting Zephyr OS build v3.0.0-rc1-11-g2f5976d674ff  ***
ASSERTION FAIL [(options & ((1UL << (2)))) == 0U || z_stack_is_user_capable(stack)] @ WEST_TOPDIR/zephyr/kernel/thread.c:514
        user thread 0x120000 with kernel-only stack 0x144000
E: RAX: 0x0000000000000004 RBX: 0x0000000000000800 RCX: 0x0000000000000001 RDX: 0x0000000000000001
E: RSI: 0x0000000000000202 RDI: 0x000000000011c555 RBP: 0x0000000000146900 RSP: 0x00000000001468b8
E:  R8: 0x0000000000000001  R9: 0x0000000000000000 R10: 0x00000000ffffffff R11: 0x000000000011c696
E: R12: 0x0000000000144000 R13: 0x000000000010a2e8 R14: 0x00000000ffffffff R15: 0x0000000000120000
E: RSP: 0x00000000001468b8 RFLAGS: 0x0000000000000246 CS: 0x0018 CR3: 0x000000000014f000
E: RIP: 0x000000000010c097
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Current thread: 0x121900 (unknown)
E: Halting system

On qemu_cortex_a53:

*** Booting Zephyr OS build v3.0.0-rc1-11-g2f5976d674ff  ***
ASSERTION FAIL [(options & ((1UL << (2)))) == 0U || z_stack_is_user_capable(stack)] @ WEST_TOPDIR/zephyr/kernel/thread.c:514
        user thread 0x40031000 with kernel-only stack 0x4007c000
E: ELR_ELn: 0x000000004000cfd4
E: ESR_ELn: 0x0000000056000002
E:   EC:  0x15 (Unknown)
E:   IL:  0x1
E:   ISS: 0x2
E: TPIDRRO: 0x0100000040031a78
E: x0:  0x0000000040031a78  x1:  0x0000000000000202
E: x2:  0x0000000000000090  x3:  0x000000004007e719
E: x4:  0x0000000000000010  x5:  0x0000000000000004
E: x6:  0x0000000000000031  x7:  0x000000004001dba0
E: x8:  0x0000000000000004  x9:  0x00000000ffffffff
E: x10: 0x0000000000000001  x11: 0x0000000000000000
E: x12: 0x0000000000000000  x13: 0x0000000000000000
E: x14: 0x0000000000000000  x15: 0x0000000000000000
E: x16: 0x00000000400104d4  x17: 0x0000000000000000
E: x18: 0x0000000000000000  x30: 0x000000004001213c
E: >>> ZEPHYR FATAL ERROR 4: Kernel panic on CPU 0
E: Current thread: 0x400316c0 (unknown)
E: Halting system

In both cases z_stack_is_user_capable(stack) is returning NULL.

In particular what's returning NULL is z_object_lookup():

static inline struct z_object *
z_object_lookup ( const char *str,  size_t len)
{
  if (len <= MAX_WORD_LENGTH && len >= MIN_WORD_LENGTH)
    {
       unsigned int key = hash((const char *)&str, len);

      if (key <= MAX_HASH_VALUE)
        if (len == sizeof(void *))
          {
             const char *s = wordlist[key].name;

            if (str == s)
              return &wordlist[key];
          }
    }
  return 0;
}

The (str == s) comparison is failing because while the str is correctly carrying the address of the stack, the value of s retrieved from the wordlist table is wrong.

In particular in s we found the address of the stack as it would have been without NEWLIBC enabled.

My theory is that the wordlist is created at the wrong time during linking (maybe too soon?) so address used in there is not definitive and wrong with respect to the addresses in the final image.

@carlocaione carlocaione added the bug The issue is a bug, or the PR is fixing a bug label Jan 26, 2022
@carlocaione
Copy link
Collaborator Author

tagging @stephanosio

@stephanosio
Copy link
Member

My theory is that the wordlist is created at the wrong time during linking (maybe too soon?) so address used in there is not definitive and wrong with respect to the addresses in the final image.

@carlocaione Something is definitely wrong there. That issue is observed only when you do an incremental (non-clean) build after enabling the newlib, meaning this is a build system issue (cc @tejlmand @SebastianBoe).

If you do a clean build after enabling the newlib, you get the following error message, which is expected because the newlib with userspace requires the kernel dynamic allocation feature (see #40686 (comment)).

[0/1] To exit from QEMU enter: 'CTRL+a, x'[QEMU] CPU: cortex-a53
*** Booting Zephyr OS build v3.0.0-rc1-57-g98f66d756467  ***
ASSERTION FAIL [*lock != ((void *)0)] @ _WEST_TOPDIR/zephyr/lib/libc/newlib/libc-hooks.c:383_
        recursive lock allocation failed
E: ELR_ELn: 0x000000004000cfcc
E: ESR_ELn: 0x0000000056000002
E:   EC:  0x15 (Unknown)
E:   IL:  0x1
E:   ISS: 0x2
E: TPIDRRO: 0x0100000040031a78
E: x0:  0x0000000040031a79  x1:  0x000000000000017f
E: x2:  0x0000000040031a79  x3:  0x000000004007da90
E: x4:  0x000000004007dbf0  x5:  0x000000004007dbf0
E: x6:  0x0000000000000058  x7:  0x000000004001d5fb
E: x8:  0x0000000000000003  x9:  0x00000000ffffffff
E: x10: 0x000000004001c3d2  x11: 0x00000000ffffffc8
E: x12: 0x000000004007dfe0  x13: 0x000000004007dfe0
E: x14: 0x0000000000000000  x15: 0x0000000000000000
E: x16: 0x0000000000000000  x17: 0x0000000000000000
E: x18: 0x0000000000000000  x30: 0x000000004000ff10
E: >>> ZEPHYR FATAL ERROR 3: Kernel oops on CPU 0
E: Current thread: 0x40031000 (unknown)
E: Halting system

Also note that the newlib + userspace combo is currently tested by the tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace test, and this test passes for all architectures; so, this feature combo is definitely not broken; rather there is a build system issue that is creating an invalid binary when doing an incremental build.

$ scripts/twister -v -N -M -S -s tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace
ZEPHYR_BASE unset, using "/home/stephanos/Dev/zephyrproject/zephyr"
Renaming output directory to /home/stephanos/Dev/zephyrproject/zephyr/twister-out.5
INFO    - Zephyr version: v3.0.0-rc1-57-g98f66d756467
INFO    - JOBS: 60
INFO    - Using 'zephyr' toolchain.
INFO    - Selecting default platforms per test case
INFO    - Building initial testcase list...
INFO    - 1 test scenarios (39 configurations) selected, 8 configurations discarded due to filters.
INFO    - Adding tasks to the queue...
INFO    - Added initial list of jobs to queue
INFO    -  1/31 nsim_hs_smp               tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace SKIPPED (filter)
INFO    -  2/31 qemu_leon3                tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace SKIPPED (filter)
INFO    -  3/31 qemu_arc_hs6x             tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace SKIPPED (filter)
INFO    -  4/31 native_posix              tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace SKIPPED (filter)
INFO    -  5/31 qemu_nios2                tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace SKIPPED (filter)
INFO    -  6/31 m2gl025_miv               tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace SKIPPED (filter)
INFO    -  7/31 qemu_xtensa               tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace SKIPPED (filter)
INFO    -  8/31 qemu_malta                tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace SKIPPED (filter)
INFO    -  9/31 qemu_malta_be             tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace SKIPPED (filter)
INFO    - 10/31 qemu_x86_nommu            tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace SKIPPED (filter)
INFO    - 11/31 qemu_cortex_a9            tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace SKIPPED (filter)
INFO    - 12/31 qemu_riscv32              tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.195s)
INFO    - 13/31 qemu_riscv64              tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.074s)
INFO    - 14/31 qemu_x86_virt             tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.241s)
INFO    - 15/31 qemu_cortex_a53           tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.104s)
INFO    - 16/31 qemu_x86_nopae            tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.194s)
INFO    - 17/31 qemu_arc_em               tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.199s)
INFO    - 18/31 qemu_x86_nokpti           tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.391s)
INFO    - 19/31 qemu_arc_hs               tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.153s)
INFO    - 20/31 nsim_hs_mpuv6             tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (build)
INFO    - 21/31 qemu_x86_64               tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.363s)
INFO    - 22/31 qemu_x86                  tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.328s)
INFO    - 23/31 nsim_sem                  tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (build)
INFO    - 24/31 qemu_x86_tiny             tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.583s)
INFO    - 25/31 frdm_k64f                 tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (build)
INFO    - 26/31 qemu_cortex_r5            tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 6.588s)
INFO    - 27/31 qemu_cortex_a53_smp       tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.127s)
INFO    - 28/31 mps2_an385                tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.100s)
INFO    - 29/31 mps2_an521                tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.280s)
INFO    - 30/31 qemu_x86_64_nokpti        tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.504s)
INFO    - 31/31 mps2_an521_ns             tests/lib/newlib/thread_safety/libraries.libc.newlib.thread_safety.userspace PASSED (qemu 2.529s)

INFO    - 20 of 20 test configurations passed (100.00%), 0 failed, 19 skipped with 0 warnings in 56.27 seconds
INFO    - In total 160 test cases were executed, 152 skipped on 31 out of total 426 platforms (7.28%)
INFO    - 17 test configurations executed on platforms, 3 test configurations were only built.
INFO    - Saving reports...
INFO    - Writing xunit report /home/stephanos/Dev/zephyrproject/zephyr/twister-out/twister.xml...
INFO    - Writing xunit report /home/stephanos/Dev/zephyrproject/zephyr/twister-out/twister_report.xml...
INFO    - Run completed

@tejlmand tejlmand self-assigned this Jan 28, 2022
@dkalowsk dkalowsk added area: Build System priority: high High impact/importance bug labels Feb 1, 2022
@tejlmand tejlmand added priority: medium Medium impact/importance bug and removed priority: high High impact/importance bug labels Feb 3, 2022
@tejlmand tejlmand added the has-pr label Feb 3, 2022
@stephanosio stephanosio changed the title NEWLIBC + USERSPACE is broken Incremental build with config changes can produce an invalid binary Feb 3, 2022
@stephanosio stephanosio changed the title Incremental build with config changes can produce an invalid binary Incremental build with config changes can produce an invalid binary when userspace is enabled Feb 3, 2022
tejlmand added a commit to tejlmand/zephyr that referenced this issue Feb 3, 2022
Fixes: zephyrproject-rtos#42184

This commit fixes dependency issues related to kobject hash generation.

Absolute path is ensured in cases where OUTPUT was provided with
absolute path, as `${CMAKE_CURRENT_BINARY_DIR}`.
DEPENDS referring to same file has been updated to also use
`${CMAKE_CURRENT_BINARY_DIR}` to ensure they reference identical
locations.

The custom command renaming sections in kobject_hash.o and creating
kobject_hash_renamed.o has been updated to depends on the target objects
of kobj_hash_output_lib in addition to the library itself.
This is needed because kobj_hash_output_lib is not a real library, and
thus no library is updated when this target rebuilds.
Instead the dependency must be on the object files created by
kobj_hash_outplut_lib. This ensures that when object files gets rebuilt
then the section renaming will also take place.
The reason why both the object library itself, and its object files are
required as dependencies has to do with build generators.
The library is needed to ensure Makefiles can correctly have a target to
invoke when the output of the custom command is missing. The object
files dependency is required to ensure that custom commands are
correctly brought up-to-date when the objects changes.

Similar, the custom command executing gen_kobject_placeholders.py
depending of kobj_prebuilt_hash_output_lib has been updated to also
depend on the object files created by kobj_prebuilt_hash_output_lib.

Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>
carlescufi pushed a commit that referenced this issue Feb 4, 2022
Fixes: #42184

This commit fixes dependency issues related to kobject hash generation.

Absolute path is ensured in cases where OUTPUT was provided with
absolute path, as `${CMAKE_CURRENT_BINARY_DIR}`.
DEPENDS referring to same file has been updated to also use
`${CMAKE_CURRENT_BINARY_DIR}` to ensure they reference identical
locations.

The custom command renaming sections in kobject_hash.o and creating
kobject_hash_renamed.o has been updated to depends on the target objects
of kobj_hash_output_lib in addition to the library itself.
This is needed because kobj_hash_output_lib is not a real library, and
thus no library is updated when this target rebuilds.
Instead the dependency must be on the object files created by
kobj_hash_outplut_lib. This ensures that when object files gets rebuilt
then the section renaming will also take place.
The reason why both the object library itself, and its object files are
required as dependencies has to do with build generators.
The library is needed to ensure Makefiles can correctly have a target to
invoke when the output of the custom command is missing. The object
files dependency is required to ensure that custom commands are
correctly brought up-to-date when the objects changes.

Similar, the custom command executing gen_kobject_placeholders.py
depending of kobj_prebuilt_hash_output_lib has been updated to also
depend on the object files created by kobj_prebuilt_hash_output_lib.

Signed-off-by: Torsten Rasmussen <Torsten.Rasmussen@nordicsemi.no>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: Build System area: Kernel area: newlib Newlib C Standard Library area: Userspace Userspace bug The issue is a bug, or the PR is fixing a bug priority: medium Medium impact/importance bug
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants