Static object constructors do not execute on the NATIVE_POSIX_64 target #39347
Comments
@palchak-google FYI |
Can you confirm that those four objects are actually in the resulting executable? What happens if you disable optimizations? When I built my original example, I had all optimizations disabled. I think what may be happening is that the compiler can see that those four static objects are never used and thus they're being elided. |
With the configuration above, all objects are present:
Below you can find 2 archives with the following files: |
I am unable to reproduce this issue. I'm building with: OS: Linux I get the same output with
I do not have the ability at the moment to upgrade to GCC 11.2.1 to confirm whether the issue is related to the toolchain version (10.3.0 vs 11.2.1). Project files used: |
I am unable to reproduce this issue with GCC 10.3.1
and able to reproduce with GCC 11.2.1
For testing I used containers created from images:
|
Well we have ourselves a real head scratcher here. I was able to install GCC v11.2.0 last week, and when building with that toolchain I also failed to reproduce the issue (for either
As a sanity check, what happens if you build the executable without Zephyr at all? That is, change the signature of
I get the following output (as expected):
|
Also, since you happen to be using GCC-11, take note of issue #40023. |
This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time. |
Hello, Ubuntu 20.04 GCC 9.4 -> works perfect
Ubuntu 22.04 GCC 11.2 -> ctors are not executed
When I compile and run this as a normal application, the constructors work properly.
|
The differing results with different toolchains indicate that this likely has something to do with a change in libstdc++. For reference, I used the following toolchain and library to build for
These versions are all installed via official Debian packages. Re-building with the latest ToT for Zephyr again fails to reproduce for me:
|
I made some more checks on Ubuntu 22.04. It looks like constructors work properly only with GCC 9.
With any other GCC (10, 11, 12) from Ubuntu 22.04, constructors are not executed. In this example I used 10, but 11 and 12 behave exactly the same.
Exactly the same behavior I observed for Nevertheless current behavior, when constructors are not executed directly after start of the application, and after reverting of this commit d10b46d is matching to this Enhancement #40444 |
Commit d10b46d addresses a defect in Zephyr reported in #36858. However, further investigation has revealed that d10b46d does not correctly resolve the root cause of the issue. Additionally, d10b46d is currently masking a second, latent defect in Zephyr.
Background
To understand the root cause of the issue, it's necessary to have familiarity with the start-up sequence for processes on Linux. This article provides a great explanation of each of the steps in detail.
Root cause of #36858
The root cause of #36858 lies in the interaction between the link script used by Zephyr for
Note here that the
The first word (0x03) and the last word (0x0) are inserted by Zephyr's link script (the
It's also important to note that there are two symbols in the binary, both named
The first symbol refers to a local object (the 'l' and 'O' flags) at address 0x080532cc, which is the address of the value -1 that was contributed by The final piece of the puzzle lies in the
The
Root cause of #39347
Commit d10b46d fixed the Zephyr start-up process on some systems but broke the start-up process on others. An understanding as to why is provided again by the map file. This excerpt is taken from the map file for the
On Aleksei's system, the
The other critical detail is that the
On Aleksei's system, the
Different behavior on different systems
The difference in process start-up behavior between my system and Aleksei's system can be attributed to significant changes that were introduced to glibc starting with version 2.34. Many of the changes are explained and motivated in this StackOverflow answer. The one change that has a critical impact on this issue is the elimination of the direct call to
Without commit d10b46d, on systems with glibc < 2.34 the call to
With commit d10b46d, on systems with glibc < 2.34 global constructors run exactly once prior to Zephyr's init, and on systems with glibc >= 2.34 global constructors never run at all.
Glibc v2.34 was officially released in August of 2021, though some distributions adopted it earlier. It seems all but certain that Ubuntu 20.04 included a version of glibc older than 2.34 while Ubuntu 22.04 uses a version of glibc >= 2.34. This explains wojciechslenska's results.
Fixing both #39347 and #36858
The solution that works for both systems using glibc < 2.34 and systems using glibc >= 2.34 is to revert commit d10b46d and instead modify the Zephyr link script (for
The first list contains only the symbols provided by the
The second list contains all of the remaining symbols from the
Finally, one important feature of this solution that is not present in commit d10b46d is that with this solution static object constructors run as part of Zephyr's start-up routine (instead of before Zephyr). This means that static object constructors run after kernel and driver initialization, exactly as occurs on non-
I will work on producing a patch that implements this solution in the coming weeks. If someone else wants to take a shot at implementation in the meantime, please do.
Coda
At the beginning I mentioned that commit d10b46d is currently masking a latent defect in Zephyr. Recall the relevant map file section for and contents of the
The
Looking closely at the initialization function list in the final binary we see that it contains five entries:
The first entry indicates that the list contains three valid pointers, and the last entry is the trailing zero. However, entry[1], which is supposedly a function pointer, has the value -1. Similarly, entry[3] is a nullptr. Only entry[2] is a valid function pointer. What Zephyr's link script has done is take a valid list (entries 1-3) and wrap it with a leading size (0x03) and a trailing zero, resulting in a final list that contains invalid pointers. And indeed, if commit d10b46d is reverted, Zephyr's version of
Notice in this output that even though Zephyr generates a
The executable contains two symbols named
In contrast, Zephyr's version of
Because the proposed fix for #39347 and #36858 will also eliminate this latent defect, it doesn't seem necessary to file a separate bug report for it. However, if others disagree, then please feel free to copy this coda into a stand-alone report. |
Rename the symbols used to denote the locations of the global constructor lists and modify the Zephyr start-up code accordingly. On POSIX systems this ensures that the native libc init code won't find any constructors to run before Zephyr loads. Fixes zephyrproject-rtos#39347, zephyrproject-rtos#36858 Signed-off-by: David Palchak <palchak@google.com>
@aleksei-timofeyev I have submitted a fix for this issue. It works as best as I have been able to test. Will you please confirm the fix works properly on your system(s)? |
@palchak-google
and GCC 12.0.1
I see the same behavior on |
Describe the bug
C++ global object constructors are not executed upon startup of a NATIVE_POSIX_64 executable. This breaks code that relies upon constructor side effects (such as registering unit tests with a test framework).
To Reproduce
Steps to reproduce the behavior:
main.cc:
native_posix_64 target with the following Kconfig:
Expected behavior
The following lines should be printed to the terminal exactly once:
Impact
This bug results in a major discrepancy between the behavior of code running on the native_posix_64 board and the behavior of that same code running on a physical board. This bug also interferes with the use of common unit test frameworks that rely on auto-registration of tests via global object constructors.
Environment (please complete the following information):
Additional context
The bug first appears at commit d10b46d.
#36858