New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib/os: Add sys_winstream lockless shared memory byte stream IPC #41521
Conversation
@@ -66,6 +66,8 @@ | |||
|
|||
#define MANIFEST_SEGMENT_COUNT 3 | |||
|
|||
extern void soc_trace_init(void); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The extern keyword on a function declaration is ignored by the compiler and can be safely omitted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. It's a really, really common idiom in the kernel though, so it's in my fingers. I guess I've grown to like it for uses like this ("declare a function no one put in a header"). Though in this particular case it's an artifact of the fact that this bootloader code used to be built entirely separately and didn't share headers. At this point there's no reason not to just move this to soc.h.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Harbison and Steele 4.8.5 recommends the use of extern
to be compatible with the largest number of toolchains. Is there some documented Zephyr policy about this? I know Zephyr has a complex CMake toolchain abstraction layer which seems aimed at supporting a wide range of toolchains, some possibly "less standard" ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you could move this to a header then that would be great and this discussion is moot.
To continue this discussion, about what should be policy, and not about this particular instance could be interesting.
That is interesting @march-hb. I was not aware that toolchains can have trouble with this. But I have also never seen any patch to this effect, adding extern to fix a toolchain issue, which one would assume would have happened by now if any third-party toolchain building Zephyr did have this problem. So I believe this is not an issue, but time may prove me wrong ...
IMHO it should be policy to not use it.
Assuming I am right and that no toolchain building Zephyr will ever have a problem with this, then it acts only as documentation to the reader, which IMHO would be better served with a comment.
// We declare this external function here instead of including it's header becase ...
lib/os/CMakeLists.txt
Outdated
@@ -24,6 +24,7 @@ zephyr_sources( | |||
heap-validate.c | |||
bitarray.c | |||
multi_heap.c | |||
winstream.c |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add a Kconfig option for this so we don't slow down the build for all that don't use this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Easy enough. But FWIW: I don't know that I buy that analysis. C file builds are wildly parallel on almost all the environments we care about. Extra CMake logic ties up a single core before any compiler can be launched at all, and it's already dominating the build. A quick test on my box:
$ time cmake -B build -GNinja -DBOARD=qemu_x86_64 samples/hello_world
...
real 0m1.474s
user 0m1.286s
sys 0m0.173s
$ time (cd build; ninja)
...
real 0m0.799s
user 0m2.516s
sys 0m0.955s
Again, not like it matters here. But IMHO we need to be spending our development optimization effort trying to reduce the amount of cmake usage, not increase it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A big problem with additional Kconfigs is combinatorial explosion. I mean many Kconfig combinations are never tested, it's frequent some combinations don't even compile! kconfig.c has a very useful "randconfig" feature but the orphaned kconfig.py used by Zephyr has not.
To speed up compilation from scratch I think cleaning up #includes (#41543) seems more effort but also much more promising because affecting many .c files at once. #41543 is also important "modularity hygiene".
The performance of incremental C compilation is not a problem because it's... incremental and highly parallel. As mentioned by @andyross it's entirely dominated by CMake itself and/or the various linking scripts and post-processing.
I think more Kconfig matters but for a very different optimization: space. In SOF we very very often run out of space on some platforms. Would an additional Kconfig here make a difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding space: not really. With only a tiny handful of edge cases ("stuff with SYS_INIT() initialization calls doesn't garbage collect" being the big one), everything we link uses -fdata/function-sections and gets included in the final binary only if actually referenced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In other words, turning off tracing with already existing Kconfigs is enough for this code to be removed at link time? Just making sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, though (getting into DSP specifics) that only gets rid of the code. The 8k for the trace buffer in window 3 is in the "reserved" block in the bottom of memory that can't be used architecturally for anything else (SOF puts its own stuff in that region too and all the addresses are hard-coded). I'm hoping to get that fixed in the medium-term future so those regions can be seen by the linker and resized/eliminated/moved, there were some discussions about this in @lyakh 's PR to the internal Zephyr tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, though (getting into DSP specifics) that only gets rid of the code. The 8k for the trace buffer in window 3 is in the "reserved" block in the bottom of memory that can't be used architecturally for anything else
So this means right now it's reserved with or without this PR, the latter does not make any difference?
Not familiar with link-time optimizations and not sure about your longer term plans for this but in general maybe any tracing buffer common to various tracing features could be under the control of a new but invisible kconfig symbol which does not contribute to combinatorial explosion because it's not a free variable.
https://docs.zephyrproject.org/2.4.0/guides/kconfig/setting.html#visible-and-invisible-kconfig-symbols
Invisible symbols are not shown in the interactive configuration interfaces, and users have no direct control over their value. They instead get their value from defaults or from other symbols.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with a promptless AKA invisible Kconfig symbol.
I don't agree with your analysis of the build-time @andyross .
I measure the gcc invocation to be 33ms. Divided by 8 cores we get 4ms.
Note that adding 2 functions to an existing C file would be OK, as it is the invocation
itself that is costly.
We can do a conservative estimate of the Kconfig+CMake runtime.
CMake is practically 0. if(CONFIG_FOO) should take a handful of nanoseconds.
There are 1600 Kconfig source files. If adding one Kconfig symbol
slowed down the build by 4ms, then processing 1600 Kconfig files
(assuming a conservative average of one symbol per source file)
should take 1600 * 4 = 6.4s, which it obviously doesn't.
So the overhead of adding a symbol in Kconfig must be much lower than
4ms, maybe 0.1ms.
In addition to the slowdown there is also noise in the build logs to consider. Many find it useful
to examine the build logs to see what is included in the build.
4ms may not seem like a lot, but OKing this as a policy will make this number grow significantly.
Build systems can be slow with no config required (just include everything and let dynamic dependency tracking sort it out), or fast with explicit configuration. I'm sure you have worked on a slow build system and can appreciate
the build being fast.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes requested
Rebase and fix collision. This should be good to merge at this point; stakeholders have all seen it and the risk is low. |
Hm... that ringbuffer build failure is an upstream thing with ztest, it looks like. Not sure why that didn't get caught by CI elsewhere? Definitely happens on origin/main for me. |
It's not uncommon to have Zephyr running in environments where it shares a memory bus with a foreign/non-Zephyr system (both the older Intel Quark and cAVS audio DSP systems share this property). In those circumstances, it would be nice to have a utility that allows an arbitrary-sized chunk of that memory to be used as a unidirectional buffered byte stream without requiring complicated driver support. sys_winstream is one such abstraction. This code is lockless, it makes no synchronization demands of the OS or hardware beyond memory ordering[1]. It implements a simple file/socket-style read/write API. It produces small code and is high performance (e.g. a read or write on Xtensa is about 60 cycles plus one per byte copied). It's bidirectional, with no internal Zephyr dependencies (allowing it to be easily ported to the foreign system). And it's quite a bit simpler (especially for the reader) than the older cAVS trace protocol it's designed to replace. [1] Which means that right now it won't work reliably on arm64 until we add a memory barrier framework to Zephyr! See notes in the code; the locations for the barriers are present, but there's no utility to call. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The newer sys_winstream utility is considerably simpler and much faster for the reader. Use that instead. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This is the matching commit to the previous one that swaps the protocol used for window logging for sys_winstream. The advantage is especially clear for the reader. The old protocol was complicated and race-prone, requiring whole-buffer reads for reliability. The new one is tiny and fast. Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
Repush to clear out the ztress failure |
/* Make room in the buffer by advancing start first (note same | ||
* len-1 from above) | ||
*/ | ||
len = MIN(len, ws->len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we don't need this line 57. Cause len
is always less than ws->len
, guaranteed by previous truncation check.
[Pull request adds the IPC mechanism, and also support for it as the default trace output in cAVS platforms]
It's not uncommon to have Zephyr running in environments where it shares a memory bus with a foreign/non-Zephyr system (both the older Intel Quark and cAVS audio DSP systems share this property). In those circumstances, it would be nice to have a utility that allows an arbitrary-sized chunk of that memory to be used as a unidirectional buffered byte stream without requiring complicated driver support. sys_winstream is one such abstraction.
This code is lockless, it makes no synchronization demands of the OS or hardware beyond memory ordering[1]. It implements a simple file/socket-style read/write API. It produces small code and is high performance (e.g. a read or write on Xtensa is about 60 cycles plus one per byte copied). It's bidirectional, with no internal Zephyr dependencies (allowing it to be easily ported to the foreign system). And it's quite a bit simpler (especially for the reader) than the older cAVS trace protocol it's designed to replace.
[1] Which means that right now it won't work reliably on arm64 until we add a memory barrier framework to Zephyr! See notes in the code; the locations for the barriers are present, but there's no utility to call.