Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib/os: Add sys_winstream lockless shared memory byte stream IPC #41521

Merged
merged 3 commits into from Jan 13, 2022

Conversation

andyross
Copy link
Contributor

@andyross andyross commented Jan 1, 2022

[Pull request adds the IPC mechanism, and also support for it as the default trace output in cAVS platforms]

It's not uncommon to have Zephyr running in environments where it shares a memory bus with a foreign/non-Zephyr system (both the older Intel Quark and cAVS audio DSP systems share this property). In those circumstances, it would be nice to have a utility that allows an arbitrary-sized chunk of that memory to be used as a unidirectional buffered byte stream without requiring complicated driver support. sys_winstream is one such abstraction.

This code is lockless, it makes no synchronization demands of the OS or hardware beyond memory ordering[1]. It implements a simple file/socket-style read/write API. It produces small code and is high performance (e.g. a read or write on Xtensa is about 60 cycles plus one per byte copied). It's bidirectional, with no internal Zephyr dependencies (allowing it to be easily ported to the foreign system). And it's quite a bit simpler (especially for the reader) than the older cAVS trace protocol it's designed to replace.

[1] Which means that right now it won't work reliably on arm64 until we add a memory barrier framework to Zephyr! See notes in the code; the locations for the barriers are present, but there's no utility to call.

@andyross andyross requested review from lyakh and kv2019i January 1, 2022 00:15
@github-actions github-actions bot added area: API Changes to public APIs area: Boards area: Tests Issues related to a particular existing or missing test labels Jan 1, 2022
@zephyrbot zephyrbot added area: Build System area: Base OS Base OS Library (lib/os) platform: Intel ADSP Intel Audio platforms labels Jan 4, 2022
@@ -66,6 +66,8 @@

#define MANIFEST_SEGMENT_COUNT 3

extern void soc_trace_init(void);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The extern keyword on a function declaration is ignored by the compiler and can be safely omitted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. It's a really, really common idiom in the kernel though, so it's in my fingers. I guess I've grown to like it for uses like this ("declare a function no one put in a header"). Though in this particular case it's an artifact of the fact that this bootloader code used to be built entirely separately and didn't share headers. At this point there's no reason not to just move this to soc.h.

Copy link
Collaborator

@marc-hb marc-hb Jan 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Harbison and Steele 4.8.5 recommends the use of extern to be compatible with the largest number of toolchains. Is there some documented Zephyr policy about this? I know Zephyr has a complex CMake toolchain abstraction layer which seems aimed at supporting a wide range of toolchains, some possibly "less standard" ones.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could move this to a header then that would be great and this discussion is moot.

To continue this discussion, about what should be policy, and not about this particular instance could be interesting.

That is interesting @march-hb. I was not aware that toolchains can have trouble with this. But I have also never seen any patch to this effect, adding extern to fix a toolchain issue, which one would assume would have happened by now if any third-party toolchain building Zephyr did have this problem. So I believe this is not an issue, but time may prove me wrong ...

IMHO it should be policy to not use it.

Assuming I am right and that no toolchain building Zephyr will ever have a problem with this, then it acts only as documentation to the reader, which IMHO would be better served with a comment.

// We declare this external function here instead of including it's header becase ...

@@ -24,6 +24,7 @@ zephyr_sources(
heap-validate.c
bitarray.c
multi_heap.c
winstream.c
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add a Kconfig option for this so we don't slow down the build for all that don't use this.

Copy link
Contributor Author

@andyross andyross Jan 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Easy enough. But FWIW: I don't know that I buy that analysis. C file builds are wildly parallel on almost all the environments we care about. Extra CMake logic ties up a single core before any compiler can be launched at all, and it's already dominating the build. A quick test on my box:

$ time cmake -B build -GNinja -DBOARD=qemu_x86_64 samples/hello_world
...
real	0m1.474s
user	0m1.286s
sys	0m0.173s

$ time (cd build; ninja)
...
real	0m0.799s
user	0m2.516s
sys	0m0.955s

Again, not like it matters here. But IMHO we need to be spending our development optimization effort trying to reduce the amount of cmake usage, not increase it.

Copy link
Collaborator

@marc-hb marc-hb Jan 6, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A big problem with additional Kconfigs is combinatorial explosion. I mean many Kconfig combinations are never tested, it's frequent some combinations don't even compile! kconfig.c has a very useful "randconfig" feature but the orphaned kconfig.py used by Zephyr has not.

To speed up compilation from scratch I think cleaning up #includes (#41543) seems more effort but also much more promising because affecting many .c files at once. #41543 is also important "modularity hygiene".

The performance of incremental C compilation is not a problem because it's... incremental and highly parallel. As mentioned by @andyross it's entirely dominated by CMake itself and/or the various linking scripts and post-processing.

I think more Kconfig matters but for a very different optimization: space. In SOF we very very often run out of space on some platforms. Would an additional Kconfig here make a difference?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding space: not really. With only a tiny handful of edge cases ("stuff with SYS_INIT() initialization calls doesn't garbage collect" being the big one), everything we link uses -fdata/function-sections and gets included in the final binary only if actually referenced.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In other words, turning off tracing with already existing Kconfigs is enough for this code to be removed at link time? Just making sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though (getting into DSP specifics) that only gets rid of the code. The 8k for the trace buffer in window 3 is in the "reserved" block in the bottom of memory that can't be used architecturally for anything else (SOF puts its own stuff in that region too and all the addresses are hard-coded). I'm hoping to get that fixed in the medium-term future so those regions can be seen by the linker and resized/eliminated/moved, there were some discussions about this in @lyakh 's PR to the internal Zephyr tree.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, though (getting into DSP specifics) that only gets rid of the code. The 8k for the trace buffer in window 3 is in the "reserved" block in the bottom of memory that can't be used architecturally for anything else

So this means right now it's reserved with or without this PR, the latter does not make any difference?

Not familiar with link-time optimizations and not sure about your longer term plans for this but in general maybe any tracing buffer common to various tracing features could be under the control of a new but invisible kconfig symbol which does not contribute to combinatorial explosion because it's not a free variable.
https://docs.zephyrproject.org/2.4.0/guides/kconfig/setting.html#visible-and-invisible-kconfig-symbols

Invisible symbols are not shown in the interactive configuration interfaces, and users have no direct control over their value. They instead get their value from defaults or from other symbols.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm OK with a promptless AKA invisible Kconfig symbol.

I don't agree with your analysis of the build-time @andyross .

I measure the gcc invocation to be 33ms. Divided by 8 cores we get 4ms.
Note that adding 2 functions to an existing C file would be OK, as it is the invocation
itself that is costly.

We can do a conservative estimate of the Kconfig+CMake runtime.

CMake is practically 0. if(CONFIG_FOO) should take a handful of nanoseconds.

There are 1600 Kconfig source files. If adding one Kconfig symbol
slowed down the build by 4ms, then processing 1600 Kconfig files
(assuming a conservative average of one symbol per source file)
should take 1600 * 4 = 6.4s, which it obviously doesn't.

So the overhead of adding a symbol in Kconfig must be much lower than
4ms, maybe 0.1ms.

In addition to the slowdown there is also noise in the build logs to consider. Many find it useful
to examine the build logs to see what is included in the build.

4ms may not seem like a lot, but OKing this as a policy will make this number grow significantly.

Build systems can be slow with no config required (just include everything and let dynamic dependency tracking sort it out), or fast with explicit configuration. I'm sure you have worked on a slow build system and can appreciate
the build being fast.

Copy link
Collaborator

@SebastianBoe SebastianBoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes requested

@carlescufi carlescufi requested review from carlocaione and hubertmis and removed request for tejlmand and galak January 11, 2022 10:51
@andyross
Copy link
Contributor Author

Rebase and fix collision. This should be good to merge at this point; stakeholders have all seen it and the risk is low.

@andyross
Copy link
Contributor Author

Hm... that ringbuffer build failure is an upstream thing with ztest, it looks like. Not sure why that didn't get caught by CI elsewhere? Definitely happens on origin/main for me.

Andy Ross added 3 commits January 13, 2022 10:58
It's not uncommon to have Zephyr running in environments where it
shares a memory bus with a foreign/non-Zephyr system (both the older
Intel Quark and cAVS audio DSP systems share this property).  In those
circumstances, it would be nice to have a utility that allows an
arbitrary-sized chunk of that memory to be used as a unidirectional
buffered byte stream without requiring complicated driver support.
sys_winstream is one such abstraction.

This code is lockless, it makes no synchronization demands of the OS
or hardware beyond memory ordering[1].  It implements a simple
file/socket-style read/write API.  It produces small code and is high
performance (e.g. a read or write on Xtensa is about 60 cycles plus
one per byte copied).  It's bidirectional, with no internal Zephyr
dependencies (allowing it to be easily ported to the foreign system).
And it's quite a bit simpler (especially for the reader) than the
older cAVS trace protocol it's designed to replace.

[1] Which means that right now it won't work reliably on arm64 until
we add a memory barrier framework to Zephyr!  See notes in the code;
the locations for the barriers are present, but there's no utility to
call.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
The newer sys_winstream utility is considerably simpler and much
faster for the reader.  Use that instead.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
This is the matching commit to the previous one that swaps the
protocol used for window logging for sys_winstream.

The advantage is especially clear for the reader.  The old protocol
was complicated and race-prone, requiring whole-buffer reads for
reliability.  The new one is tiny and fast.

Signed-off-by: Andy Ross <andrew.j.ross@intel.com>
@andyross
Copy link
Contributor Author

Repush to clear out the ztress failure

@nashif nashif merged commit 8cb5bf1 into zephyrproject-rtos:main Jan 13, 2022
/* Make room in the buffer by advancing start first (note same
* len-1 from above)
*/
len = MIN(len, ws->len);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don't need this line 57. Cause len is always less than ws->len, guaranteed by previous truncation check.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: API Changes to public APIs area: Base OS Base OS Library (lib/os) area: Boards area: Build System area: Tests Issues related to a particular existing or missing test platform: Intel ADSP Intel Audio platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants