[RFC] Zbus: a message bus system #45910
Comments
#38611 how about this PR?
#38611 A good PR but abandoned.
Regarding alternatives - have you seen KBUS? http://kbus.readthedocs.io/en/latest/specification.html
@henrikbrixandersen I was reading the documentation, and I noticed that several concepts are the same as those I am proposing here, but this does not seem to be a valid alternative because, as they said, this is
FWIW, #37223 is trying to implement something similar.
Another similar Zephyr-based IPC system: https://github.com/LairdCP/zephyr_framework
@carlescufi and others, I superficially read #38611, and I could not find a reason other than complexity to keep the Event Manager on NCS and Zephyr with the requested changes. Has that scenario changed? Is there another reason? It seems to be a powerful tool, but it has its complexities. The main differences I could catch:
I guess zbus is similar in features but simpler to use and maintain. In my opinion, the bus metaphor is easier to understand and use than events. Developers explicitly describe all the channels and subscribers in a centralized way (the zbus_channels.h file).
There seem to be a lot of threads; in a complicated use case this will waste a lot of time switching threads, for example a sensor publishing data at a high sampling rate to other threads (modules).
@lairdjm do you have sample code using that? I would like to see it. Thanks.
There is: https://github.com/LairdCP/BLE_Gateway_Firmware, though the version of the framework on GitHub, and the one that application uses, is a bit outdated. Parts have been rewritten or improved, and the files are now auto-generated from all the input files using CMake functions. A snapshot of the newer version can be seen at https://github.com/lairdjm/framework_zephyr (it needs another repository for the CMake functions, which essentially just add files to lists).
@ck-telecom You can use callbacks for that if you prefer; no threads are needed. It is just example code. If you take a look at the code, you can see an example of using callbacks in the integration test. If you need speed to transfer sensor data, you could use a pipe instead. I have added a sample to illustrate that. Take a look at https://github.com/zephyr-bus/zbus/tree/main/samples/work_queue where you can see many ways of using it.
@stephanosio change the prefix from
I have updated the Example of use section with a more realistic example. I hope it helps.
@zycz would you mind verifying whether what I wrote in the comparison table (Alternatives section of the RFC) regarding the Event Manager is correct? @lairdjm @stephanosio guys, could you please summarize the features of your implementations taking the
@rodrigopex Sure:
It is worth mentioning that the App Event Manager provides a mechanism for adding extensions. There is a hook list to which you can add your own functionality. You can see examples of how it is done in the NCS app_event_manager_profiler_tracer or event_manager_proxy. The profiler was created to visualize event propagation in time, and the proxy allows exporting events between cores. Only small corrections to what @rodrigopex wrote about the Event Manager: regarding message allocation style, it is by default dynamic (execution-time), but the alloc function is "weak" so it can be overridden and static allocation used instead.
@lairdjm Could you please check the text and clarify the marked points (with ???) in your column? Try to normalize with the other cells in the table. @zycz the adjustments were made, can you please double-check that? Thank you, guys. |
@rodrigopex Looks good! Additions:
Everything looks ok :)
@lairdjm it seems to be asynchronous, right? If it uses message queues to transmit the data, the thread (subscriber/listener/receiver/consumer/whatever 😄) would choose when to read it, right?
@rodrigopex
@hongshui3000 not yet, but I will add the necessary files and let you know.
Done! Zbus as a Zephyr module is ok. Check the Zbus as a module sample. @hongshui3000 if you need help, you can contact me on Discord.
@rodrigopex good job. Thanks!
@rodrigopex - I'm just looking over the RFC now, and I wish I had attended the API meeting. Just at a glance, it looks like messages are fixed-size, based on a C struct, and use native byte ordering. Would it be relatively easy to support other, possibly self-describing, serialization formats in Zbus? I would imagine that users might prefer to have some flexibility there. https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats
@cfriedt thanks for your comments. I attended the meeting; we discussed zbus a lot.
Yes, the message is a fixed-size struct or union, to force the publishers and subscribers to use a channel properly. It works as an API.
Sure, for free (but not perfectly): you can use a struct with an array-of-bytes field, where you can store your data in any format.
During the zbus design, I considered using this kind of approach to define messages, but I was concerned about speed and memory usage, so I decided to use structs and unions. I am convinced it is a good solution for now. Do you think a serialization format is better for defining messages? My concern about it is the variability of the data's size and the need for encoders and decoders; it would increase the communication overhead. For me, that should be done by the application, using a raw message type to carry the data.
#38611 PR ported to an external module: https://github.com/hongshui3000/event_manager
I have implemented the 256 KB benchmark for the Event Manager: https://github.com/rodrigopex/event_manager/blob/main/samples/
I agree that for most use cases relatively small, statically allocated, fixed-size buffers are perfectly fine, but it would be nice if the API had the ability to use larger, variable-sized buffers, possibly dynamically allocated. Technically, a pointer to an arbitrary memory location could be stored inside the fixed-size buffer - I wonder if there is some kind of flag or enumeration that could be used in that case. Just curious - I love the Zbus concept, and feel it's absolutely a needed feature. I like that it's focused on remaining small as well, but of course some Zephyr users are at the complete opposite end of the spectrum and run on massive hyperscale infrastructure so having flexibility is important as well. Thanks for putting the RFC together :-) |
I modified the Event Manager so that it can support single-threaded development, hoping to improve the efficiency of single-threaded Zephyr development. https://github.com/hongshui3000/event_manager/tree/main/samples/sensor_measurement
@cfriedt thank you for your comments and suggestions. That is possible for sure, and not hard to add. I guess it would benefit the solution overall. What do you think about the following possible API?

The dynamic channel definition would look like this:

```c
ZBUS_DYN_CHANNEL(
    my_dyn_channel,
    ZBUS_CHANNEL_SUBSCRIBERS(sub1, sub2, sub3)
)
```

Allocating and freeing can be done externally by the user:

```c
struct user_data *user_allocated_data = (struct user_data *) malloc(sizeof(struct user_data));
zbus_dyn_chan_alloc(my_dyn_channel, user_allocated_data, sizeof(struct user_data), K_MSEC(200));

void *mem_ref = NULL;
zbus_dyn_chan_dealloc(my_dyn_channel, &mem_ref, K_NO_WAIT); /* sets the channel's pointer to NULL and the size to zero. Potential memory leak here */
free(mem_ref);
```

Checking whether it is already allocated would look like:

```c
zbus_dyn_chan_is_allocated(my_dyn_channel, K_MSEC(200));
```

Retrieving the channel's size would look like:

```c
size_t chan_size = zbus_dyn_chan_size(my_dyn_channel, K_MSEC(200));
```

Publishing and reading would look like:

```c
zbus_dyn_chan_pub(my_dyn_channel, orig, sizeof(orig), K_MSEC(200));
zbus_dyn_chan_read(my_dyn_channel, dest, sizeof(dest), K_MSEC(200));
```

Advanced API

Borrow a reference to a dynamic channel, with total control over it:

```c
typedef struct {
	void *msg_ref;
	size_t msg_size;
} zbus_dyn_msg_t;

// ...
zbus_dyn_msg_t msg_ref = {0};
zbus_dyn_chan_borrow(my_dyn_channel, &msg_ref, K_MSEC(200)); // locks the channel against other threads by taking the channel's semaphore
// Do whatever the user wants
zbus_dyn_chan_give_back(my_dyn_channel, K_MSEC(200)); // releases the channel to other threads by giving the channel's semaphore
```
Now it is possible to make a channel dynamic. There is a sample showing how to do that: dyn_channel. I have added the possibility of claiming the channel’s message and performing some actions while it is claimed. After using that, the developer must call the finish function. The idea is similar to ring buffers. |
Another feature added: message validator. If you provide a validator function at the channel declaration, the publishing action will check if the message is valid. If it is not, the publishing will fail. |
I will definitely look forward to zbus getting into Zephyr. For event-driven designs this brings robustness (shared and tested infrastructure) and enables more systems to spontaneously grow in an event-driven direction within the Zephyr community. Nice work @rodrigopex!
This sounds very similar to something I worked with for 8 generations of the OWEN printer firmware product line at HP. Is anyone interested in learnings from that experience? |
Hi @gregshue. I am interested in learning about that. How do you want to proceed? Discord or a call?
Let's capture it here so others can consider it too.
This eventually proved to be a barrier to scalability and composability. The same will happen in Zephyr. Part of the problem is not having a clear definition of "user's threads". In my downstream repo each "application" has an empty

For a lightly loaded system this can appear to work, but for a heavily loaded one we eventually experienced too much latency for our hard and soft real-time threads. We then recognized that a single announcer thread effectively forced unnecessary priority elevation for handling most channels. Eventually this design was abandoned in favor of components providing an instantiation of a common, extensible notification interface, so that notifications were distributed at the priority of their source. Each subscriber already had to consolidate/resolve inputs from multiple sources while running at its assigned preemption priority, so components with threads had a private message queue and components without threads had a private mutex.

This problem becomes more relevant for the Zephyr ecosystem. Because we are targeting reuse of code on multiple CPU architectures, we must design a solution for the minimum hardware functionality and the maximum feature set. In this case that means a CPU that has only one hardware interrupt preemption level and a system that needs to use that for catastrophic error handling (e.g., hardware watchdog warning). In this case all "IRQ" code must execute in thread context, which may get preempted by the messaging system. We found out it was important to minimize unnecessary complexity faced by the integrators.
Do you really mean "Inter-Process"? Zephyr doesn't have processes (address remapping). The closest is asymmetric multi-processing (different cores, possibly with some private and some shared memory). At HP we found that our composable architecture and internal interfaces had to be designed with some awareness of whether functionality was implemented on the local core or a remote one. This affected how "cancel" was designed, how long timeouts were allowed to be, and what steps needed to be taken for recovery from partial reset. Before we abandoned the central messaging service design we figured out the impact of distributing the service across multiple (asymmetric) cores and what information must be shared between the two. The amount of unavoidable complexity was larger than expected.
I think of drivers and subsystems as "system threads." Maybe I am wrong, but I think of "user threads" as the ones created explicitly with K_THREAD_DEFINE (or a similar approach). So maybe internally, between drivers and subsystems, the message bus does not fit, in fact. Think of it as D-Bus for Linux: we do not use D-Bus to talk to device drivers; we use IOCTLs, Netlink, or syscalls, right? Same here; only "applications" ("user threads") should use zbus.
The latency is a complicated part of communication. It depends on several variables. For example, maybe your MCU is too busy to guarantee a deadline; the threads' priorities are unbalanced, or someone made a bad communication mechanism choice. Nevertheless, the results were good in my tests, with a well-balanced priority and a reasonable publishing rate. Maybe it should be tested in more scenarios. If, in your case, you have a stream of bytes, it would be better to use pipes or message queues. The message bus provides a way for "user threads" to exchange messages (information or events).
Yes, I do. I guess you are thinking of "Inter-Processor." The message bus is not designed to solve "inter-processor" communication problems. This is still a big challenge, and I did not focus on that for now.
For simplicity's sake, each core will have its own zbus instance. Still, it is possible to use zbus' extensibility to keep both synchronized using an IPC service (OpenAMP, RPMsg, or ICMSG). Again, you can implement an extension module to choose the information needed on both sides. The final thought I would like to share here is that the message bus (zbus) is for Zephyr what D-Bus is for Linux, keeping the proportionality of complexity.
From previous experience, my entire printer solution would be in proprietary drivers + subsystems. My
Perhaps that is true in Linux, but in my prior experiences the RTOS systems all ran in kernel space, often avoided the UNISTD API (IOCTLS) and cooperative scheduling/time slices were never used. Looking at Zephyr, we also must support everything running in kernel space, device drivers are accessed via custom APIs rather than IOCTLs, and product code can be interspersed across any set of preemption priorities.
Our systems had to be designed to guarantee timing deadlines were met. Many parts of the system were actually best-effort, so no timing deadlines existed for those parts.
Actually, I thought you were thinking of "Inter-Processor." Zephyr does not have processes. It only has one flat address space with segmented access control. When present, the MMU is only used as a glorified MPU and does not do address translation.
I understand the model and the design. What I am sharing from years of experience with shipping products is having all notifications published through a single, high priority thread has been shown to be unnecessarily limiting in a scalable, composable RTOS-based platform like Zephyr. |
Looking at the alternatives table, I see the Zbus message definition approach is "Centralized, single file to describe channels and subscriptions". This is not acceptable. The Zephyr ecosystem is already a modular, (somewhat) extensible architecture. Reuse is implied in the objectives of the Zephyr mission statement. Extensibility is required, and a single-file definition is not extensible. I expect this is why all the other alternatives have a distributed definition.
@gregshue sorry for that. The information in the RFC is outdated. It is now possible to create channels in a distributed way. It is also possible to create independent modules with it. I will fix it and let you know. I wrote the RFC before starting discussions with the community members; several aspects of the solution have evolved since then.
@gregshue, thanks to your input, zbus now has a "virtual distributed event dispatcher," which improved the solution in many ways. I understand the needs you are pointing out. Some of them we can address, but others we can't. Let's say zbus is a message bus that helps threads talk to each other inside Zephyr. Sounds good? I will add a sample showing how to implement an independent module with zbus and let you know; you can tell me whether it is enough to keep composability. Thank you for your comments and suggestions.
Zbus was merged in #48509.
FWIW, I was already using the name zbus for my IPC library since 2019. Fortunately, that's a Rust library so chances of confusion are low. :) |
Introduction
Embedded systems are everywhere, playing different roles in our lives. Embedded software grows in complexity as technology evolves. To keep up with this increasing complexity, practitioners and researchers have launched numerous development solutions: frameworks, libraries, RTOSes, and so forth. Zephyr is one of these solutions. However, like the majority of popular RTOSes (see table below), Zephyr has only a limited set of Inter-Process Communication (IPC) mechanisms, which makes developing multi-threaded systems hard. Because of that, developers sometimes add coupled code to decrease the number of threads in the solution. Communication between threads becomes a nightmare when the system requires a more complex inter-thread communication approach; for example, there is no straightforward way to achieve decoupled many-to-many thread communication in Zephyr. Linux solves part of that problem with D-Bus, a message bus system that gives applications a simple way to talk to one another. I suggest an alternative message bus with a focus on speed, memory, and energy consumption.
Problem description
To the best of my knowledge, Zephyr does not offer any IPC able to perform many-to-many thread communication; there is no straightforward way to achieve it in a decoupled manner. The closest IPC available for that is the mailbox, but it does not actually deliver the message to many receivers: it delivers it only to the first reader. Developers must reinvent the wheel every time they need many-to-many thread communication.
Proposed change
Add a fast and decoupled inter-process communication mechanism (a message bus, hereafter referred to as `zbus`, from "Zephyr bus") to Zephyr to enable a many-to-many communication model. Using this IPC, developers will be able to easily implement thread communication, even in a many-to-many model. Besides the communication capabilities, as proposed, the bus offers asynchronous and structured communication with time, space, and synchronization decoupling between the communicating entities. Even ISRs (Interrupt Service Routines) will be able to send messages through the bus. The bus, in the current implementation, is made of a static shared-memory portion, semaphores, and message queues working together.

Detailed RFC
This RFC describes `zbus`, a message bus aimed at improving Zephyr threads' communication capabilities. It is implemented based on the publish/subscribe pattern. Message data is transmitted using a managed shared-memory approach, which provides well-suited performance. The figure below provides an overview of `zbus`.

The only operations threads can perform are publishing, subscribing, and reading channels. Publish and read operations are done by a thread that wants to change or read the channel message, respectively. To know whether a channel's message has changed, a thread can subscribe to the channel and receive that notification. Publishing and reading can be done at execution time, but subscription must be done at compile time.
Proposed change (Detailed)
The figure below presents the internal details of `zbus`. For simplicity, there is only one internal thread to keep track of the changes and notify the subscribers when a channel is published. The description of each component is as follows:

The actions available for a thread in `zbus` are:

Subscribers are notified of the change (the `zbus_index_<channel's name>` value) but not of the message itself. Reading the message is a discretionary action made by the subscriber; sometimes the subscriber is not interested in reading the message, only in knowing that it changed. This is done to improve speed and to let the thread with the highest priority access the message first. All the threads are notified at the same time, but the reaction order of the threads is based on their priority.

The type checking for publishing (reading is similar) is done by the code illustrated as follows:
API description
To use `zbus` in its current implementation, the developer needs to define the messages and the channels by creating the `zbus_messages.h` and `zbus_channels.h` files. The messages file must contain all types used to define the messages, as well as the message definitions themselves. The channels are defined by setting the fields of a macro, as in the following code:

The `<subscribers list>` is a list of subscribers' message queues. If we want a thread to receive notifications from a channel, we need to pass a queue for that; the thread then checks the queue to learn about change events.

The `<initial message value>` can be a struct initialization or just a zero for default initialization, for example `ZBUS_INIT(.field1 = 10, .field2 = false)`. After the channel definition, `zbus` will provide `zbus_channel_index_t`, an enum with the channels' IDs. The ID generated for a channel is `zbus_index_<channel's name>`, of type `zbus_channel_index_t`. The event dispatcher uses these IDs to send notifications.

The publishing and reading actions are executed by calling the macros `zbus_chan_pub(<channel's name>, <message>, <timeout>)` and `zbus_chan_read(<channel's name>, <message>, <timeout>)`. The fields are:

- `<channel's name>`: the name of the channel (no `&` needed);
- `<message value>`: the value of the channel's message, not its reference. The macro will do what is needed;
- `<timeout>`: the regular Zephyr timeout. For ISR calls you must use `K_NO_WAIT`.
.Example of use
This example illustrates how subscribers can react to a notification. In this case, we have an immediate callback, a work-queue callback, and a thread consuming the notification. The sensors thread generates samples of data and publishes them to the channel `sensor_data`. All the subscribers receive the notification, or execute callbacks directly from the event dispatcher (high priority). The event dispatcher must have the highest priority among the user's threads to guarantee proper execution. However, the developer must be careful with callback subscribers, because they execute in the event dispatcher context (with high priority). A good approach for low-priority actions is to use work queues to execute the actual work; the callback only submits the work when executed, to avoid problems.

Messages definition file. Here we have the version and the sensor messages:

Channels definition file. Here we have the read-only `version` channel and the `sensor_data` channel:

Assuming a trivial implementation of a sensor thread that generates sensor samples and publishes them to the `sensor_data` channel:

Assuming trivial implementations of the subscribers using different approaches (a callback, a work queue, and a thread):

The sequence of activities based on the code above is illustrated as follows.

The sequence starts with the sensors thread generating a sample and publishing it to the `sensor_data` channel. The bus immediately executes the callbacks: one of them does the actual work, while the other submits work to the system work queue. After the callbacks, the system notifies the thread handler about the change by sending it the ID of the `sensor_data` channel. Supposing the system work queue has a higher priority than the thread handler, it executes the work first. The thread handler then wakes up, reads the content of the channel, and prints it. The described actions run in a loop and repeat indefinitely.
channel. Supposing the system work queue has a higher priority than the thread handler, it will execute the work. The thread handler wakes up, reads the content of the channel, and prints it. The described actions would run in a loop and repeat indefinitely.Benchmark
The benchmark was designed to transfer 256KB (262144 bytes) from the producer to the consumer. The only variable was the size of the channel's message, from 1 byte to 256 bytes. The board where the benchmark was executed is a hifive1_revb, Zephyr v3.0.0, and the code is in the repository samples.
Benefits
In this section I will describe some benefits of using `zbus`:

Developers can use `zbus` for almost all of the communication needed. The only limitation here is performance, but for control communication it seems to be enough (this needs more performance measurements).

Extensibility

The bus has extension logic enabling it to be monitored and even replicated on different targets. It is possible to capture all the messages exchanged on the bus, and to inject messages as well. It is also possible to replicate the changes from one bus to another over an interface such as serial or BLE. This is a developer activity; for now there is no code in `zbus` related to the replication process.
I imagine that using `zbus` will increase the abstraction and reusability of Zephyr threads. A set of correlated channels forms a Port, which means this Port has all the APIs needed to use some thread (as a service). A device driver interface could be written using `zbus`; it would be easy to use, without adding extra driver API calls to the user code. The sensor driver API would be an example: the fetch, the data, and other related things could be channels. Imagine the code below could be real:

Maybe all of the repetitive and error-prone initialization of devices could be done by the driver and started only by the DTS. No sensor API calls and no sensor initialization code would be needed: just the "service" enabled in DTS and everything running properly. It would be necessary to add an abstraction layer on top of the bus that initializes and manages the "BME280 sensor service". I did that for GNSS in my tests, and I could change the GNSS module with no changes on the consumer side. The "GNSS service API" can stay the same; only the adapter had to change. I am from both industry and academia (I am a Ph.D. candidate right now), and my work is to define an architecture that enables great maintainability and abstraction using software engineering techniques; zbus is part of that. This example of sensor abstraction is a drop of that.
Dependencies
The implementation depends only on semaphores and message queues. The rest of the code is plain C and there is no dynamic allocation there.
Concerns and Unresolved Questions
The main concerns about the solution:
Alternatives
I could not find any direct alternative with this set of features that is able to run on constrained devices.
Community alternatives suggestions:
The table below is a superficial comparison of the suggested alternatives. This comparison possibly contains bias, because I do not know or understand the Event Manager as well as I do zbus.
Initial implementation
The current implementation is built with several preprocessor macros and will possibly change. It is a simple implementation and a proof of concept of the bus. You can take a look at the PoC code here.