Add new ALM transport #89
Conversation
I'm seeing compiler errors:
Focused on the new dev guide for now. That's nice, certainly very helpful. A couple of suggestions:

I'll leave some more smaller comments/questions on the RST code.
Thanks for the feedback! Just a quick note (content updates in progress):
Sorry, the 3rdparty submodule was lagging behind. It should work now.
@rsmmr I've integrated your feedback on the devs section. Let me know what you think of the additions and changes, and how I can improve the section further. 🙂
Nice, thanks for the updates to the devs section. The ALM description all sounds good, and the architecture section is quite helpful - I'm feeling like I'm finally starting to understand Broker ;-)
(... and no further particular comments) |
A quick update I wanted to share: while some unit tests still fail (I'm working on it), the ALM branch runs stably enough for a first quick performance comparison with current master. I've used the
For a quick overview, I let it run for some time, then took 30 values each (the server's output of received messages per second). Here are the results:
I wouldn't read too much into this, since the values do fluctuate. However, the takeaway is that the new source routing seems to have no negative effect on performance. Here is the raw data I compiled this from: 2020-03-28 Broker ALM branch comparison.txt. Of course, most insights will come from looking at a full cluster. Once I have ported the new cluster benchmark, I'll do a more thorough comparison.
@rsmmr I was thinking about a path forward for this PR. As it stands, I think it is hard to review and integrate in its entirety. The branch contains several big changes to the entire code base plus a new communication backend. I see two options we could choose: 1) do some incremental reviews (you've already done one) and eventually merge everything at once, or 2) separate refactoring from actual new features with individual PRs. For option two, I would factor out at least these two steps as separate PRs:

Personally, I favor option 2. Aside from making reviews more manageable and focused, merging individual parts earlier also helps to keep this PR from drifting out of sync.
Yeah, let's go with option 2, and merge the refactoring & static notifications first. If you can split things further into more, smaller chunks, that would be worth the effort, both for the refactoring and then for the new functionality. We'll review each to the degree we can, with a particular focus on not breaking existing cluster topologies. Also, let's remain flexible on the timeline: the closer we get to the release, the riskier a merge of complex changes becomes. Depending on how things progress, a viable model could be getting the foundational commits in before 2.2, and then the new logic in after the release, early in the next cycle. Let's see.
Squashed commit, the original sketch for the new routing table design started Dec 9, 2019.
Squashed commit, the first steps toward the new ALM layer started Dec 9, 2019.
Using a common prefix for all actors in Broker simplifies enabling metrics collection and also makes it easier to find relevant actors in log output.
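To illustrate the idea, here is a minimal sketch of how a shared prefix makes filtering trivial. The prefix `broker.` and the function names are hypothetical, not Broker's actual naming scheme:

```cpp
#include <string>
#include <vector>

// Returns true if `s` starts with `prefix`. Safe even when `s` is
// shorter than `prefix` (compare then returns a nonzero value).
bool has_prefix(const std::string& s, const std::string& prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// With a common prefix (assumed here to be "broker.") on all actor
// names, pulling Broker-related entries out of a mixed list of actor
// names (e.g. from a log or metrics dump) is a simple prefix match.
std::vector<std::string> broker_actors(const std::vector<std::string>& names) {
  std::vector<std::string> out;
  for (const auto& n : names)
    if (has_prefix(n, "broker."))
      out.push_back(n);
  return out;
}
```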
After re-integrating the upstream changes, I did some benchmarking with

As a baseline, this is 5 minutes for the current

With the ALM branch, I can see performance go down to ~80k events per second:

The "Message Processing Time" is a heat map. With metrics enabled, CAF measures how long an actor takes to process its input messages and then puts each sample into its (configurable) bucket. On the

Some computational overhead is unavoidable with an overlay, but I'll look at where Broker spends most of the extra time to see whether we can reduce the cost by choosing different data structures or algorithms.

The measurements above were using the default message type in

Due to much larger data per element,

The ALM branch is still slower, hovering above 30k events per second:

The gap isn't as big this time, since the per-element overhead is smaller compared to the overall cost of sending larger elements. Even in this version, we can see that a larger percentage of elements lands in the >10µs bucket. Since a real Zeek cluster is probably closer to the second run, the ALM version of Broker wouldn't cut max performance in half. This setup only considers a one-to-one connection at the moment, to look at peak performance.

I think ideally we get the ALM performance a bit closer to the current Broker version, or at least better understand the tradeoffs we make (i.e., where we are losing the time and whether it's worth it), before merging / integrating into Zeek.
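For readers unfamiliar with how such a heat map is built, here is a minimal sketch of bucketed timing samples. This is an illustration of the general technique, not CAF's actual metrics implementation; the class name and bucket bounds are made up:

```cpp
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a histogram with configurable upper bounds. Each observed
// processing-time sample lands in the first bucket whose bound it does
// not exceed; everything larger goes into a final overflow bucket
// (e.g. the ">10µs" bucket mentioned above).
class timing_histogram {
public:
  explicit timing_histogram(std::vector<std::chrono::nanoseconds> bounds)
    : bounds_(std::move(bounds)), counts_(bounds_.size() + 1, 0) {}

  void observe(std::chrono::nanoseconds sample) {
    for (std::size_t i = 0; i < bounds_.size(); ++i) {
      if (sample <= bounds_[i]) {
        ++counts_[i];
        return;
      }
    }
    ++counts_.back(); // overflow bucket
  }

  std::size_t count(std::size_t bucket) const { return counts_[bucket]; }

private:
  std::vector<std::chrono::nanoseconds> bounds_;
  std::vector<std::size_t> counts_;
};
```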
Broker now contains a new

Just optimizing memory allocation strategies and merging the receiver list into the multipath got me back to slightly above 100k events per second.

@rsmmr suggested maybe Broker could do some clever batching. Broker already does batching implicitly via the CAF streams, so I'd be cautious about adding another batching layer on top (especially since Zeek already does another level of batching on top of Broker). However, the suggestion got me thinking about where the performance benefits of batching done by Broker might come from. Essentially, Broker could avoid redundant data on the wire by first sending the source routing information and then the data. If we look at a Broker batch, it looks somewhat like this (ignoring command messages for simplicity):
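To make the "send the routing information once, then the data" idea concrete, here is a small sketch. The types are illustrative placeholders, not Broker's actual wire format: instead of serializing a (path, payload) pair per element, adjacent elements that travel along the same source-routed path share a single copy of the path.

```cpp
#include <string>
#include <utility>
#include <vector>

using path_t = std::vector<std::string>;          // sequence of peer IDs
using element_t = std::pair<path_t, std::string>; // (path, payload)

// One on-the-wire group: the path is emitted a single time, followed
// by all payloads that travel along it.
struct wire_group {
  path_t path;
  std::vector<std::string> payloads;
};

// Collapses a batch of (path, payload) elements into groups, emitting
// each path only once per run of adjacent elements that share it.
std::vector<wire_group> group_by_path(const std::vector<element_t>& batch) {
  std::vector<wire_group> out;
  for (const auto& [path, payload] : batch) {
    if (out.empty() || out.back().path != path)
      out.push_back({path, {}});
    out.back().payloads.push_back(payload);
  }
  return out;
}
```

With many elements headed the same way, this trades a small grouping pass for not repeating the (potentially long) path per element on the wire.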
To CAF, a batch is simply a
As discussed with @rsmmr today, we are going to make one more change before this is ready to merge: get rid of the remote actor indirections and instead have Broker exchange messages directly via a custom protocol on the wire. Closing until then.
Currently, Broker requires its users to provide loop-free deployments. The new ALM transport makes it much easier to deploy Broker, since Broker no longer simply floods published data but instead discovers and manages available paths.
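To sketch the core idea behind path discovery and management (rather than flooding), here is a minimal routing-table illustration. The names and the shortest-path selection policy are hypothetical simplifications, not the actual ALM design:

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <vector>

using peer_id = std::string;
using path_t = std::vector<peer_id>; // hops from this node to the destination

// Keeps all discovered paths per destination; a publish then forwards
// along one selected path instead of flooding every link.
class routing_table {
public:
  // Record a newly discovered path; its last hop is the destination.
  void add(path_t path) {
    if (path.empty())
      return;
    paths_[path.back()].push_back(std::move(path));
  }

  // Select a path to `dst`, preferring the fewest hops.
  // Returns nullptr if the destination is unknown.
  const path_t* shortest_path(const peer_id& dst) const {
    auto i = paths_.find(dst);
    if (i == paths_.end() || i->second.empty())
      return nullptr;
    auto j = std::min_element(i->second.begin(), i->second.end(),
                              [](const path_t& a, const path_t& b) {
                                return a.size() < b.size();
                              });
    return &*j;
  }

private:
  std::map<peer_id, std::vector<path_t>> paths_;
};
```

Because the table knows every path to a destination, duplicate deliveries through topology loops can be avoided by construction, which is what relaxes the loop-free deployment requirement.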
I've added a new `doc/devs.rst` file to the documentation to cover how Broker works internally as well as how the (new) code is structured. At this point, feedback on this would be most welcome.

Remaining ToDos:

- Implement re-attaching of clones on message loss (not essential; implement as a follow-up)