Add new ALM transport #89

Open · wants to merge 72 commits into base: master

Conversation

Member

@Neverlord Neverlord commented Feb 16, 2020

Currently, Broker requires its users to provide loop-free deployments. The new ALM transport makes it much easier to deploy Broker, since Broker no longer simply floods published data but instead discovers and manages available paths.

I've added a new doc/devs.rst file to the documentation that covers how Broker works internally as well as how the (new) code is structured. At this point, feedback on it would be most welcome.

Remaining ToDos:

  • Extend the routing table to store paths rather than distances
  • Implement source routing
  • Add path revocation to propagate loss of connectivity
  • Implement ordering & loss detection (for data stores)
  • Implement re-attaching of clones on message loss
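
To illustrate the first and third ToDo items, here is a minimal sketch of a routing table that stores full paths instead of distances and supports revoking every path through a lost peer. All names (`routing_table`, `add_path`, `shortest_path`, `revoke`, `peer_id`) are hypothetical and chosen for this example only; they are not taken from the Broker code base, and the actual implementation may differ substantially.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical types for illustration; not Broker's actual classes.
using peer_id = std::string;
// A path lists the hops from the first hop up to and including the destination.
using path = std::vector<peer_id>;

class routing_table {
public:
  // Stores a path to `dst`. Paths per destination stay sorted by length.
  void add_path(const peer_id& dst, path p) {
    assert(!p.empty() && p.back() == dst);
    auto& paths = entries_[dst];
    if (std::find(paths.begin(), paths.end(), p) != paths.end())
      return; // Path already known.
    auto pos = std::find_if(paths.begin(), paths.end(),
                            [&](const path& x) { return x.size() > p.size(); });
    paths.insert(pos, std::move(p));
  }

  // Returns the shortest known path to `dst` or nullptr if none exists.
  const path* shortest_path(const peer_id& dst) const {
    auto i = entries_.find(dst);
    if (i == entries_.end() || i->second.empty())
      return nullptr;
    return &i->second.front();
  }

  // Drops every path that traverses `lost` and returns the destinations
  // that became unreachable as a result.
  std::vector<peer_id> revoke(const peer_id& lost) {
    std::vector<peer_id> unreachable;
    for (auto i = entries_.begin(); i != entries_.end();) {
      auto& paths = i->second;
      auto uses_lost = [&](const path& p) {
        return std::find(p.begin(), p.end(), lost) != p.end();
      };
      paths.erase(std::remove_if(paths.begin(), paths.end(), uses_lost),
                  paths.end());
      if (paths.empty()) {
        unreachable.push_back(i->first);
        i = entries_.erase(i);
      } else {
        ++i;
      }
    }
    return unreachable;
  }

private:
  std::map<peer_id, std::vector<path>> entries_;
};
```

With stored paths, a sender can put the chosen path into the message (source routing), and a revocation only needs to name the lost peer for others to prune affected routes. Storing distances alone would not reveal which routes a lost peer invalidates.
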
Member

@rsmmr rsmmr commented Feb 24, 2020

I'm seeing compiler errors:

```
/Users/robin/bro/master/aux/broker/build/caf-build/libcaf_core/sec_strings.cpp:105:15: error: no member named 'remote_lookup_failed' in 'caf::sec'; did you mean 'remote_linking_failed'?
case sec::remote_lookup_failed:
~~~~~^~~~~~~~~~~~~~~~~~~~
remote_linking_failed
/Users/robin/bro/master/aux/broker/build/caf-build/libcaf_core/caf/sec.hpp:92:3: note: 'remote_linking_failed' declared here
remote_linking_failed,
^
/Users/robin/bro/master/aux/broker/build/caf-build/libcaf_core/sec_strings.cpp:105:10: error: duplicate case value 'remote_linking_failed'
case sec::remote_lookup_failed:
^
/Users/robin/bro/master/aux/broker/build/caf-build/libcaf_core/sec_strings.cpp:69:10: note: previous case defined here
case sec::remote_linking_failed:
^
2 errors generated.
```

Submodules are up to date, am I missing something else?
Member

@rsmmr rsmmr commented Feb 24, 2020

Focussed on the new dev guide for now. That's nice, certainly very helpful. Couple suggestions:

  • I'd turn the sections around and start with the architecture; the ALM section can then refer back to it where helpful.

  • A couple diagrams would be helpful: one for the architecture showing the main components and data flows (core actor, mixins, etc.; also where it goes down into CAF); and one (or two) for the flow of messages, ideally with an example of what happens as they get forwarded.

I'll leave some smaller comments/questions on the RST code.

Member Author

@Neverlord Neverlord commented Feb 25, 2020

Thanks for the feedback! Just a quick note (content updates in progress):

> Submodules are up to date, am I missing something else?

Sorry, the 3rdparty submodule was lagging behind. Should work now.

Member Author

@Neverlord Neverlord commented Mar 15, 2020

@rsmmr I've integrated your feedback on the devs section. Let me know what you think of the additions / changes and how I can improve the section further. 🙂

Member

@rsmmr rsmmr commented Mar 18, 2020

Nice, thanks for the updates to the devs section. The ALM description sounds all good, and the architecture section is quite helpful - I'm feeling like I'm starting to understand Broker finally ;-)

Member

@rsmmr rsmmr commented Mar 18, 2020

(... and no further particular comments)

@Neverlord Neverlord force-pushed the topic/neverlord/multi-hop-routing branch from 8bc3796 to d8d1db8 Mar 27, 2020
Member Author

@Neverlord Neverlord commented Mar 28, 2020

A quick update I wanted to share: while some unit tests still fail (working on it), the ALM branch runs stably enough for a first quick performance comparison with current master. I've used the broker-benchmark tool with Broker compiled as a Release build:

broker-benchmark -r 100000000 -s 1 -t 2 localhost:9090

For a quick overview, I let each branch run for some time and then took 30 values per branch (the server's output for received messages per second). Here are the results:

| Branch | Median (msgs/s) | Average (msgs/s) |
|--------|-----------------|------------------|
| master | 31,588.5        | 30,992.6         |
| ALM    | 32,471.0        | 32,368.5         |

I wouldn't read too much into this, since the values do fluctuate. However, the take-away is that the new source routing seems to have no negative effect on the performance.
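
For readers who want to redo this kind of summary on their own samples, the following is a small sketch of how a median and an average over a series of per-second message counts can be computed. The helper names `average` and `median` are made up for this example; this is not the script used to produce the numbers above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Arithmetic mean of the samples. Requires at least one sample.
double average(const std::vector<double>& xs) {
  assert(!xs.empty());
  return std::accumulate(xs.begin(), xs.end(), 0.0)
         / static_cast<double>(xs.size());
}

// Median of the samples; takes a copy because it sorts the values.
// For an even number of samples, returns the mean of the two middle values.
double median(std::vector<double> xs) {
  assert(!xs.empty());
  std::sort(xs.begin(), xs.end());
  const std::size_t n = xs.size();
  return n % 2 == 1 ? xs[n / 2] : (xs[n / 2 - 1] + xs[n / 2]) / 2.0;
}
```

With 30 samples per branch, the median is the mean of the 15th and 16th sorted values, which explains the fractional median for master. Reporting both numbers helps because the median is less sensitive to a few outlier seconds than the average.
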

Here is the raw data I've compiled this from: 2020-03-28 Broker ALM branch comparison.txt.

Of course, most insights will come when looking at a full cluster. Once I have ported the new cluster benchmark, I'll do a more thorough comparison.

Member Author

@Neverlord Neverlord commented Apr 24, 2020

@rsmmr I was thinking about a path forward for this PR. As it stands, it is hard to review and integrate in its entirety: the branch contains several big changes to the entire code base plus a new communication backend. I see two options: 1) do incremental reviews (you've already done one) and eventually merge everything at once, or 2) separate the refactoring from the actual new features with individual PRs.

For option two, I would factor out at least these two steps as separate PRs:

  • The switch to a mixin-based design. This encompasses the first chunk of commits minus the routing table.
  • Extending status with the new codes endpoint_discovered and endpoint_unreachable. This affected more code than I originally thought (almost done with that refactoring). While they'd be unused in master for a while, I think we could still merge this change ahead of the actual ALM implementation.

Personally, I favor option 2. Aside from making reviews more manageable and focused, merging individual parts earlier also helps to avoid this PR running out of sync.

Member

@rsmmr rsmmr commented Apr 28, 2020

Yeah, let's go with Option 2, and merge the refactoring & status notifications first.

If you can split things out further into more, smaller chunks, that would be worth the effort; both for the refactoring and then for the new functionality. We'll review each to the degree we can, with a particular focus on not breaking existing cluster topologies.

Also, let's remain flexible on timeline: The closer we get to the release, the more risky a merge of complex changes will be. Depending on how things progress, a viable model could be getting the foundational commits in before 2.2, and then the new logic after the release early in the next cycle. Let's see.

@Neverlord Neverlord force-pushed the topic/neverlord/multi-hop-routing branch from 94c4c4b to 7417b33 May 31, 2020
@Neverlord Neverlord force-pushed the topic/neverlord/multi-hop-routing branch from d227221 to bdc21b5 Jun 14, 2020
Member Author

@Neverlord Neverlord commented Jul 10, 2020

While porting the data store actors to the new channels, I've also streamlined the communication between master and clone. So this PR closes #99 and closes #125.

Neverlord added 4 commits Sep 29, 2020
Calling extinguish_one with flare_mtx_ locked causes a deadlock, because
the function then attempts to lock an already locked mutex.
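
The deadlock described in this commit message is the classic case of re-locking a non-recursive mutex from the thread that already owns it, which is undefined behavior for `std::mutex` and typically hangs. Below is a minimal sketch of one common way to avoid it, assuming a hypothetical class around the two names from the commit message (`extinguish_one` and `flare_mtx_`). Everything else (`flare_owner`, `fire`, `extinguish_all`, `extinguish_one_locked`, `pending_`) is invented for illustration and does not claim to mirror the actual fix in Broker.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in class; only extinguish_one and flare_mtx_ are names
// taken from the commit message.
class flare_owner {
public:
  // Signals one pending event.
  void fire() {
    std::lock_guard<std::mutex> guard{flare_mtx_};
    ++pending_;
  }

  // Public entry point: acquires the lock, then delegates to the helper.
  bool extinguish_one() {
    std::lock_guard<std::mutex> guard{flare_mtx_};
    return extinguish_one_locked();
  }

  // Holds the lock once and only calls the *_locked helper. Calling
  // extinguish_one() here instead would try to lock flare_mtx_ a second
  // time from the same thread.
  int extinguish_all() {
    std::lock_guard<std::mutex> guard{flare_mtx_};
    int n = 0;
    while (extinguish_one_locked())
      ++n;
    return n;
  }

private:
  // Precondition: the caller holds flare_mtx_.
  bool extinguish_one_locked() {
    if (pending_ == 0)
      return false;
    --pending_;
    return true;
  }

  std::mutex flare_mtx_;
  int pending_ = 0;
};
```

Splitting each operation into a locking wrapper and a `*_locked` helper keeps the mutex non-recursive while making it explicit which functions expect the lock to be held.
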
@Neverlord Neverlord force-pushed the topic/neverlord/multi-hop-routing branch from 1daaffc to 2eb63fc Sep 30, 2020
Neverlord added 2 commits Sep 30, 2020
- Retry peerings an infinite amount of times
- Emit a peer_unavailable error on each failed attempt
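
The behavior summarized in these two bullet points can be sketched as a simple loop: keep trying to connect and report an error after every failed attempt. The function `peer_with_retry` and its callback parameters are hypothetical and exist only for this example; Broker's actual implementation is actor-based and presumably waits for a retry interval between attempts, which this synchronous sketch omits.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical sketch, not Broker's API. Calls `try_connect` until it
// succeeds and invokes `on_error` with "peer_unavailable" after each failed
// attempt. Returns the total number of attempts.
inline std::size_t
peer_with_retry(const std::function<bool()>& try_connect,
                const std::function<void(const std::string&)>& on_error) {
  assert(try_connect && on_error);
  std::size_t attempts = 0;
  for (;;) { // No upper bound: retry indefinitely.
    ++attempts;
    if (try_connect())
      return attempts;
    on_error("peer_unavailable");
    // A real implementation would wait for a retry interval here.
  }
}
```

Emitting the error on every attempt, rather than once, lets an application observe that the peering is still being retried and decide for itself when to give up.
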
@Neverlord Neverlord force-pushed the topic/neverlord/multi-hop-routing branch from 0934aa0 to f1916db Sep 30, 2020
@Neverlord Neverlord force-pushed the topic/neverlord/multi-hop-routing branch from c0a914e to bcae433 Oct 30, 2020