Made code (mostly) runtime and transport independent #77

TheButlah · 2020-10-10T22:44:10Z

In order to get the code to work in a way that is agnostic to a particular transport, as well as pave the way towards supporting both tokio and async-std, we need to refactor the code to stop using TcpStreams and tokio_util::codec::Framed<TcpStream, ZmqCodec> types where possible. The main contribution of this PR is as follows:

Introduce the FramedIo type, which is analagous to tokio_util::codec::Framed, except it is async-runtime independent, and also independent across transport types.
Reorganize some of the tokio-specific functionality into isolated modules and try to expose runtime-agnostic functions that will eventually let us feature gate and/or conditionally compile the particular implementation to use. Almost no calls to tokio are outside of this module - the only things left to pull out are tokio::spawn() calls and to convert one tokio::select! into futures::select!, both of which are trivial and can be done in a separate PR
Redesign and rename util::start_accepting_connections() to transport::tcp::begin_accept(). It now provides a more general callback that doesn't rely on any particular transport, runtime, or socket-specific data (for example the PubSocket.backend can be captured in the closure as opposed to passed along through the begin_accept() function).

Alexei-Kornienko · 2020-10-11T10:20:08Z

Overall this looks very promising. However I have some performance related concerns due to change of codec type.

It would be good if we could implement several simple performance tests (#44) and try to measure socket throughput before and after this change. I think we can implement some kind of tests based on https://github.com/bheisler/criterion.rs And it would help us to track how our changes affect overall library performance.

@TheButlah could you please take a look at this tests? I planned to start working on them but for now I don't have a lot of free time unfortunately :( (I hope that I'll have more time available in November)

TheButlah · 2020-10-11T15:21:48Z

I absolutely agree about the performance tests, especially if we want to be on par with the performance of the C++ version. However, do we really need them now as opposed to when the codebase is more mature (or at least after I've implemented some of the other features that this enables :P)?

The PR introduces two possible spots for minor slowdown - the dynamic dispatch that will happen for the boxed Frameable trait object inside FramedIo, and the conversion from tokio's async read/write to futures async read/write via the tokio_util::compat::Compat wrapper.

The dynamic dispatch can be solved by making the inner type a generic argument and bubbling up the generic args through the functions that use it. This would make the type signatures more complicated but is definitely possible. Another probably better alternative is to use "enum dispatch", which can be done whenever the number of potential implementations of the trait is known in advance. In this case we would have a fixed set of implementations, one for each stream type (async-std, tokio)*(tcp, wss, ipc,..) in an enum, and instead of boxing the trait object and doing dynamic dispatch, we have an enum of the correct type. This can be done easily with the https://docs.rs/enum_dispatch/0.3.3/enum_dispatch/ crate, and is supposed to be a lot faster than dynamic dispatch. With both of these approaches in our toolbox, I'm very confident that if we are concerned about the performance cost of dynamic dispatch, we could submit a later PR to bring the performance inline with the static dispatch approach.

As for the tokio_util::compat::Compat wrapper, looking at the implementation its basically just the newtype pattern, helped along with some macros to avoid boilerplate. It should in theory be identical in performance to the unwrapped type, since it uses generics for the inner type.

TLDR I agree about the need for perf benchmarks, but I'm less enthusiastic about implementing them before working on some of the other stuff, mostly due to lack of experience writing perf benchmarks. Since the only theoretical performance regression is boxed dynamic dispatch, and since we can fix that in the future via either bubbling the generics up or using enum dispatch, could we tackle that issue later?

TheButlah · 2020-10-11T15:35:58Z

In fact if you'd like, I could implement the enum dispatch approach now- that library I linked really does make it quite simple. Would that be satisfactory?

Alexei-Kornienko · 2020-10-11T15:50:52Z

The main issue I see is not just dynamic dispatch but rather 2 different implementations of Codec:
tokio_codec vs futures_codec. This is one of the most important parts that makes transport fast and efficient.

If you don't want to work on tests I can surely do it (cause I have it already in plans). In such case can we keep this MR in a separate branch without merging for couple of weeks? I'll try to add some benchmarks and compare these 2 branches as soon as I will have some time to do actual coding..

TheButlah · 2020-10-11T16:02:21Z

Ah I see, I didn't realize the concern was tokio_codec vs futures_codec.

I'm reading a bit more right now on criterion, and if the concern is just testing the difference between those two codec implementations, thats probably something easier to test. Lemme see what I can throw together in a separare MR that only switched from tokio_codec to futures_codec. I'll add a basic benchmark there. Once we can be confident that that isn't a source of perf regression, would we then be able to merge this as is?

TheButlah · 2020-10-11T16:28:37Z

Quick update: I've added #78 as a more limited version of this PR that only switched to futures_codec, without getting any of the runtime or transport agnostic stuff. I'll work on a basic performance benchmark for the codecs, submit a PR for that, and once thats merged we can rebase both #78 and the current MR to see what the perf implications are.

However, in order to benchmark our stuff with criterion, we can only benchmark public functionality and the codec stuff is private to the crate. @Alexei-Kornienko Would it be a good idea to break the codec stuff out into a separate crate and use cargo workspaces? After all, someone could theoretically want to use your codec implementation in other rust zmq crates

Alexei-Kornienko · 2020-10-11T16:39:29Z

@TheButlah I don't think that we need to benchmark codec directly. Instead I think we can write a generic benchmark. For example it could be very similar to our integration tests:

bind a server
create 1-10 clients and try send/recv messages
measure time taken for each message processing
measure overall time needed to process lets say 100K messages

Run this test on current master branch. Run the same test for your branch. If results are +- the same then there is no visual degradation in performance.

TheButlah · 2020-10-12T15:12:50Z

Related: #81

TheButlah · 2020-10-12T22:54:57Z

OK - I've rebased on the recently merged PRs, and the only thing left to do is to undergo review (make sure I didn't mess up anything in the rebase) as well as run our new benchmark to see if the dynamic dispatch causes a measurable performance regression.

TheButlah · 2020-10-13T00:52:41Z

❯ cargo bench -- --load-baseline runtime-agnostic --baseline master
    Finished bench [optimized] target(s) in 0.05s
     Running target/release/deps/req_rep-f8f5c926edd0add0
Gnuplot not found, using plotters backend
Bound rep socket to tcp://localhost:33241
1 req and 1 rep messaging                        
                        time:   [14.679 ms 14.712 ms 14.746 ms]
                        change: [-2.6787% -2.3447% -2.0213%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24 outliers among 512 measurements (4.69%)
  15 (2.93%) high mild
  9 (1.76%) high severe

Looks like this MR actually speeds things up - strange, given the dynamic dispatch. But I'll take it. @Alexei-Kornienko are we good to merge?

Alexei-Kornienko · 2020-10-13T21:22:01Z

@TheButlah I rechecked again locally. I don't see any changes in performance talking 3 runs on current master and 3 runs on your branch. Overall LGTM

TheButlah changed the title ~~Added FramedIO, runtime-specific transport code, and refactored~~ Added FramedIO, isolated runtime-specific code, and refactored Oct 10, 2020

TheButlah changed the title ~~Added FramedIO, isolated runtime-specific code, and refactored~~ Added FramedIo, isolated runtime-specific code, and refactored Oct 10, 2020

This was referenced Oct 10, 2020

[Tracking] Support transport types other than tcp #65

Open

Support WASM for native zmq in the web browser #64

Open

TheButlah changed the title ~~Added FramedIo, isolated runtime-specific code, and refactored~~ Made code (mostly) runtime and transport independent Oct 11, 2020

TheButlah mentioned this pull request Oct 11, 2020

Ported code to use futures_codec and tokio_util's Compat #78

Merged

TheButlah mentioned this pull request Oct 12, 2020

Implement basic performance benchmark #80

Closed

This was referenced Oct 13, 2020

Further generalize across transports #83

Merged

IPC Implementation! #84

Merged

TheButlah added 2 commits October 13, 2020 17:19

Added FramedIO, runtime-specific transport code, and refactored

060bd20

Moved compat mod to codec::framed

437aa36

Alexei-Kornienko merged commit dce2c13 into zeromq:master Oct 13, 2020

TheButlah deleted the runtime-agnostic-codecs branch October 13, 2020 21:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Made code (mostly) runtime and transport independent #77

Made code (mostly) runtime and transport independent #77

TheButlah commented Oct 10, 2020 •

edited

Loading

Alexei-Kornienko commented Oct 11, 2020

TheButlah commented Oct 11, 2020 •

edited

Loading

TheButlah commented Oct 11, 2020 •

edited

Loading

Alexei-Kornienko commented Oct 11, 2020

TheButlah commented Oct 11, 2020

TheButlah commented Oct 11, 2020 •

edited

Loading

Alexei-Kornienko commented Oct 11, 2020 •

edited

Loading

TheButlah commented Oct 12, 2020

TheButlah commented Oct 12, 2020

TheButlah commented Oct 13, 2020 •

edited

Loading

Alexei-Kornienko commented Oct 13, 2020

Made code (mostly) runtime and transport independent #77

Made code (mostly) runtime and transport independent #77

Conversation

TheButlah commented Oct 10, 2020 • edited Loading

Alexei-Kornienko commented Oct 11, 2020

TheButlah commented Oct 11, 2020 • edited Loading

TheButlah commented Oct 11, 2020 • edited Loading

Alexei-Kornienko commented Oct 11, 2020

TheButlah commented Oct 11, 2020

TheButlah commented Oct 11, 2020 • edited Loading

Alexei-Kornienko commented Oct 11, 2020 • edited Loading

TheButlah commented Oct 12, 2020

TheButlah commented Oct 12, 2020

TheButlah commented Oct 13, 2020 • edited Loading

Alexei-Kornienko commented Oct 13, 2020

TheButlah commented Oct 10, 2020 •

edited

Loading

TheButlah commented Oct 11, 2020 •

edited

Loading

TheButlah commented Oct 11, 2020 •

edited

Loading

TheButlah commented Oct 11, 2020 •

edited

Loading

Alexei-Kornienko commented Oct 11, 2020 •

edited

Loading

TheButlah commented Oct 13, 2020 •

edited

Loading