Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Made code (mostly) runtime and transport independent #77

Merged
merged 2 commits into from
Oct 13, 2020
Merged

Made code (mostly) runtime and transport independent #77

merged 2 commits into from
Oct 13, 2020

Conversation

TheButlah
Copy link
Contributor

@TheButlah TheButlah commented Oct 10, 2020

In order to get the code to work in a way that is agnostic to a particular transport, as well as pave the way towards supporting both tokio and async-std, we need to refactor the code to stop using TcpStreams and tokio_util::codec::Framed<TcpStream, ZmqCodec> types where possible. The main contribution of this PR is as follows:

  1. Introduce the FramedIo type, which is analagous to tokio_util::codec::Framed, except it is async-runtime independent, and also independent across transport types.
  2. Reorganize some of the tokio-specific functionality into isolated modules and try to expose runtime-agnostic functions that will eventually let us feature gate and/or conditionally compile the particular implementation to use. Almost no calls to tokio are outside of this module - the only things left to pull out are tokio::spawn() calls and to convert one tokio::select! into futures::select!, both of which are trivial and can be done in a separate PR
  3. Redesign and rename util::start_accepting_connections() to transport::tcp::begin_accept(). It now provides a more general callback that doesn't rely on any particular transport, runtime, or socket-specific data (for example the PubSocket.backend can be captured in the closure as opposed to passed along through the begin_accept() function).

@TheButlah TheButlah changed the title Added FramedIO, runtime-specific transport code, and refactored Added FramedIO, isolated runtime-specific code, and refactored Oct 10, 2020
@TheButlah TheButlah changed the title Added FramedIO, isolated runtime-specific code, and refactored Added FramedIo, isolated runtime-specific code, and refactored Oct 10, 2020
@TheButlah TheButlah changed the title Added FramedIo, isolated runtime-specific code, and refactored Made code (mostly) runtime and transport independent Oct 11, 2020
@Alexei-Kornienko
Copy link
Collaborator

Overall this looks very promising. However I have some performance related concerns due to change of codec type.

It would be good if we could implement several simple performance tests (#44) and try to measure socket throughput before and after this change. I think we can implement some kind of tests based on https://github.com/bheisler/criterion.rs And it would help us to track how our changes affect overall library performance.

@TheButlah could you please take a look at this tests? I planned to start working on them but for now I don't have a lot of free time unfortunately :( (I hope that I'll have more time available in November)

@TheButlah
Copy link
Contributor Author

TheButlah commented Oct 11, 2020

I absolutely agree about the performance tests, especially if we want to be on par with the performance of the C++ version. However, do we really need them now as opposed to when the codebase is more mature (or at least after I've implemented some of the other features that this enables :P)?

The PR introduces two possible spots for minor slowdown - the dynamic dispatch that will happen for the boxed Frameable trait object inside FramedIo, and the conversion from tokio's async read/write to futures async read/write via the tokio_util::compat::Compat wrapper.

The dynamic dispatch can be solved by making the inner type a generic argument and bubbling up the generic args through the functions that use it. This would make the type signatures more complicated but is definitely possible. Another probably better alternative is to use "enum dispatch", which can be done whenever the number of potential implementations of the trait is known in advance. In this case we would have a fixed set of implementations, one for each stream type (async-std, tokio)*(tcp, wss, ipc,..) in an enum, and instead of boxing the trait object and doing dynamic dispatch, we have an enum of the correct type. This can be done easily with the https://docs.rs/enum_dispatch/0.3.3/enum_dispatch/ crate, and is supposed to be a lot faster than dynamic dispatch. With both of these approaches in our toolbox, I'm very confident that if we are concerned about the performance cost of dynamic dispatch, we could submit a later PR to bring the performance inline with the static dispatch approach.

As for the tokio_util::compat::Compat wrapper, looking at the implementation its basically just the newtype pattern, helped along with some macros to avoid boilerplate. It should in theory be identical in performance to the unwrapped type, since it uses generics for the inner type.

TLDR I agree about the need for perf benchmarks, but I'm less enthusiastic about implementing them before working on some of the other stuff, mostly due to lack of experience writing perf benchmarks. Since the only theoretical performance regression is boxed dynamic dispatch, and since we can fix that in the future via either bubbling the generics up or using enum dispatch, could we tackle that issue later?

@TheButlah
Copy link
Contributor Author

TheButlah commented Oct 11, 2020

In fact if you'd like, I could implement the enum dispatch approach now- that library I linked really does make it quite simple. Would that be satisfactory?

@Alexei-Kornienko
Copy link
Collaborator

The main issue I see is not just dynamic dispatch but rather 2 different implementations of Codec:
tokio_codec vs futures_codec. This is one of the most important parts that makes transport fast and efficient.

If you don't want to work on tests I can surely do it (cause I have it already in plans). In such case can we keep this MR in a separate branch without merging for couple of weeks? I'll try to add some benchmarks and compare these 2 branches as soon as I will have some time to do actual coding..

@TheButlah
Copy link
Contributor Author

Ah I see, I didn't realize the concern was tokio_codec vs futures_codec.

I'm reading a bit more right now on criterion, and if the concern is just testing the difference between those two codec implementations, thats probably something easier to test. Lemme see what I can throw together in a separare MR that only switched from tokio_codec to futures_codec. I'll add a basic benchmark there. Once we can be confident that that isn't a source of perf regression, would we then be able to merge this as is?

@TheButlah
Copy link
Contributor Author

TheButlah commented Oct 11, 2020

Quick update: I've added #78 as a more limited version of this PR that only switched to futures_codec, without getting any of the runtime or transport agnostic stuff. I'll work on a basic performance benchmark for the codecs, submit a PR for that, and once thats merged we can rebase both #78 and the current MR to see what the perf implications are.

However, in order to benchmark our stuff with criterion, we can only benchmark public functionality and the codec stuff is private to the crate. @Alexei-Kornienko Would it be a good idea to break the codec stuff out into a separate crate and use cargo workspaces? After all, someone could theoretically want to use your codec implementation in other rust zmq crates

@Alexei-Kornienko
Copy link
Collaborator

Alexei-Kornienko commented Oct 11, 2020

@TheButlah I don't think that we need to benchmark codec directly. Instead I think we can write a generic benchmark. For example it could be very similar to our integration tests:

  1. bind a server
  2. create 1-10 clients and try send/recv messages
  3. measure time taken for each message processing
  4. measure overall time needed to process lets say 100K messages

Run this test on current master branch. Run the same test for your branch. If results are +- the same then there is no visual degradation in performance.

@TheButlah
Copy link
Contributor Author

Related: #81

@TheButlah
Copy link
Contributor Author

OK - I've rebased on the recently merged PRs, and the only thing left to do is to undergo review (make sure I didn't mess up anything in the rebase) as well as run our new benchmark to see if the dynamic dispatch causes a measurable performance regression.

@TheButlah
Copy link
Contributor Author

TheButlah commented Oct 13, 2020

❯ cargo bench -- --load-baseline runtime-agnostic --baseline master
    Finished bench [optimized] target(s) in 0.05s
     Running target/release/deps/req_rep-f8f5c926edd0add0
Gnuplot not found, using plotters backend
Bound rep socket to tcp://localhost:33241
1 req and 1 rep messaging                        
                        time:   [14.679 ms 14.712 ms 14.746 ms]
                        change: [-2.6787% -2.3447% -2.0213%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24 outliers among 512 measurements (4.69%)
  15 (2.93%) high mild
  9 (1.76%) high severe

Looks like this MR actually speeds things up - strange, given the dynamic dispatch. But I'll take it. @Alexei-Kornienko are we good to merge?

This was referenced Oct 13, 2020
@Alexei-Kornienko
Copy link
Collaborator

@TheButlah I rechecked again locally. I don't see any changes in performance talking 3 runs on current master and 3 runs on your branch. Overall LGTM

@Alexei-Kornienko Alexei-Kornienko merged commit dce2c13 into zeromq:master Oct 13, 2020
@TheButlah TheButlah deleted the runtime-agnostic-codecs branch October 13, 2020 21:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants