This repository has been archived by the owner on Mar 11, 2021. It is now read-only.

[WIP] Separate server from protocol and application #95

Closed
wants to merge 1 commit into from

Conversation

gterzian

@gterzian gterzian commented Nov 1, 2019

Example of design proposed at #61 (comment)

  • Targeted PR against correct branch (see CONTRIBUTING.md)

  • Linked to github-issue with discussion and accepted design OR link to spec that describes this work.

  • Wrote tests

  • Updated relevant documentation (docs/)

  • Added a relevant changelog entry: clog add [section] [stanza] [message]

  • Reviewed Files changed in the github PR explorer

  • Bump Tendermint & ABCI version in README.md if version was updated


For Admin Use:

  • Added appropriate labels to PR (ex. wip, ready-for-review, docs)
  • Reviewers Assigned
  • Squashed all commits, uses message "Merge pull request #XYZ: [title]" (coding standards)

@gterzian gterzian mentioned this pull request Nov 1, 2019
@gterzian gterzian force-pushed the test_reorg branch 6 times, most recently from e437068 to f3ffeab Compare November 1, 2019 08:06
@tomtau tomtau requested review from tac0turtle, tomtau, liamsi and melekes and removed request for tac0turtle and tomtau November 1, 2019 08:52
@gterzian gterzian force-pushed the test_reorg branch 16 times, most recently from f462b40 to 2848dcc Compare November 1, 2019 13:28
@gterzian gterzian force-pushed the test_reorg branch 5 times, most recently from 5750300 to fe72b65 Compare November 3, 2019 16:49
@gterzian
Author

gterzian commented Nov 7, 2019

OK, the "protocol" part is ready for review; there is a unit test for it. I haven't run the actual server yet, or otherwise tested it, so that part still requires additional work.

Comment on lines +181 to +200
fn check_response(&self, response: &Response) -> Result<(), ()> {
    let has_expected = match self.response_type {
        ResponseType::Info => response.has_info(),
        ResponseType::SetOption => response.has_set_option(),
        ResponseType::Query => response.has_query(),
        ResponseType::CheckTx => response.has_check_tx(),
        ResponseType::InitChain => response.has_init_chain(),
        ResponseType::BeginBlock => response.has_begin_block(),
        ResponseType::DeliverTx => response.has_deliver_tx(),
        ResponseType::EndBlock => response.has_end_block(),
        ResponseType::Commit => response.has_commit(),
        ResponseType::Flush => response.has_flush(),
        ResponseType::Echo => response.has_echo(),
        ResponseType::Exception => response.has_exception(),
    };
    if !has_expected {
        return Err(());
    }
    Ok(())
}
Contributor

I'm wondering whether this dynamic checking could be turned into more static checking (at least enforcing it for the client / implementor) -- e.g. if there were a Responder<T: ResponseType> ...

and the methods in Application would then need to be instantiated with the correct type, something like
fn deliver_tx(&mut self, _p: RequestDeliverTx, responder: Responder<ResponseDeliverTx>) {....

Something like this pattern: https://yoric.github.io/post/rust-typestate/

@gterzian any thoughts?

Author

@gterzian gterzian Nov 8, 2019

That is an interesting idea, and I think it would require making a bunch of other stuff generic as well, for example

  1. I believe that if the Application starts having generic methods, you can't Box it anymore. There are ways around that, by making Protocol itself generic.
  2. Unless we make Response itself generic, the check is going to have to be at runtime I think. And if we make Response generic, it will require making all sorts of other things generic as well.

So instead of making the current Protocol generic, I think one might want to try to spawn one Protocol, or rather Session, per "session", and type the whole thing based on the back-and-forth messaging you expect, using for example those session types.

You would probably still need one "master" Protocol, running like the current one, to handle other stuff (like query/echo and other potential "out of protocol" requests?) and to spawn typed "child" Session<T> or something similar.

It might also require re-thinking the Application trait, and perhaps supplementing it with some AppSession<T> or similar, that would represent the handling of a single block and fully-type the protocol.

Probably best done in a potential follow-up.

Contributor

@tomtau tomtau Nov 8, 2019

The current proposal looks good -- my main concern in this proposal is that it's a little bit less type-safe than the previous one -- e.g. as a client / implementor, I may by mistake pass to deliver_tx a Responder that returns ResponseCheckTx (instead of ResponseDeliverTx). Ideally, this would be caught at compile-time rather than runtime.

Author

@gterzian gterzian Nov 8, 2019

Ok, if I understand you correctly, your concern is that the variant of request/response is a runtime thing, not a compile-time one.

In the current version, each method of Application has a specific Request_* as argument and returns a specific Response_*.

This version has a single Responder, which then does some sanity checking at runtime using an enum of response types.

Losing some "type-safety" happens each time you use an enum as a type. You can catch errors at compile time, at the cost of having to use a struct for each variant, versus a single enum.

I could do the same by introducing a ResponderCheckTx, ResponderDeliverTx, Responder_*, and having the protocol select on a dozen channels, one for each "type".

That might be even better than using a generic approach; however, you can't say it's pretty or concise code. It probably leaves plenty of room for logical errors anyway (probably the more insidious ones).

So I think it might be a bad trade-off: you get fairly minimal "type-safety" preventing one class of programming errors at compile time rather than runtime, and in return you get convoluted code.

Finally, in terms of programming errors: code that compiles doesn't necessarily run as you'd expect. You can still make a mistake and define a method as using DeliverTx where CheckTx belongs; no one might notice, and the app author just follows the compiler. That wouldn't ensure correct runtime behavior either.

let res = ResponseDeliverTx::new();
let mut response = Response::new();
response.set_deliver_tx(res);
let _ = responder.respond(response);
Contributor

One other thought -- why throw away the Result<(), ()> here and not return it?
If it's returned, https://github.com/tendermint/rust-abci/pull/95/files#diff-b4aea3e418ccdb71239b96952d9cddb6R329 could crash immediately (so there are no silent errors)

Author

You mean make it the return value of the trait method? In that case it would have to be a sync call, while now the app can pass the responder to other threads and eventually call respond on it.

Contributor

@snowsignal snowsignal left a comment

Just some general feedback:

  • Especially when you factor in the threading and futures handling, the code can be hard to follow. I spent a good 20-30 minutes trying to figure out what calls what, and the opaqueness of the public API may make it hard for developers to figure out internal errors, or how the framework is actually executing and processing their code. It would be nice to put a little text diagram in the comments, or somewhere, showing how Server, Protocol, and Application send messages back and forth between each other. Also, my experience with Tokio is limited, but shouldn't Protocol act more like a future that is executed on the Tokio runtime, instead of being spawned on a system thread?
  • Why does the counter example need a separate CounterProxy? Couldn't you just implement Application directly on CounterApp? Why do you need the added overhead of channels? CounterApp is repeatedly polling a channel on a separate thread, but since that's all it's doing, it doesn't make sense for it to be running separately. Just execute the functions directly.

Besides that, I just have problems with the concept. In my mind, Application represents the state machine that the application is simulating. In other words, it should be updating or checking the state, and then returning a response. To me, it doesn't make a huge amount of sense why the public API should be concerned with sending this data over a oneshot channel. It would make more sense for it just to return the response and have the handler for the Application, which is Protocol, send that over a channel. As a matter of fact, why use channels at all? Protocol is just a message passer, an unneeded layer of indirection between the server and application. Not to say that I'm against everything in this PR: the start method could be potentially useful for an application, if it passed resources from the server. But this PR seems to be adding complication for the sake of having information being passed over channels. I don't see the need for all these channels to be in use, especially when we're only running one Application at a time, maximum.

TL;DR: This PR, in essence, is too complicated.

Comment on lines +44 to +63
impl CounterProxy {
    fn new() -> Self {
        let (sender, receiver) = unbounded();
        CounterProxy {
            sender,
            receiver: Some(receiver),
            exiter: None,
        }
    }
}

impl CounterProxy {
    fn exit_abci(&self) {
        let _ = self
            .exiter
            .as_ref()
            .expect("Proxy to have an exiter")
            .exit();
    }
}
Contributor

Probably could be combined under a single impl block.

Comment on lines +63 to +66
let mut response = Response::new();
let res = ResponseInfo::new();
response.set_info(res);
let _ = responder.respond(response);
Contributor

This is quite a lot of boilerplate code! ResponseInfo::new() is much more readable than this 4-line sequence that only produces the default response. Also, info returns (), but I think it would make more sense to return a Result of some kind, so any errors could be propagated. Right now, Protocol::handle_network_request assumes that no internal errors occur when processing a request, but that isn't necessarily true. Either we should pass errors through the respective response methods, Protocol::handle_network_request and Protocol::run, or Responder::respond shouldn't return Result<(), ()>.

Author

A single call to ResponseInfo::new() is indeed simpler, but it's also semantically different, so I don't think a comparison is in order.

As to error handling, it might make sense for the application to send some errors to the protocol via an additional method on Responder. Alternatively, it could also use the Exiter to stop the protocol and server, if there is no way to recover from the error.

Contributor

I know the semantics are different; I was just pointing out how this change makes things a lot more verbose.
It might be better if the responder wrapped the specific response type in a Response before sending it through the channel, so we could do this:

 responder.respond_info(ResponseInfo::new())

Author

That could be another way to achieve it, though there you'd end up with a lot more methods on the Responder.

You could perhaps solve that problem by introducing an AppResponse enum and passing that to respond, which would match on it, internally create the response, and call the appropriate set_ method on it with the inner value of the enum variant.

@gterzian
Author

the public API may make it hard for developers to figure out internal errors, or how the framework is actually executing and processing their code

The public API is not meant to reflect the internals of the code.

It would be nice to put a little text diagram in the comments

That's a good idea, for some internal documentation unrelated to the API.

shouldn't Protocol act more like a future that is executed on the Tokio runtime, instead of being spawned on a system thread?

It could, and the point of running it on a thread is to isolate the server, which is running on its own tokio runtime, from calling into the application, as well as not enforcing an async style across the board. Protocol being a thread, which then calls into the application, makes it much harder for the application to block the server, even if the application doesn't use any threading.

Why does the counter example need a separate CounterProxy?

It's meant as an example of how one could send the responder across threads and respond in an "asynchronous" way.

To me, it doesn't make a huge amount of sense why the public API should be concerned with sending this data over a oneshot channel.

The public API is not concerned with sending data over a oneshot channel. It calls responder.respond with the response.

It would make more sense for it just to return the response and have the handler for the Application, which is Protocol, send that over a channel.

The point is that with the current setup, using the Responder, the application can generate the response and call responder.respond at some later time and from any thread it wants, using some sort of threading, futures, or just right then and there.

If the application has to return a response, it makes the API fundamentally sync (although I guess an application could, while blocking the protocol, still spawn some background work).

But this PR seems to be adding complication for the sake of having information being passed over channels. I don't see the need for all these channels to be in use, especially when we're only running one Application at a time, maximum.

The point is instead to introduce a layer between the server and the application, which could enforce certain invariants about the request/response flow, while running independently of the server, and optionally also of the application. An example of such a use case is found at #49

This PR, in essence, is too complicated.

The goal of this PR is to provide a proof-of-concept for an API that:

  1. is optionally asynchronous.
  2. is not futures-based (since that would remove the optional part from 1).
  3. allows for some infrastructure to do additional sanity checks for the protocol flow.
  4. provides some sort of isolation between the application and the server.

If you know of a way to achieve these goals in a simpler way, I would be happy to read about it or see an alternative proof-of-concept.

@snowsignal
Contributor

The public API is not meant to reflect the internals of the code.

What I meant to say is that there's a lot going on under the hood that can't be configured, such as how Protocol handles messages.

It could, and the point of running it on a thread is to isolate the server, which is running on its own tokio runtime, from calling into the application, as well as not enforcing an async style across the board. Protocol being a thread, which then calls into the application, makes it much harder for the application to block the server, even if the application doesn't use any threading.

That makes a lot of sense. Putting the application logic on a separate thread would keep the server from blocking, and that's a really good idea. But we don't need a message-passing struct to handle this: it would be simpler if we just had the server communicate directly with the Application over channels. Here's some pseudocode for how we could simplify this:

// This is spawned on a separate thread by the server
fn process_server_requests(server_request_channel, application) {
  loop {
    while let Ok((request, response_channel)) = server_request_channel.recv() {
      match request {
        Request::Info(info) => application.info(info, Responder::new(response_channel.clone())),
        // etc...
      }
    }
    // handle exiting, etc.
  }
}

That way, there isn't any intermediate polling of responses: they get sent right back to the Connection. Sanity checking could be done there.

The goal of this PR is to provide a proof-of-concept for an API that:

  1. is optionally asynchronous.
  2. is not futures-based (since that would remove the optional part from 1).
  3. allows for some infrastructure to do additional sanity checks for the protocol flow.
  4. provides some sort of isolation between the application and the server.

I'm sorry for misunderstanding earlier the intents of this PR. However, I have some comments for each point:

  1. Since Application is a state machine, what element of it would be asynchronous? Because of determinism, the functions shouldn't be doing any external I/O. I guess async I/O might make sense if the Application needed to save something externally, such as during commit. The problem is, making something like deliver_tx asynchronous could result in unexpected problems. Things could potentially be written to the state out of order, which would be a massive application bug. An application developer may not be aware of this risk, and ideally the API should have as few surprises as possible. If we give the application developer the option to asynchronously return a result for a function that should never be doing any asynchronous operation, we are telling them that it's OK to do that, when it's not. So we might want to re-think what functions should be allowed asynchronicity.
  2. We can still have synchronous operation with futures. If the future doesn't call any internal futures, spawning it on the Tokio runtime will just run it and return. In addition, we could run application functions as a task, which would go nicely with the Tokio-based server. And with Rust's support for async/await, futures might be the more idiomatic way to do this.
  3. If we want sanity checking, it may make sense for us to pass in a function, closure, or trait object to do some custom error checking, to allow an application to have more control over invalid responses or internal errors.
  4. If we want to separate these two via threading, to summarize, we should either just put Application in a separate thread and communicate directly with Server channels, or we should just use futures.

@gterzian
Author

gterzian commented Nov 26, 2019

In response to the numbered comments:

  1. I think it's important to differentiate between "asynchrony" and "ordering":

    • asynchrony: In the issue tracker it was discussed that the application might be implemented using either threading or futures, and the question was how to integrate the ABCI layer with such a system. So in this example, the Responder is something that can be sent to other threads (or tasks), and eventually used to provide a response, asynchronously.
    • ordering: one of the goals of the protocol layer is to offer some guarantees about the protocol workflow, which could also include ordering of requests/responses and other operations. This could be done while allowing a given response to be generated asynchronously. The protocol, as it sits between the application and the server, sees every message coming through, and can therefore ensure the right ordering of operations by tracking what it has seen, making assertions about what it expects to see, and not forwarding (instead buffering) certain messages until some operation has completed.

You mention handling of deliver_tx. So the protocol could call app.deliver_tx, and then not allow anything else but a "deliver response" to be forwarded to the server. It could also not forward any further request to the app, until the "deliver response" has been received. This doesn't mean the app.deliver_tx has to block. The whole thing could keep running, and certain messages be buffered, or some other strategy adopted(including blocking if really necessary).

So, while an application might ultimately have to behave as a deterministic state machine, it could still consist of different components running in parallel, so the Responder is useful in such a situation since it enables the application to send it from one component to another, and eventually provide a response. This can still be a deterministic process. Two threads communicating via message-passing can have deterministic behavior if the channel ensures ordering of messages (we're not talking about actors with unordered mailboxes or something similar).

The unit test found here provides a good example of the ordering guarantees that can be provided, and tested, all the while retaining an asynchronous execution model involving parallel computation (in this case, the main test thread and the protocol).

  2. Yes, I can believe all sorts of things could be achieved with futures; the purpose of this PR is to show how you can do it without them. I personally think futures are a good fit for the server, and when it comes to running stuff in a specific order it's easier to use threads and block on channels (or condvars).

  3. I think it would be easier to have a central protocol struct keeping all the necessary state inside a single thread, seeing all messages come through, and making sanity checks along the way. Some sort of closure-based approach might work too, probably while requiring some form of shared state (and you might really have to block what would be an unknown current thread in which the closure would execute to ensure some ordering of operations, so it could get quite complicated to get right and almost impossible to guarantee against deadlocks).
    Actually, the Responder does its own limited type of checks "remotely" when it is used to send a response back (here).

  4. To answer this in light of the example pseudocode you provided: yes, I guess you could avoid using a struct, but for the sanity checks you're going to have to store some state somewhere, so it might as well be on a Protocol struct. The server and the application in your example communicate through some loop running in a thread, and this appears relatively similar to running while protocol.run() in a thread.

@snowsignal
Contributor

Alright, I see your rationale for this. While I am personally in favor of a future-based approach, I can see how this could be a worthy alternative.
However, I still think that we could improve the API a bit. As I mentioned earlier, responder.respond_info(ResponseInfo::new()) would be much more concise and readable. It would also let us do compile-time checking that the correct type is being passed. We could still put the ResponseInfo (and other structs generated by protobuf) in a Response when it goes over the channel, but this can be purely internal.

I also think it would be nice if there was some external interface for the Server, so that (for example) an application could move the server to another thread while it does something else, and send an exit signal so the server and its corresponding threads can shut down.

@gterzian
Author

Closing this one, I think the point of the example was made...

@gterzian gterzian closed this Feb 15, 2020

3 participants