Simple RPC based service for PyZMQ #174

Closed · wants to merge 11 commits into from

Conversation

ellisonbg
Contributor

This adds an rpc subpackage to zmq and implements a "simple" RPC service. This version is focused only on the request/reply pattern, but handles load balancing using ROUTER/DEALER sockets. While we definitely want to explore richer RPC APIs that allow for arbitrary socket types and message patterns, this version is very useful because of its extreme simplicity and good performance.

Still working on:

  • Making the serialization approach pluggable. I use pickle now, but will allow json and other approaches.
  • Adding an async client example.

Needs review:

  • Python3 compliance.
  • General API choices.
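For readers skimming the discussion, the sketch below shows the underlying ZeroMQ pattern being described: REQ clients talking to a ROUTER frontend that load-balances requests over a DEALER backend to REP workers. It uses only plain pyzmq calls and made-up endpoints; it is not the zmq.rpc API added by this PR.

```python
"""Toy, self-contained demo of ROUTER/DEALER load-balanced request/reply
(plain pyzmq, not the zmq.rpc layer this PR adds)."""
import threading
import zmq

ctx = zmq.Context.instance()

def worker():
    sock = ctx.socket(zmq.REP)
    sock.connect('inproc://backend')
    while True:
        sock.send(sock.recv())          # echo every request back unchanged

def broker():
    frontend = ctx.socket(zmq.ROUTER)   # clients connect here
    backend = ctx.socket(zmq.DEALER)    # workers connect here
    frontend.bind('tcp://127.0.0.1:5555')
    backend.bind('inproc://backend')
    for _ in range(2):                  # two workers share the load
        threading.Thread(target=worker, daemon=True).start()
    zmq.proxy(frontend, backend)        # shuttle frames in both directions

threading.Thread(target=broker, daemon=True).start()

client = ctx.socket(zmq.REQ)
client.connect('tcp://127.0.0.1:5555')
client.send(b'Hello world')
print(client.recv())                    # b'Hello world'
```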

@minrk
Member

minrk commented Feb 17, 2012

nice!

pluggable serialization will be a lot simpler if you just put serialize/deserialize in isolated methods, rather than bundling them in with dispatch.

@ellisonbg
Contributor Author

Yes, good idea. I will try to get that done tonight.

@ellisonbg
Contributor Author

OK, I have made the serialization pluggable and added the async client example. I have one design question about the async proxy object's API. The current API looks like this:

echo.echo(callback, *args, **kwargs)

Where the callback is passed the result of the RPC call upon success and a RemoteRPCError instance upon failure. Getting an exception instance is a bit odd; do you think it would be better to pass a pair of callback/errback functions instead?
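To make the question concrete: with a single callback, success and failure arrive through the same argument, so the callback has to type-check what it received. The snippet below is illustrative only; RemoteRPCError is defined locally here as a stand-in for the exception class in this PR.

```python
class RemoteRPCError(Exception):
    """Local stand-in for the remote-error class this PR provides."""

def on_echo(result):
    # Single-callback style: the callback must distinguish results from errors.
    if isinstance(result, RemoteRPCError):
        print('remote call failed:', result)
    else:
        print('remote call returned:', result)

on_echo('Hello world')                  # success path
on_echo(RemoteRPCError('ValueError'))   # failure path
```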

@minrk
Member

minrk commented Feb 17, 2012

There are two choices for serialization:

  1. (what you chose) serialize directly to bytes. This is obviously the easiest and most pluggable for serializing simple small data.
  2. serialize to lists of bytes. This is less pluggable, because it ~always requires subclasses to define their own function, rather than trivially providing a library function.

The first is easiest to write, and easiest to drop in new serialization, but only the second can give you zero-copy, because a zero-copy message invariably turns an object into two or more parts (metadata + buffer).

As for callback/errback, I don't think passing an Exception to the callback makes sense, so you should probably specify an errback. Unfortunately, the signature you provided makes optionally specifying the errback impossible. You could define a single errback at the class level. I'm not sure how best to deal with that.

Something to consider: What happens when a worker never responds to a request?
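To make the trade-off between the two serialization choices concrete, here is a sketch of the two shapes (class names are illustrative, not classes from this branch):

```python
import pickle

# Choice 1: serialize to a single bytes object. Trivially pluggable (any
# dumps/loads pair works), but the whole payload travels as one copied blob.
class BytesSerializer:
    def serialize(self, obj):
        return pickle.dumps(obj)

    def deserialize(self, data):
        return pickle.loads(data)

# Choice 2: serialize to a list of frames (metadata frame + buffer frames).
# Less pluggable, but a large buffer can ride as its own zero-copy frame.
class FrameSerializer:
    def serialize(self, obj):
        if isinstance(obj, (bytes, bytearray, memoryview)):
            return [pickle.dumps({'raw': True}), obj]
        return [pickle.dumps({'raw': False}), pickle.dumps(obj)]

    def deserialize(self, frames):
        meta = pickle.loads(frames[0])
        return frames[1] if meta['raw'] else pickle.loads(frames[1])
```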

@ellisonbg
Contributor Author

I actually think we could get both 1+2 in the following manner. If the serializer returns a list/tuple, send all elements as message frames. Otherwise, just send the result. On the deserialization side, if there is one message frame, pass that to the deserializer as is; otherwise, pass a list of frames. Do you think that will work?

On the errback: what should the errback get passed? Also, how do you think we should pass the original exception to the RemoteRPCException so the user can figure out what went wrong?

Not sure how to handle failures. Would you integrate heartbeating? What would that API look like on the client side of things?

@minrk
Member

minrk commented Feb 19, 2012

> I actually think we could get both 1+2 in the following manner. If the serializer returns a list/tuple, send all elements as message frames. Otherwise, just send the result. On the deserialization side, if there is one message frame, pass that to the deserializer as is; otherwise, pass a list of frames. Do you think that will work?

Hm, not really. It would only work if you require that a list-returning serializer never returns a list of length 1. Also, the more black-box approach should probably be passed both args and kwargs together, rather than getting them separately; otherwise, how are you going to determine how many frames go to the args deserializer and how many to the kwargs deserializer?

> On the errback: what should the errback get passed? Also, how do you think we should pass the original exception to the RemoteRPCException so the user can figure out what went wrong?

Probably the same thing we do in IPython - ename, evalue, traceback as strs.

> Not sure how to handle failures. Would you integrate heartbeating? What would that API look like on the client side of things?

I would skip heartbeats, and just use timeouts. This would mean requests would need to discard replies with the wrong msg_id, as they would be stale replies from timed out requests.
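A rough sketch of that client-side logic over a DEALER socket. The frame layout and function here are made up for illustration, not taken from this branch:

```python
import uuid
import zmq

def call_with_timeout(sock, request, timeout_ms=3000):
    """Tag each request with a msg_id, give up after timeout_ms, and discard
    stale replies that belong to earlier, timed-out requests."""
    msg_id = uuid.uuid4().bytes
    sock.send_multipart([msg_id, request])
    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN)
    while True:
        if not poller.poll(timeout_ms):
            raise TimeoutError('no reply within %d ms' % timeout_ms)
        reply_id, reply = sock.recv_multipart()
        if reply_id == msg_id:
            return reply
        # Otherwise: a stale reply from a request that already timed out.
```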

* Central Serializer class is used, which allows more customization
  of the serialization of args/kwargs/result.
* Errback added to the AsyncRPCServiceProxy. This errback is passed
  the ename, evalue, tb as strings.
* Timeout added to the AsyncRPCServiceProxy.
* self.urls removed.
* Examples use errback and timeout.
* Refactored the Serializer class and created a JSONSerializer
  whose usage is demoed in the examples.
* Fixed minor bugs in simplerrpc.
@ellisonbg
Contributor Author

OK, I think I have fixed everything. For the serialization, I created a Serializer object that supports both full-blown customization as well as the simpler approach. I like how that part of it turned out.

I have implemented errback and timeout and added relevant examples.

Only one more question: is there anything I need to do to target Python 3.0 that you can see?
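The class names below echo the commit message above, but the bodies are only a guess at the shape of the idea (simple pack/unpack hooks for the easy case, frame-level methods for full control); they are not the code in this branch.

```python
import json
import pickle

class Serializer:
    # Simple route: change the wire format by overriding pack/unpack only.
    def pack(self, obj):
        return pickle.dumps(obj)

    def unpack(self, data):
        return pickle.loads(data)

    # Full-control route: override these to decide how calls become frames.
    def serialize_args_kwargs(self, args, kwargs):
        return [self.pack(args), self.pack(kwargs)]

    def deserialize_args_kwargs(self, frames):
        return self.unpack(frames[0]), self.unpack(frames[1])

    def serialize_result(self, result):
        return [self.pack(result)]

    def deserialize_result(self, frames):
        return self.unpack(frames[0])

class JSONSerializer(Serializer):
    """Only the wire format changes; the framing logic is inherited."""
    def pack(self, obj):
        return json.dumps(obj).encode('utf-8')

    def unpack(self, data):
        return json.loads(data.decode('utf-8'))
```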

@dcolish

dcolish commented Feb 25, 2012

Not sure if there's room for more serializers, especially ones which add more dependencies, but I threw together an example symmetric cipher serializer here: https://gist.github.com/1911540.

* Finish multi-ident handling.
* ename, evalue, tb is serialized in a better manner.
* Changes from PR #183 incorporated.
* Updated example.
@ellisonbg
Contributor Author

@minrk: what do you think about including this serializer? If we include this, it might be nice to improve the API of the base class to handle loads/dumps separately from the encrypt/decrypt. The reason for this is that it would be useful to change the serialization without changing the encryption and vice versa.
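One way to get that separation would be to layer the cipher on top of any byte-level serializer instead of fusing the two. A rough sketch follows; the cipher argument is any object exposing encrypt/decrypt over bytes (for example cryptography.fernet.Fernet), and it stands in for whatever cipher the gist actually uses.

```python
import pickle

class EncryptingSerializer:
    """Sketch: dumps/loads and encrypt/decrypt are independent layers, so
    either can be swapped without touching the other."""

    def __init__(self, cipher, dumps=pickle.dumps, loads=pickle.loads):
        self.cipher = cipher   # object with encrypt(bytes) and decrypt(bytes)
        self.dumps = dumps
        self.loads = loads

    def serialize(self, obj):
        return self.cipher.encrypt(self.dumps(obj))

    def deserialize(self, data):
        return self.loads(self.cipher.decrypt(data))

# Changing serialization without touching encryption would then just be:
#   EncryptingSerializer(cipher,
#                        dumps=lambda o: json.dumps(o).encode('utf-8'),
#                        loads=lambda b: json.loads(b.decode('utf-8')))
```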

@minrk
Member

minrk commented Feb 26, 2012

I think it's a useful example, showing how to do subclassing for your own serialization/encryption, so it should go in examples. I don't think encrypt/decrypt should be separate from serialize/deserialize.

I wouldn't want to give anyone the illusion that this code is secure, since it offers no protection against fuzz attacks or anything of that sort.

@dcolish

dcolish commented Feb 27, 2012

I think the encryption itself is sound, save for the trivial key. You're right about the security issues, since there's no effort to sanitize user inputs. It would not be a lot more work to use a library like Flatland to ensure user input is sane. I could improve the example to do that if you think it's worth it.

@minrk
Member

minrk commented Feb 27, 2012

I think the example is good as it is, and helpful for users who want to write derivatives with their own serialization and/or encryption. Simple error handling when unpacking messages should be done in the RPC code itself.

@ellisonbg
Contributor Author

OK, can you create a new pull request that puts the serializer example in examples/web?

@ellisonbg
Contributor Author

@minrk: this one is ready to go as well. I can do the merge if you want.

@minrk
Member

minrk commented Feb 27, 2012

I still don't like the fact that you have to specify callback, errback, and timeout every time. That seems pretty annoying and un-pythonic. Can you think of a way to make errback and timeout optional args?

@ellisonbg
Contributor Author

If we want to pass *args and **kwargs, we can't really use keyword arguments. We could switch to the form:

call(func, callback=None, errback=None, timeout=-1, args=None, kwargs=None)

And force people to pass tuples and dicts for args and kwargs. Which do you prefer?

@minrk
Member

minrk commented Feb 28, 2012

mmh, this is always a decision we have to make: is passing *args, **kwargs nice enough that it is worth requiring that multiple rarely used extra arguments be specified every time? We sidestep this in IPython.parallel by setting defaults via attributes. This sort of approach might make sense, at least where errback/timeout are concerned, for which the values will likely be the same for every single call on a given RPC instance.
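Roughly what the attribute-default approach looks like (a toy illustration, not the PR's classes and not IPython.parallel's):

```python
class AsyncProxySketch:
    """Toy: rarely changed knobs live on the instance, so call sites stay plain."""

    def __init__(self, timeout=-1, errback=None):
        self.timeout = timeout   # default for every call; -1 means wait forever
        self.errback = errback   # default error handler; None means ignore errors

    def call(self, name, callback, *args, **kwargs):
        # A real proxy would serialize and send here; this just shows where the
        # instance defaults get consulted instead of being passed every time.
        return (name, callback, args, kwargs, self.timeout, self.errback)

proxy = AsyncProxySketch(timeout=5000, errback=print)
proxy.call('echo', print, 'Hello world')   # no per-call errback/timeout needed
```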

@dcolish

dcolish commented Feb 28, 2012

As someone who really wants to use this RPC library, I would not want the interface to give up the ability to just call functions via the proxy. Forcing users to pass args=, kwargs= is pretty rough.

I like the idea of setting these values on a class, if you can allow users to pass their own class to use for method wrapping. It's unlikely a user would want these values to change for specific functions.

@ellisonbg
Contributor Author

But I don't think the class default solves the issue in this case, because callback and errback will likely change from call to call. I guess I am probably leaning towards keeping it the same as it is.

@minrk
Member

minrk commented Feb 28, 2012

Compromise: just pull timeout off to an attribute? It seems extremely unlikely that this would ever change within a given application. I still think errback is likely to be None the majority of the time, and it seems like a poor Python interface that forces you to specify None as the second arg every time you call it. But forcing you to specify it encourages people to think about error handling, which I suppose is a good thing.

We have already discovered that there's no such thing as a really good interface that accepts **kwargs and supports optional behavior.
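The conflict in miniature (a toy function, nothing from the PR): once **kwargs is reserved for the remote function's arguments, an option keyword cannot be told apart from a remote keyword argument.

```python
def call(func_name, callback, *args, **kwargs):
    # Is 'timeout' an option for the RPC machinery, or a keyword argument the
    # remote function itself expects? With **kwargs in the signature the two
    # collide, short of reserving names.
    timeout = kwargs.pop('timeout', -1)
    return func_name, callback, args, kwargs, timeout

print(call('sleep', print, timeout=5))   # ambiguous: RPC option or remote kwarg?
```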

@ellisonbg
Contributor Author

OK license is updated.

@darkdarkfruit

Why not merge this rpc part? I find it very useful.

@minrk
Member

minrk commented Aug 17, 2012

The RPC code is incomplete and needs work, particularly on handling dying workers. I don't think it should go in before at least the interface is relatively settled.

@ellisonbg
Contributor Author

My memory is a bit fuzzy about this now. IIRC though, this stalled out because we weren't sure where to put it, not because the code was incomplete. I think the code is in a pretty good state actually. My problem with it sitting here unmerged is that it violates the "merge early, merge often" mantra that gets code into users' hands quickly and allows development to iterate and proceed. I don't think it will be improved by just sitting (it hasn't happened so far). I see two options:

  1. I can run a few more tests and look it over, and then we merge it. In this scenario, we would also have to put them in IPython until the required version of pyzmq is common. We would also have to do this to hedge against API changes.
  2. We take this and zmq.web and put them in separate projects. The design and purpose of the rpc stuff and zmq.web are so similar that I think they should be handled in the same way. This would allow separate release and maintenance cycles - there are pros and cons to that. We will need these packages for IPython in the long run, and having them separate forces additional dependencies.

Neither option is thrilling.

@minrk, @takluyver, @fperez, which do you prefer?

I do think that the core of pyzmq is quite stable and people are using it in production. We have to approach changes very conservatively. But that shouldn't prevent us from adding new code that is less stable, as long as it doesn't affect the stability of the core.

@takluyver
Contributor

Is the plan to later expand this to cover more complex RPC uses, or is it just intended to be a simple base that third parties can expand?

In the latter case, I'd side with including it in pyzmq. In the former case, I think a separate package would make more sense, as a matter of perception: there are lots of use cases for ZMQ besides RPC, and you probably don't want people to think pyzmq is turning into an RPC package.

@minrk
Member

minrk commented Aug 17, 2012

We had decided that development should happen in code where it is used (IPython multiuser) rather than in isolation. Then, when it has been demonstrated to be satisfactory and stable for a real application, it can be pulled upstream.

Cutting a release of pyzmq with RPC adds enormous inertia to the APIs, which I am not at all convinced are final (especially given that the API discussion here so far remains totally inconclusive).

@ellisonbg
Contributor Author

I have decided to maintain this as a separate project:

https://github.com/ellisonbg/zpyrpc

@ellisonbg ellisonbg closed this Dec 30, 2012