Simple RPC based service for PyZMQ #174

Closed · wants to merge 11 commits into from

Conversation

ellisonbg
Contributor

This adds an rpc subpackage to zmq and implements a "simple" RPC service. This version is focused only on the request/reply pattern, but handles load balancing using ROUTER/DEALER sockets. While we definitely want to explore richer RPC APIs that allow for arbitrary socket types and message patterns, this version is very useful because of its extreme simplicity and good performance.

Still working on:

  • Making the serialization approach pluggable. I use pickle now, but will allow json and other approaches.
  • Adding an async client example.

Needs review:

  • Python3 compliance.
  • General API choices.
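For readers skimming the discussion, the sketch below shows the underlying ZeroMQ pattern being described: REQ clients talking to a ROUTER frontend that load-balances requests over a DEALER backend to REP workers. It uses only plain pyzmq calls and made-up endpoints; it is not the zmq.rpc API added by this PR.

```python
"""Toy, self-contained demo of ROUTER/DEALER load-balanced request/reply
(plain pyzmq, not the zmq.rpc layer this PR adds)."""
import threading
import zmq

ctx = zmq.Context.instance()

def worker():
    sock = ctx.socket(zmq.REP)
    sock.connect('inproc://backend')
    while True:
        sock.send(sock.recv())          # echo every request back unchanged

def broker():
    frontend = ctx.socket(zmq.ROUTER)   # clients connect here
    backend = ctx.socket(zmq.DEALER)    # workers connect here
    frontend.bind('tcp://127.0.0.1:5555')
    backend.bind('inproc://backend')
    for _ in range(2):                  # two workers share the load
        threading.Thread(target=worker, daemon=True).start()
    zmq.proxy(frontend, backend)        # shuttle frames in both directions

threading.Thread(target=broker, daemon=True).start()

client = ctx.socket(zmq.REQ)
client.connect('tcp://127.0.0.1:5555')
client.send(b'Hello world')
print(client.recv())                    # b'Hello world'
```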

@minrk
Member

minrk commented Feb 17, 2012

nice!

pluggable serialization will be a lot simpler if you just put serialize/deserialize in isolated methods, rather than bundling them in with dispatch.

@ellisonbg
Contributor Author

Yes, good idea. I will try to get that done tonight.

@ellisonbg
Contributor Author

OK, I have made the serialization pluggable and added the async client example. I have one design question about the async proxy object's API. The current API looks like this:

echo.echo(callback, *args, **kwargs)

Where the callback is passed the result of the RPC call upon success and a RemoteRPCError instance upon failure. Getting an exception instance is a bit odd; do you think it would be better to pass a pair of callback/errback functions instead?
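To make the question concrete: with a single callback, success and failure arrive through the same argument, so the callback has to type-check what it received. The snippet below is illustrative only; RemoteRPCError is defined locally here as a stand-in for the exception class in this PR.

```python
class RemoteRPCError(Exception):
    """Local stand-in for the remote-error class this PR provides."""

def on_echo(result):
    # Single-callback style: the callback must distinguish results from errors.
    if isinstance(result, RemoteRPCError):
        print('remote call failed:', result)
    else:
        print('remote call returned:', result)

on_echo('Hello world')                  # success path
on_echo(RemoteRPCError('ValueError'))   # failure path
```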

@minrk
Member

minrk commented Feb 17, 2012

There are two choices for serialization:

  1. (what you chose) serialize directly to bytes. This is obviously the easiest and most pluggable for serializing simple small data.
  2. serialize to lists of bytes. This is less pluggable, because it ~always requires subclasses to define their own function, rather than trivially providing a library function.

The first is easiest to write, and easiest to drop in new serialization, but only the second can give you zero-copy, because a zero-copy message invariably turns an object into two or more parts (metadata + buffer).

As for callback/errback, I don't think passing an Exception to the callback makes sense, so you should probably specify an errback. Unfortunately, the signature you provided makes optionally specifying the errback impossible. You could define a single errback at the class level. I'm not sure how best to deal with that.

Something to consider: What happens when a worker never responds to a request?
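To make the trade-off between the two serialization choices concrete, here is a sketch of the two shapes (class names are illustrative, not classes from this branch):

```python
import pickle

# Choice 1: serialize to a single bytes object. Trivially pluggable (any
# dumps/loads pair works), but the whole payload travels as one copied blob.
class BytesSerializer:
    def serialize(self, obj):
        return pickle.dumps(obj)

    def deserialize(self, data):
        return pickle.loads(data)

# Choice 2: serialize to a list of frames (metadata frame + buffer frames).
# Less pluggable, but a large buffer can ride as its own zero-copy frame.
class FrameSerializer:
    def serialize(self, obj):
        if isinstance(obj, (bytes, bytearray, memoryview)):
            return [pickle.dumps({'raw': True}), obj]
        return [pickle.dumps({'raw': False}), pickle.dumps(obj)]

    def deserialize(self, frames):
        meta = pickle.loads(frames[0])
        return frames[1] if meta['raw'] else pickle.loads(frames[1])
```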

@ellisonbg
Contributor Author

I actually think we could get both 1+2 in the following manner. If the serializer returns a list/tuple, send all elements as message frames. Otherwise, just send the result. On the deserialization side, if there is one message frame, pass that to the deserializer as is; otherwise, pass a list of frames. Do you think that will work?

On the errback: what should the errback get passed? Also, how do you think we should pass the original exception to the RemoteRPCException so the user can figure out what went wrong?

Not sure how to handle failures. Would you integrate heartbeating? What would that API look like on the client side of things?

@minrk
Member

minrk commented Feb 19, 2012

> I actually think we could get both 1+2 in the following manner. If the serializer returns a list/tuple, send all elements as message frames. Otherwise, just send the result. On the deserialization side, if there is one message frame, pass that to the deserializer as is; otherwise, pass a list of frames. Do you think that will work?

Hm, not really. It would only work if you require that a list-returning serializer never returns a list of length 1. Also, the more black-box approach should probably be passed both args and kwargs together, rather than getting them separately; otherwise, how are you going to determine how many frames go to the args deserializer and how many to the kwargs deserializer?

> On the errback: what should the errback get passed? Also, how do you think we should pass the original exception to the RemoteRPCException so the user can figure out what went wrong?

Probably the same thing we do in IPython - ename, evalue, traceback as strs.

> Not sure how to handle failures. Would you integrate heartbeating? What would that API look like on the client side of things?

I would skip heartbeats, and just use timeouts. This would mean requests would need to discard replies with the wrong msg_id, as they would be stale replies from timed out requests.
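A rough sketch of that client-side logic over a DEALER socket. The frame layout and function here are made up for illustration, not taken from this branch:

```python
import uuid
import zmq

def call_with_timeout(sock, request, timeout_ms=3000):
    """Tag each request with a msg_id, give up after timeout_ms, and discard
    stale replies that belong to earlier, timed-out requests."""
    msg_id = uuid.uuid4().bytes
    sock.send_multipart([msg_id, request])
    poller = zmq.Poller()
    poller.register(sock, zmq.POLLIN)
    while True:
        if not poller.poll(timeout_ms):
            raise TimeoutError('no reply within %d ms' % timeout_ms)
        reply_id, reply = sock.recv_multipart()
        if reply_id == msg_id:
            return reply
        # Otherwise: a stale reply from a request that already timed out.
```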

* Central Serializer class is used, which allows more customization
  of the serialization of args/kwargs/result.
* Errback added to the AsyncRPCServiceProxy. This errback is passed
  the ename, evalue, tb as strings.
* Timeout added to the AsyncRPCServiceProxy.
* self.urls removed.
* Examples use errback and timeout.
* Refactored the Serializer class and created a JSONSerializer
  whose usage is demoed in the examples.
* Fixed minor bugs in simplerrpc.
@ellisonbg
Contributor Author

OK, I think I have fixed everything. For the serialization, I created a Serializer object that supports both full-blown customization as well as the simpler approach. I like how that part of it turned out.

I have implemented errback and timeout and added relevant examples.

Only one more question: is there anything I need to do to target Python 3.0 that you can see?
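The class names below echo the commit message above, but the bodies are only a guess at the shape of the idea (simple pack/unpack hooks for the easy case, frame-level methods for full control); they are not the code in this branch.

```python
import json
import pickle

class Serializer:
    # Simple route: change the wire format by overriding pack/unpack only.
    def pack(self, obj):
        return pickle.dumps(obj)

    def unpack(self, data):
        return pickle.loads(data)

    # Full-control route: override these to decide how calls become frames.
    def serialize_args_kwargs(self, args, kwargs):
        return [self.pack(args), self.pack(kwargs)]

    def deserialize_args_kwargs(self, frames):
        return self.unpack(frames[0]), self.unpack(frames[1])

    def serialize_result(self, result):
        return [self.pack(result)]

    def deserialize_result(self, frames):
        return self.unpack(frames[0])

class JSONSerializer(Serializer):
    """Only the wire format changes; the framing logic is inherited."""
    def pack(self, obj):
        return json.dumps(obj).encode('utf-8')

    def unpack(self, data):
        return json.loads(data.decode('utf-8'))
```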

@dcolish

dcolish commented Feb 25, 2012

Not sure if there's room for more serializers, especially ones which add more dependencies, but I threw together an example symmetric cipher serializer here: https://gist.github.com/1911540.

* Finish multi-ident handling.
* ename, evalue, tb is serialized in a better manner.
* Changes from PR #183 incorporated.
* Updated example.
@ellisonbg
Contributor Author

@minrk: what do you think about including this serializer? If we include this, it might be nice to improve the API of the base class to handle loads/dumps separately from the encrypt/decrypt. The reason for this is that it would be useful to change the serialization without changing the encryption and vice versa.
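One way to get that separation would be to layer the cipher on top of any byte-level serializer instead of fusing the two. A rough sketch follows; the cipher argument is any object exposing encrypt/decrypt over bytes (for example cryptography.fernet.Fernet), and it stands in for whatever cipher the gist actually uses.

```python
import pickle

class EncryptingSerializer:
    """Sketch: dumps/loads and encrypt/decrypt are independent layers, so
    either can be swapped without touching the other."""

    def __init__(self, cipher, dumps=pickle.dumps, loads=pickle.loads):
        self.cipher = cipher   # object with encrypt(bytes) and decrypt(bytes)
        self.dumps = dumps
        self.loads = loads

    def serialize(self, obj):
        return self.cipher.encrypt(self.dumps(obj))

    def deserialize(self, data):
        return self.loads(self.cipher.decrypt(data))

# Changing serialization without touching encryption would then just be:
#   EncryptingSerializer(cipher,
#                        dumps=lambda o: json.dumps(o).encode('utf-8'),
#                        loads=lambda b: json.loads(b.decode('utf-8')))
```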

@minrk
Member

minrk commented Feb 26, 2012

I think it's a useful example, showing how to do subclassing for your own serialization/encryption, so it should go in examples. I don't think encrypt/decrypt should be separate from serialize/deserialize.

I wouldn't want to give anyone the illusion that this code is secure, since it offers no protection against fuzz attacks or anything of that sort.

@dcolish

dcolish commented Feb 27, 2012

I think the encryption itself is sound, save for the trivial key. You're right about the security issues, since there's no effort to sanitize user inputs. It would not be a lot more work to use a library like Flatland to ensure user input is sane. I could improve the example to do that if you think it's worth it.

@minrk
Member

minrk commented Feb 27, 2012

I think the example is good as it is, and helpful for users who want to write derivatives with their own serialization and/or encryption. Simple error handling when unpacking messages should be done in the RPC code itself.

@ellisonbg
Contributor Author

OK, can you create a new pull request that puts the serializer example in examples/web?

@ellisonbg
Contributor Author

@minrk: this one is ready to go as well. I can do the merge if you want.

@minrk
Member

minrk commented Feb 27, 2012

I still don't like the fact that you have to specify callback, errback, and timeout every time. That seems pretty annoying and un-pythonic. Can you think of a way to make errback and timeout optional args?

@ellisonbg
Contributor Author

If we want to pass *args and **kwargs, we can't really use keyword arguments. We could switch to the form:

call(func, callback=None, errback=None, timeout=-1, args=None, kwargs=None)

And force people to pass tuples and dicts for args and kwargs. Which do you prefer?

@minrk
Member

minrk commented Feb 28, 2012

mmh, this is always a decision we have to make: is passing *args, **kwargs nice enough that it is worth requiring that multiple rarely used extra arguments be specified every time? We sidestep this in IPython.parallel by setting defaults via attributes. This sort of approach might make sense, at least where errback/timeout are concerned, for which the values will likely be the same for every single call on a given RPC instance.
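Roughly what the attribute-default approach looks like (a toy illustration, not the PR's classes and not IPython.parallel's):

```python
class AsyncProxySketch:
    """Toy: rarely changed knobs live on the instance, so call sites stay plain."""

    def __init__(self, timeout=-1, errback=None):
        self.timeout = timeout   # default for every call; -1 means wait forever
        self.errback = errback   # default error handler; None means ignore errors

    def call(self, name, callback, *args, **kwargs):
        # A real proxy would serialize and send here; this just shows where the
        # instance defaults get consulted instead of being passed every time.
        return (name, callback, args, kwargs, self.timeout, self.errback)

proxy = AsyncProxySketch(timeout=5000, errback=print)
proxy.call('echo', print, 'Hello world')   # no per-call errback/timeout needed
```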

@dcolish

dcolish commented Feb 28, 2012

As someone who really wants to use this RPC library, I would not want the interface to give up the ability to just call functions via the proxy. Forcing users to pass args=, kwargs= is pretty rough.

I like the idea of setting these values on a class, if you can allow users to pass their own class to use for method wrapping. It's unlikely a user would want these values to change for specific functions.

@ellisonbg
Contributor Author

But I don't think the class default solves the issue in this case, because callback and errback will likely change from call to call. I guess I am probably leaning towards keeping it the same as it is.

@minrk
Member

minrk commented Feb 28, 2012

Compromise: just pull timeout off to an attribute? It seems extremely unlikely that this would ever change within a given application. I still think errback is likely to be None the majority of the time, and it seems like a poor Python interface that forces you to specify None as the second arg every time you call it. But forcing you to specify it encourages people to think about error handling, which I suppose is a good thing.

We have already discovered that there's no such thing as a really good interface that accepts **kwargs and supports optional behavior.
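The conflict in miniature (a toy function, nothing from the PR): once **kwargs is reserved for the remote function's arguments, an option keyword cannot be told apart from a remote keyword argument.

```python
def call(func_name, callback, *args, **kwargs):
    # Is 'timeout' an option for the RPC machinery, or a keyword argument the
    # remote function itself expects? With **kwargs in the signature the two
    # collide, short of reserving names.
    timeout = kwargs.pop('timeout', -1)
    return func_name, callback, args, kwargs, timeout

print(call('sleep', print, timeout=5))   # ambiguous: RPC option or remote kwarg?
```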

@ellisonbg
Contributor Author

OK license is updated.

@darkdarkfruit

Why not merge this rpc part? I find it very useful.

@minrk
Member

minrk commented Aug 17, 2012

The RPC code is incomplete and needs work, particularly on handling dying workers. I don't think it should go in before at least the interface is relatively settled.

@ellisonbg
Contributor Author

My memory is a bit fuzzy about this now. IIRC though, this stalled out because we weren't sure where to put it, not because the code was incomplete. I think the code is in a pretty good state actually. My problem with it sitting here unmerged is that it violates the "merge early, merge often" mantra that gets code into users' hands quickly and allows development to iterate and proceed. I don't think it will be improved by just sitting (it hasn't happened so far). I see two options:

  1. I can run a few more tests and look it over, and then we merge it. In this scenario, we would also have to put them in IPython until the required version of pyzmq is common. We would also have to do this to hedge against API changes.
  2. We take this and zmq.web and put them in separate projects. The design and purpose of the rpc stuff and zmq.web are so similar that I think they should be handled in the same way. This would allow separate release and maintenance cycles - there are pros and cons to that. We will need these packages for IPython in the long run, and having them separate forces additional dependencies.

Neither option is thrilling.

@minrk, @takluyver, @fperez, which do you prefer?

I do think that the core of pyzmq is quite stable and people are using it in production. We have to approach changes very conservatively. But that shouldn't prevent us from adding new code that is less stable, as long as it doesn't affect the stability of the core.

@takluyver
Contributor

Is the plan to later expand this to cover more complex RPC uses, or is it just intended to be a simple base that third parties can expand?

In the latter case, I'd side with including it in pyzmq. In the former case, I think a separate package would make more sense, as a matter of perception: there are lots of use cases for ZMQ besides RPC, and you probably don't want people to think pyzmq is turning into an RPC package.

@minrk
Member

minrk commented Aug 17, 2012

We had decided that development should happen in code where it is used (IPython multiuser) rather than in isolation. Then, when it has been demonstrated to be satisfactory and stable for a real application, it can be pulled upstream.

Cutting a release of pyzmq with RPC adds enormous inertia to the APIs, which I am not at all convinced are final (especially given that the API discussion here so far remains totally inconclusive).

@ellisonbg
Contributor Author

I have decided to maintain this as a separate project:

https://github.com/ellisonbg/zpyrpc

@ellisonbg ellisonbg closed this Dec 30, 2012