Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZMQ event loop #362

Merged
merged 5 commits into from Jun 28, 2023
Merged

ZMQ event loop #362

merged 5 commits into from Jun 28, 2023

Conversation

waveform80
Copy link
Contributor

Checklist
  • I've ensured that similar functionality has not already been implemented
  • I've ensured that similar functionality has not earlier been proposed and declined
  • I've branched off the master or python-dual-support branch
  • I've merged fresh upstream into my branch recently
  • I've ran tox successfully in local environment
  • I've included docstrings and/or documentation and/or examples for my code (if this is a new feature)
Description:

Not sure if anyone's interested in this (I searched open and closed issues for ZMQ, 0MQ and ZeroMQ but came up empty ... so this might be rather niche!), but ZeroMQ is something we use pretty extensively in piwheels. The monitor for the master server is written using urwid for the interface, so one of the first things I did was make an urwid event loop that could watch both files and ZMQ queues (very similar to the existing event loops).

It occurred to me others might be interested in this (though apparently not so interested as to file a bug ;) so (after much delay) I've spent a little time beating it into shape and making sure it passes the tests on all the tox-listed variants of python. I've included doc-strings and the necessary links in the event-loops documentation, but I've been lazy and haven't added a specific example for it. Do let me know if one should be included.

Anyway, here it is in all its glory - again, I'm not sure if this is really niche or if there's sufficient interest to justify merging it, but on the off-chance someone goes googling for it, I guess it'll help to have it in the PR list (open or closed :).

@coveralls
Copy link

Coverage Status

Coverage increased (+0.07%) to 77.238% when pulling de8d9f5 on waveform80:zmq-event-loop into bb54986 on urwid:master.

@coveralls
Copy link

coveralls commented Sep 24, 2019

Coverage Status

Coverage increased (+0.007%) to 77.875% when pulling 02c004b on waveform80:zmq-event-loop into a5f27ba on urwid:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.07%) to 77.238% when pulling de8d9f5 on waveform80:zmq-event-loop into bb54986 on urwid:master.

@tonycpsu
Copy link
Collaborator

@waveform80 : this looks great! I haven't used zmq in a few years, but a zmq event loop certainly seems like something folks could make use of. Thanks much for the contribution! Will take a closer look ASAP and merge if everything looks good on closer inspection.

@waveform80
Copy link
Contributor Author

@tonycpsu Thanks! Pushed a tiny correction before heading off to bed last night and I'm a bit confused by the travis error that resulted as it runs absolutely fine through tox here, and the failing test seems entirely unrelated (and only occurs in py37). Is that test (test_defargs in urwid.tests.test_vterm.TermTest) prone to transient errors?

@ulidtko
Copy link
Collaborator

ulidtko commented Sep 25, 2019

@waveform80 hey! Thanks for the PR. Appreciated how you cared to keep zmq an optional dependency. A feeling of certain lack of understanding of the interface prevents me from giving a big thick LGTM -- but on the surface of it, looks decent.

Re: tests, I'm looking into it. No, specifically that test doesn't seem to be racy, should be reproducible with absolutely stable results. Yet still, I don't repro that Travis failure locally just as you... however, I do get other (unrelated, too) failures on py37 and pypy. So apparently there's some kind of flakiness problem.

Perhaps this has something to do with '\e' being a deprecated escape sequence on py37. I'll have to investigate more.

@HenryYoung42
Copy link

I guess this is pending / gone cold ? I'm working on an urwid app that needs to receive ZMQ subscription events, and now faced with the choice of using this great addition that may end up not being supported, or trying to get the asyncio support that both urwid and zmq have to play nice with each other. Any suggestions gratefully received plus happy to help out / contribute if able.

@waveform80
Copy link
Contributor Author

Oh, I'd forgotten this was still open. As it happens we're still using ZMQ with urwid in piwheels; I'll try and remember to update this tonight and at least make sure it's mergeable

@HenryYoung42
Copy link

I hope it helps to know you have an appreciative customer !

@waveform80
Copy link
Contributor Author

Okay, rebased onto master - dropped py34, added py38 to the tox config and rejigged the zmq dependency in there as it appears that it "just works" on pypy now. Everything passed locally - hopefully CI should be fine too.

@HenryYoung42
Copy link

HenryYoung42 commented Oct 30, 2020

Because this has not been merged into urwid, and because I prefer to use the standard release version of both zmq and urwid, I started trying to convert the ZMQEevntLoop class into an addon file to standard urwid that simply adds the new class. But dependencies like EventLoop are not exposed outside of urwid. Is there a neat way to refactor this as just a file you can include in your app alongside urwid ?

@HenryYoung42
Copy link

HenryYoung42 commented Oct 30, 2020

Ah - just being dumb. I need to do:

from urwid.main_loop import EventLoop
import urwid

class zmqEventLoop(EventLoop):

@waveform80
Copy link
Contributor Author

@ulidtko Any chance this might get merged (just to save @HenryYoung42 a bit of effort, though I admit I wouldn't mind cleaning up piwheels codebase a bit too :)?

Seems there's a few (okay, erm, 2) people using it from this PR, which suggests to me more might be interested if they could find it in the main docs? After the current rebase onto master it's also a bit cleaner than before, and the tests seem to pass off the bat.

@HenryYoung42
Copy link

Even if it gets merged I'm still using my "adapter file" until the next release of urwid !

@HenryYoung42
Copy link

I'm running with substantially the event loop class as a stand alone file with the last urwid release.
zmqeventloop.py.txt

@HenryYoung42
Copy link

Cautionary heads up - my adaptation of this is not working in my use case where I am not using it with a zmq queue but with a zqm subscriber:

def run(self):
	self.event_loop = zmqEventLoop()

	# Initialise zmq
	context    = zmq.Context()
	subscriber = context.socket(zmq.SUB)
	subscriber.connect('tcp://%s'%settings.zmq_address)
	subscriber.setsockopt(zmq.SUBSCRIBE, b'')
	self.event_loop.watch_queue(subscriber, self.zmqHandler, zmq.POLLIN)

	# Initialize & run urwid
	self.build_ui()
	self.loop = urwid.MainLoop(widget = self.window, palette = self._palette, handle_mouse = True, unhandled_input = self.unhandled_keypress, event_loop = self.event_loop)
	self.loop.set_alarm_in(1, self.clock_refresh)
	self.loop.run()

The symptom is I have a 1 second updating clock on screen initiated by the set_alarm_in call whose callback fires the same timer again. This runs fine until the zmq subscription receives data at which point the event queue stalls. I am yet to investigate why, but thought to post here in case you can say "oh yeah it will only work watching queues - if you're using a subscription you need to do xyz ...". Strong chance it's all my own stupidity somewhere ... are queues and subscriptions the same thing in zmq ?

@HenryYoung42
Copy link

Turns out it's working. The problem was that I was not calling subscriber.recv_multipart(zmq.DONTWAIT) after receiving the zmq pollin event. Obviously I should fetch the received data correctly, but for initial testing purposes I had omitted that and was just logging the event occurrence. I don't know if this lockup due to the zmq poller not getting emptied out is a concern or not in terms of this event loop implementation. In my mind it remains an open question as to whether the event loop should lock up if the data is not fetched, or should it be more robust to this case ? The issue seems due to an unfavorable interaction between state variables used by _loop and the zmq poller state (not fetching the data means that ready remains True unexpectedly). In practice the data should always be fetched through the appropriate zmq call so this should not be a problem in practice.

@waveform80
Copy link
Contributor Author

Ah, I was wondering what was up here as we use a subscriber queue in our piwheels monitor to relay messages from the master to any & all connected monitors so I was reasonably sure we would've noticed anything horribly wrong.

Interesting point about the interaction when data isn't fetched. I'll see if I can come up with something with a little more finesse. As you note, it's a bit of a corner case, but it'd be nice to avoid biting people if it can be avoided.

@HenryYoung42
Copy link

If you crack that you'll have my vote for a merge (for what it's worth). Also highlights the perils of turning something developed with a single code base using it into a more generically useful enhancement !

@waveform80
Copy link
Contributor Author

Okay, I've figured this out and ... firstly I'm pretty convinced it's not a bug (though I did come across another bug while investigating this, which I'll post a fix for in a mo - so it was still worth it :). Anyway, here's what's going on:

There are roughly three things the ZMQ event loop can do: fire an callback when a file/queue is ready, fire a callback when an alarm is ready, or fire the idle callback. In the absence of anything being ready, the event loop either waits indefinitely on the associated files/queues, or for a limited time for the next alarm (if any alarms are registered). However, the event loop has a definite "preference". In each iteration of its internal loop the following happens:

  • If any files/queues are ready, the associated callback will be called, regardless of whether any alarms are pending.

  • If no files/queues are ready, but an alarm is pending, then the alarm handler will be called.

  • If no files/queues are ready, and no alarms are pending, then the idle handler will be called.

So, you can see the problem: if a file/queue is permanently "ready" then its associated callback will just be called again and again and again, in preference to anything else. It's also worth noting that the draw_screen method is, by default, only called from the idle handler so, if your application is never idle, and never explicitly calls draw_screen it will appear frozen (though it's actually very busy repeatedly calling that ready file/queue handler).

Now, why don't I think this is a bug? Because it's exactly what the default SelectEventLoop does too:

The ZMQEventLoop in this PR is closely modelled after the SelectEventLoop. In particular, I basically adapted SelectEventLoop.loop when writing ZMQEventLoop.loop. To give an idea of how closely the two are related, I reproduced this issue with the default SelectEventLoop simply by substituting some UNIX sockets for the ZMQ queues.

The following gist contains several little demo scripts I whipped up while looking into this: https://gist.github.com/waveform80/1be2a233bc1f20dbff0ef2be10ee2f40

  • zmq_send.py and zmq_recv_1.py - run these two in separate terminals to show a simple application that receives messages typed on the command line and displays them alongside a simple clock

  • sel_send.py and sel_recv_1.py - same thing but with SelectEventLoop; note that for these the receiving end must be started first (ZMQ doesn't care but ordinary sockets do)

  • zmq_send.py and zmq_recv_2.py - this variant shows what happens when the receiving queue isn't "serviced"; note that draw_screen is explicitly called to update the display from the handler

  • sel_send.py and sel_recv_2.py - same thing but with SelectEventLoop; again start the receiving end first.

As to the bug I found when looking into this? I screwed up the poll timeout: ZMQ uses milli-seconds, not seconds :)

@HenryYoung42
Copy link

Your analysis explains precisely the symptoms I observed. I now agree with you that this is not a bug but a feature inherent in the operation of zmq. For that reason I am happy, as a user of a trivial adaptation of your zmq event loop enhancement, to give my vote of approval (for what it's worth). Thankfully the timeout slip was the "right" way round to be relatively harmless ;)

@penguinolog penguinolog added the Feature Feature request/implementation label May 1, 2023
@waveform80
Copy link
Contributor Author

Rebased to resolve conflicts, and updated structure (and tests) to match the split out event-loop structure

The actual code is pretty well tested (we've been using it in the main
piwheels monitor for over a year), but that's just one app.
That was ... surprisingly easy (once the implicit close of fds was
figured out, and a version of zmq that worked on pypy was found!)
For ignoring interrupted polls in run() (import was in the module most
of this was copied from in piwheels, but naturally I forgot to check the
imports!)
@waveform80
Copy link
Contributor Author

Another push just to remove the spurious changes in tox.ini (my rebasing had changed the final line-ending; doesn't really matter but keeps the PR simpler)

Copy link
Collaborator

@penguinolog penguinolog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small comment and LGTM

urwid/event_loop/zmq_loop.py Outdated Show resolved Hide resolved
This was originally here to make zmq an optional import, but this module
as a whole is now optional courtesy of the parent event_loop module
@penguinolog penguinolog merged commit f7d2f2f into urwid:master Jun 28, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature Feature request/implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants