
WaitIterator crashes if given the same future multiple times #2029

Open
Groxx opened this issue Apr 27, 2017 · 1 comment
Groxx commented Apr 27, 2017

While trying to do some shenanigans, I built a utility to simplify yield gen.multi(..., quiet_exceptions=...) for a bunch of code. One of my goals was to be able to abort a yield on the first-available error when yielding on multiple futures, rather than waiting for all futures to resolve - this can lead to fairly significant response-time improvements when multiple errors occur and one of them is due to a lengthy timeout.

So I made a small helper with WaitIterator, and started getting errors like this in tests:

  File ".../lib/helpers.py", line 275, in parallel
    result = yield iterator.next()
  File ".../env/local/lib/python2.7/site-packages/tornado/gen.py", line 428, in next
    self._return_result(self._finished.popleft())
  File ".../env/local/lib/python2.7/site-packages/tornado/gen.py", line 445, in _return_result
    self.current_index = self._unfinished.pop(done)
KeyError: <Future at 0x7f2fe2eedd10 state=finished returned MyEntity>

After a bit of hunting, I narrowed it down to this (simplified pieces of WaitIterator):

class WaitIterator(object):
    def __init__(self, *args, **kwargs):
        if args and kwargs:
            raise ValueError(
                "You must provide args or kwargs, not both")

        if kwargs:
            self._unfinished = dict((f, k) for (k, f) in kwargs.items())
            futures = list(kwargs.values())
        else:
            # note that this is a dictionary keyed off the futures in args,
            # so duplicate futures collapse into a single key
            self._unfinished = dict((f, i) for (i, f) in enumerate(args))
            # while this is a sequence that keeps every duplicate
            futures = args

        # and this deque will also hold duplicates as they finish
        self._finished = collections.deque()
        self.current_index = self.current_future = None
        self._running_future = None

        for future in futures:
            future.add_done_callback(self._done_callback)

    def next(self):
        self._running_future = TracebackFuture()

        if self._finished:
            # pops a future off the deque
            self._return_result(self._finished.popleft())

        return self._running_future

    def _return_result(self, done):
        chain_future(done, self._running_future)

        self.current_future = done
        # and this removes the *single* future-key that matches
        self.current_index = self._unfinished.pop(done)

This crash can be demonstrated with code like this:

from tornado.gen import coroutine, WaitIterator

@coroutine
def tmp():
  pass

@coroutine
def demo():
  f = tmp()
  i = WaitIterator(f, f, f)
  # the yields below must themselves run inside a coroutine
  while not i.done():
    yield i.next()

In a nutshell, we have some parallel calls that we've mocked to return the same Future. This puts a single future in the futures list multiple times, but it gets deduplicated in the _unfinished dictionary, so the second duplicate to finish raises a KeyError.
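The deduplication can be seen in isolation with a minimal sketch (FakeFuture is a hypothetical stand-in; any hashable object behaves the same way here):

```python
# Rebuild the two structures from __init__ with the same (fake) future
# passed three times, as in WaitIterator(f, f, f).
class FakeFuture(object):
    pass

f = FakeFuture()
args = (f, f, f)

# _unfinished: a dict keyed by the futures themselves -- duplicates collapse.
unfinished = dict((fut, i) for (i, fut) in enumerate(args))
# futures: the raw sequence -- duplicates are preserved, so the done
# callback fires (and _unfinished.pop is attempted) three times.
futures = list(args)

print(len(futures))     # 3
print(len(unfinished))  # 1
```

Three callbacks against one dictionary key is exactly the mismatch that produces the KeyError on the second pop.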

This isn't actually breaking anything currently, but it strikes me as a potential landmine, and would've broken some experiments I've been planning. The workaround for users like me is to dedup manually / wrap everything in a new Future / etc, which I can do, but this was at least surprising and took some time to hunt down.


IMO this needs one of two things. Both seem fine to me:

  • Don't convert to a dictionary like this, keep both as lists. _unfinished.pop(_unfinished.index(done)) in _return_result wouldn't have this problem.
    • I personally like this. Parallel yields are likely to be relatively small quantities, there's a decent chance that it'll perform better in most cases (at least, in most languages - small list scanning and indexing often out-performs hashing). It also lets WaitIterator return whatever was passed in, regardless of what it was given, which is what I expected.
  • Document it. This is a pretty low-level tool; under no circumstances should it surprise people who haven't read the source in detail. At the very least this isn't expected behavior from reading the docs, since the API accepts lists (which can contain duplicates), not only sets/dicts.
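To make the first suggestion concrete, here's a minimal standalone sketch of the list-based bookkeeping (the class and its methods are hypothetical; names mirror the snippet above, but this is not Tornado's actual code):

```python
# Sketch of suggestion 1: keep _unfinished as a list instead of a dict,
# so duplicate futures each occupy their own slot.
class ListBackedTracker(object):
    def __init__(self, *futures):
        self._unfinished = list(futures)  # may contain duplicates
        self.current_index = self.current_future = None

    def _return_result(self, done):
        # list.index finds the first matching entry; pop removes only
        # that one occurrence, so later duplicates survive.
        self.current_index = self._unfinished.index(done)
        self._unfinished.pop(self.current_index)
        self.current_future = done

class FakeFuture(object):
    pass

f = FakeFuture()
t = ListBackedTracker(f, f, f)
for _ in range(3):
    t._return_result(f)   # no KeyError: each call removes one copy
print(t._unfinished)      # []
```

The trade-off is the linear scan in `index`, which is the performance concern raised below for large sets of futures.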

I can probably get a pull request up if it'd help, but I haven't yet looked into contributing here, and it seems like it'd be a pretty small change either way. And it's a bit esoteric, so I figured it needed some discussion to fit it in best with existing code :) Let me know what you think!

bdarnell (Member) commented

I see WaitIterator as being usable for fairly large quantities, so I wouldn't want to change it to a form that would perform poorly for larger sets. I would also like to support reusing futures (one of the reasons Tornado's Futures don't support cancellation in the same way as asyncio Futures is that I want to support caching and other abstractions like what I think you're contemplating here), so I don't want to just document the limitation.

I think the best solution is to make _unfinished a multimap, allowing multiple keys for the same future.
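One way to sketch that multimap idea: map each future to a deque of the indices (or kwarg keys) at which it appeared, so duplicates keep one entry per occurrence. This is a hypothetical illustration, not Tornado's actual patch:

```python
import collections

# Sketch: _unfinished as a multimap (future -> deque of indices).
class MultimapTracker(object):
    def __init__(self, *futures):
        self._unfinished = collections.defaultdict(collections.deque)
        for i, f in enumerate(futures):
            self._unfinished[f].append(i)
        self.current_index = self.current_future = None

    def _return_result(self, done):
        indices = self._unfinished[done]
        # Consume one occurrence per completion callback; drop the key
        # only when every occurrence has been accounted for.
        self.current_index = indices.popleft()
        if not indices:
            del self._unfinished[done]
        self.current_future = done

class FakeFuture(object):
    pass

f = FakeFuture()
t = MultimapTracker(f, f, f)
for expected in (0, 1, 2):
    t._return_result(f)
    assert t.current_index == expected  # one index per occurrence
print(len(t._unfinished))  # 0
```

This keeps O(1) lookups for large sets of futures while still yielding once per duplicate.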

bdarnell added the gen label on Jan 21, 2018