
WaitIterator crashes if given the same future multiple times #2029

Open
Groxx opened this issue Apr 27, 2017 · 1 comment
Groxx commented Apr 27, 2017

While trying to do some shenanigans, I built a utility to simplify yield gen.multi(..., quiet_exceptions=...) for a bunch of code. One of my goals was to be able to abort a yield on the first-available error when yielding on multiple futures, rather than waiting for all futures to resolve - this can lead to fairly significant response-time improvements when multiple errors occur and one of them is due to a lengthy timeout.

So I made a small helper with WaitIterator, and started getting errors like this in tests:

  File ".../lib/helpers.py", line 275, in parallel
    result = yield iterator.next()
  File ".../env/local/lib/python2.7/site-packages/tornado/gen.py", line 428, in next
    self._return_result(self._finished.popleft())
  File ".../env/local/lib/python2.7/site-packages/tornado/gen.py", line 445, in _return_result
    self.current_index = self._unfinished.pop(done)
KeyError: <Future at 0x7f2fe2eedd10 state=finished returned MyEntity>

After a bit of hunting, I narrowed it down to this (simplified pieces of WaitIterator):

class WaitIterator(object):
    def __init__(self, *args, **kwargs):
        if args and kwargs:
            raise ValueError(
                "You must provide args or kwargs, not both")

        if kwargs:
            self._unfinished = dict((f, k) for (k, f) in kwargs.items())
            futures = list(kwargs.values())
        else:
            # note that this is a dictionary keyed off the futures in args,
            # so duplicate futures collapse into a single key
            self._unfinished = dict((f, i) for (i, f) in enumerate(args))
            # while this is a sequence that keeps every duplicate
            futures = args

        # and this deque will also hold duplicates as they finish
        self._finished = collections.deque()
        self.current_index = self.current_future = None
        self._running_future = None

        for future in futures:
            future.add_done_callback(self._done_callback)

    def next(self):
        self._running_future = TracebackFuture()

        if self._finished:
            # pops a future off the deque
            self._return_result(self._finished.popleft())

        return self._running_future

    def _return_result(self, done):
        chain_future(done, self._running_future)

        self.current_future = done
        # and this removes the *single* future-key that matches
        self.current_index = self._unfinished.pop(done)

This crash can be demonstrated with code like this:

from tornado.gen import coroutine, WaitIterator

@coroutine
def tmp():
  pass

@coroutine
def demo():
  f = tmp()
  i = WaitIterator(f, f, f)
  # the yields below must themselves run inside a coroutine
  while not i.done():
    yield i.next()

In a nutshell, we have some parallel calls that we've mocked to return the same Future. This puts a single future in the futures list multiple times, but it gets deduplicated in the _unfinished dictionary, so the second duplicate to finish raises a KeyError.
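The deduplication can be seen in isolation with a minimal sketch (FakeFuture is a hypothetical stand-in; any hashable object behaves the same way here):

```python
# Rebuild the two structures from __init__ with the same (fake) future
# passed three times, as in WaitIterator(f, f, f).
class FakeFuture(object):
    pass

f = FakeFuture()
args = (f, f, f)

# _unfinished: a dict keyed by the futures themselves -- duplicates collapse.
unfinished = dict((fut, i) for (i, fut) in enumerate(args))
# futures: the raw sequence -- duplicates are preserved, so the done
# callback fires (and _unfinished.pop is attempted) three times.
futures = list(args)

print(len(futures))     # 3
print(len(unfinished))  # 1
```

Three callbacks against one dictionary key is exactly the mismatch that produces the KeyError on the second pop.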

This isn't actually breaking anything currently, but it strikes me as a potential landmine, and would've broken some experiments I've been planning. The workaround for users like me is to dedup manually / wrap everything in a new Future / etc, which I can do, but this was at least surprising and took some time to hunt down.


IMO this needs one of two things. Both seem fine to me:

  • Don't convert to a dictionary like this, keep both as lists. _unfinished.pop(_unfinished.index(done)) in _return_result wouldn't have this problem.
    • I personally like this. Parallel yields are likely to be relatively small quantities, there's a decent chance that it'll perform better in most cases (at least, in most languages - small list scanning and indexing often out-performs hashing). It also lets WaitIterator return whatever was passed in, regardless of what it was given, which is what I expected.
  • Document it. This is a pretty low-level tool; under no circumstances should it surprise people who haven't read the source in detail. At the very least this isn't expected behavior from reading the docs, since the API accepts lists (which can contain duplicates), not only sets/dicts.
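To make the first suggestion concrete, here's a minimal standalone sketch of the list-based bookkeeping (the class and its methods are hypothetical; names mirror the snippet above, but this is not Tornado's actual code):

```python
# Sketch of suggestion 1: keep _unfinished as a list instead of a dict,
# so duplicate futures each occupy their own slot.
class ListBackedTracker(object):
    def __init__(self, *futures):
        self._unfinished = list(futures)  # may contain duplicates
        self.current_index = self.current_future = None

    def _return_result(self, done):
        # list.index finds the first matching entry; pop removes only
        # that one occurrence, so later duplicates survive.
        self.current_index = self._unfinished.index(done)
        self._unfinished.pop(self.current_index)
        self.current_future = done

class FakeFuture(object):
    pass

f = FakeFuture()
t = ListBackedTracker(f, f, f)
for _ in range(3):
    t._return_result(f)   # no KeyError: each call removes one copy
print(t._unfinished)      # []
```

The trade-off is the linear scan in `index`, which is the performance concern raised below for large sets of futures.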

I can probably get a pull request up if it'd help, but I haven't yet looked into contributing here, and it seems like it'd be a pretty small change either way. And it's a bit esoteric, so I figured it needed some discussion to fit it in best with existing code :) Let me know what you think!

bdarnell (Member) commented

I see WaitIterator as being usable for fairly large quantities, so I wouldn't want to change it to a form that would perform poorly for larger sets. I would also like to support reusing futures (one of the reasons Tornado's Futures don't support cancellation in the same way as asyncio Futures is that I want to support caching and other abstractions like what I think you're contemplating here), so I don't want to just document the limitation.

I think the best solution is to make _unfinished a multimap, allowing multiple keys for the same future.
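One way to sketch that multimap idea: map each future to a deque of the indices (or kwarg keys) at which it appeared, so duplicates keep one entry per occurrence. This is a hypothetical illustration, not Tornado's actual patch:

```python
import collections

# Sketch: _unfinished as a multimap (future -> deque of indices).
class MultimapTracker(object):
    def __init__(self, *futures):
        self._unfinished = collections.defaultdict(collections.deque)
        for i, f in enumerate(futures):
            self._unfinished[f].append(i)
        self.current_index = self.current_future = None

    def _return_result(self, done):
        indices = self._unfinished[done]
        # Consume one occurrence per completion callback; drop the key
        # only when every occurrence has been accounted for.
        self.current_index = indices.popleft()
        if not indices:
            del self._unfinished[done]
        self.current_future = done

class FakeFuture(object):
    pass

f = FakeFuture()
t = MultimapTracker(f, f, f)
for expected in (0, 1, 2):
    t._return_result(f)
    assert t.current_index == expected  # one index per occurrence
print(len(t._unfinished))  # 0
```

This keeps O(1) lookups for large sets of futures while still yielding once per duplicate.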

bdarnell added the gen label on Jan 21, 2018