Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async kernel launch fails on Windows #31

Closed
kevin-bates opened this issue Nov 8, 2019 · 6 comments · Fixed by #34
Closed

Async kernel launch fails on Windows #31

kevin-bates opened this issue Nov 8, 2019 · 6 comments · Fixed by #34

Comments

@kevin-bates
Copy link
Collaborator

The asynchronous support changes are causing issues on Windows. Regardless of which event loop is used, we wind up encountering different NotImplementedError exceptions. I have focused on Python 3.7 - since we fail even sooner on Python 3.5 and 3.6. Below are the test failures for a single test (test_start_new_kernel) which triggers a launch. All launch-related tests fail with the same issue(s). I can't determine if there are follow-on issues once this issue gets resolved.

event loop: WindowsSelectorEventLoop (default)

When the WindowsSelectorEventLoop is used, either explicitly or by default, the call to asyncio.create_subprocess_exec() fails when calling the internal method _make_subprocess_transport().

____________________________ test_start_new_kernel ____________________________
    async def test_start_new_kernel():
>       km, kc = await start_new_kernel(make_ipkernel_cmd(), startup_timeout=TIMEOUT)
jupyter_kernel_mgmt\tests\test_async_manager.py:27: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
jupyter_kernel_mgmt\subproc\launcher.py:337: in start_new_kernel
    info, km = await launcher.launch()
jupyter_kernel_mgmt\subproc\launcher.py:83: in launch
    kernel = await asyncio.create_subprocess_exec(*args, **kw)
c:\miniconda36-x64\lib\asyncio\subprocess.py:217: in create_subprocess_exec
    stderr=stderr, **kwds)
c:\miniconda36-x64\lib\asyncio\base_events.py:1533: in subprocess_exec
    bufsize, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <_WindowsSelectorEventLoop running=False closed=False debug=False>
protocol = <SubprocessStreamProtocol>
args = ('c:\\miniconda36-x64\\python.exe', '-m', 'ipykernel_launcher', '-f', 'C:\\Users\\appveyor\\AppData\\Roaming\\jupyter\\runtime\\kernel-653c313c-5cd573defee08b306ab6eec6.json')
shell = False, stdin = -1, stdout = None, stderr = None, bufsize = 0
extra = None
kwargs = {'cwd': 'C:\\projects\\jupyter-kernel-mgmt', 'env': {'7ZIP': '"C:\\Program Files\\7-Zip\\7z.exe"', 'ALLUSERSPROFILE': 'C:\\ProgramData', 'APPDATA': 'C:\\Users\\appveyor\\AppData\\Roaming', 'APPVEYOR': 'True', ...}}
    async def _make_subprocess_transport(self, protocol, args, shell,
                                         stdin, stdout, stderr, bufsize,
                                         extra=None, **kwargs):
        """Create subprocess transport."""
>       raise NotImplementedError
E       NotImplementedError
c:\miniconda36-x64\lib\asyncio\base_events.py:463: NotImplementedError

event loop: WindowsProactorEventLoop

Results when WindowsProactorEventLoop is used, by explicitly configuring the event loop policy via asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy()), the launch completes, but we fail to setup the kernel client instance since the instantiation of IOLoopKernelClient fails because the ProactorEventLoop doesn't implement add_reader().

____________________________ test_start_new_kernel ____________________________
    async def test_start_new_kernel():
>       km, kc = await start_new_kernel(make_ipkernel_cmd(), startup_timeout=TIMEOUT)
jupyter_kernel_mgmt\tests\test_async_manager.py:27: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
jupyter_kernel_mgmt\subproc\launcher.py:338: in start_new_kernel
    kc = IOLoopKernelClient(info, manager=km)
jupyter_kernel_mgmt\client.py:51: in __init__
    self.streams[channel] = s = zmqstream.ZMQStream(socket, self.ioloop)
c:\miniconda36-x64\lib\site-packages\zmq\eventloop\zmqstream.py:127: in __init__
    self._init_io_state()
c:\miniconda36-x64\lib\site-packages\zmq\eventloop\zmqstream.py:546: in _init_io_state
    self.io_loop.add_handler(self.socket, self._handle_events, self.io_loop.READ)
c:\miniconda36-x64\lib\site-packages\tornado\platform\asyncio.py:99: in add_handler
    self.asyncio_loop.add_reader(fd, self._handle_events, fd, IOLoop.READ)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <ProactorEventLoop running=False closed=False debug=False>, fd = 800
callback = <bound method BaseAsyncIOLoop._handle_events of <tornado.platform.asyncio.AsyncIOMainLoop object at 0x000000D7B3FF4780>>
args = (800, 1)
    def add_reader(self, fd, callback, *args):
>       raise NotImplementedError
E       NotImplementedError
c:\miniconda36-x64\lib\asyncio\events.py:505: NotImplementedError

Since we know kernels can be launched synchronously using the standard subprocess module on Windows and can visually confirm the WindowsSelectorEventLoop implements add_reader() - which is also backed by the fact that sync kernel launches work, its probably in our best interest to try to use the standard subprocess module for kernel launches with WindowsSelectorEventLoop.

Also, because asyncio switches the default event loop to WindowsProactorEventLoop in python 3.8, we'll want to go ahead and "pin" the selector event loop until a better solution is found.

@echarles
Copy link
Contributor

echarles commented Nov 9, 2019

Hi Kevin, a few comments/questions

PS: What is the markdown syntax you use to create those lists where items you can opened/closed?
Screenshot 2019-11-09 at 07 04 27

@kevin-bates
Copy link
Collaborator Author

@echarles - thanks for digging into this.

  1. The reason you don't find instances of either Windows event loop in master or most anywhere else in the Jupyter ecosystem, is because the Windows default - WindowsSelectorEventLoop - works fine up until the async changes introduced the call to asyncio.create_subprocess_exec(). This method uses a portion of the event loop (i.e., requires _make_subprocess_transport) that WindowsSelectorEventLoop doesn't implement. However, WindowsProactorEventLoop does. Thus the attempt to use the latter.
  2. Yes, I've been following that issue although things didn't start making sense until recently. That issue is really about Python 3.8 support and how in Py 3.8, they changed the default event loop on Windows to WindowsProactorEventLoop. But, if we wanted to use that event loop, the second issue occurs - which is more of what the other issue about since that code exists prior to the async changes we're introducing. So we have a catch 22.
  3. Regarding a workaround, I think we need to fall back to use the traditional (non-async) form of Popen - we know that works. I'm not sure if we gain many benefits using the async popen anyway since I've always found the Popen portion of startup to be one of the faster aspects (at least in terms of remote kernels). I suppose we could try to support both (using Popen for win32 only), but only if it doesn't trickle into other hassles (I think the wait timeout handling is quite a bit different).

The twisty expansion stuff (can never remember the term) is performed using straight html that wraps the contained markdown in a "details/paragraph" thing. From above, here's the header...

<details><summary>event loop: WindowsSelectorEventLoop (default)</summary><p>

then regular markdown . Be sure there's a blank link between the header and the markdown. Then cap things off with the "footer".

</p></details>

@takluyver
Copy link
Owner

Ugh. The tricky bit with using the non-asyncio Popen is that the .wait() method will also be blocking, and I don't think there's any good cross-platform way to avoid that (I think the lack of a nice cross-platform way to asynchronously wait on a child process is part of why this issue exists).

I guess a workaround would be to have one thread waiting for each child process, ready to pass the information back to the event loop in the main thread. But that's kind of ugly.

@kevin-bates
Copy link
Collaborator Author

I think keeping it simple is more important. Wait is only used to cleanup after a kill/shutdown - correct? I never encountered issues with the way things worked previously, have you? Your concern is that a kernel is shutdown, no longer active, and then wait() blocks forever?

@takluyver
Copy link
Owner

The way things are currently implemented, I think it's very unlikely to get stuck in a wait. But:

  • If your kernel is stuck in certain kinds of I/O, it could go into uninterruptible sleep, in which case it's effectively unkillable, so the wait() after killing it could block.
  • Application code (or JKM code in the future) may want to try SIGTERM and wait a few seconds before bringing out the SIGKILL hammer. Without an async wait, that means blocking the event loop.
  • It would be nice to implement the restarter functionality on an async wait() call, rather than polling.

Maybe the best option is to do this nicely on Unix (possibly with the asyncio child watcher interface), and have a wait() implementation that polls subprocesses on Windows.

@kevin-bates
Copy link
Collaborator Author

yeah, I'll get something working on Windows, then look at "ifdef-ing" for windows. The sync wait does take a timeout, so I think just hard-coding something small - 1 or 2 seconds - and logging a warning on timeout - probably wouldn't be the end of the world.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants