Navigation Menu

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gevent + fork + google.cloud.storage + urllib3/requests gives "'module' object has no attribute 'epoll'" #1268

Closed
maingoh opened this issue Sep 14, 2017 · 14 comments

Comments

@maingoh
Copy link

maingoh commented Sep 14, 2017

Hello,

I am using the latest version of requests (and urllib3 1.22). I need to fork and monkey patch the child process (the parent has to stay unpatched). I also need to instantiate a google cloud storage client before forking and this gives me this traceback when downloading a page in the child:

Traceback (most recent call last):
  File "/usr/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python2.7/dist-packages/gipc/gipc.py", line 361, in _child
    target(*args, **kwargs)
  File "debug_tests.py", line 17, in startProcess
    print requests.get("https://google.fr")
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python2.7/dist-packages/urllib3/connection.py", line 326, in connect
    ssl_context=context)
  File "/usr/local/lib/python2.7/dist-packages/urllib3/util/ssl_.py", line 329, in ssl_wrap_socket
    return context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python2.7/dist-packages/urllib3/contrib/pyopenssl.py", line 443, in wrap_socket
    rd = util.wait_for_read(sock, sock.gettimeout())
  File "/usr/local/lib/python2.7/dist-packages/urllib3/util/wait.py", line 33, in wait_for_read
    return _wait_for_io_events(socks, EVENT_READ, timeout)
  File "/usr/local/lib/python2.7/dist-packages/urllib3/util/wait.py", line 22, in _wait_for_io_events
    with DefaultSelector() as selector:
  File "/usr/local/lib/python2.7/dist-packages/urllib3/util/selectors.py", line 581, in DefaultSelector
    return _DEFAULT_SELECTOR()
  File "/usr/local/lib/python2.7/dist-packages/urllib3/util/selectors.py", line 394, in __init__
    self._epoll = select.epoll()
AttributeError: 'module' object has no attribute 'epoll'

The smallest code I can give you is :

import os
import gipc
from google.cloud import storage

def start_process():
    from gevent import monkey
    monkey.patch_all()

    import requests
    print requests.get("https://google.fr")

class CloudStorage(object):

    def __init__(self, bucket_name):
        self.bucket_name = bucket_name
        storage_client = storage.Client()
        self.bucket = storage_client.get_bucket(self.bucket_name)


    def upload(self, remote_path):
        blob = self.bucket.blob(remote_path)
        blob.upload_from_filename("/tmp/test.txt")


if __name__ == "__main__":

    cloud_storage = CloudStorage("bucket-test")

    cloud_storage.upload("test")

    gipc.start_process(start_process)

I have tried :

  • urllib3 1.19.1 with requests 2.18.4 works (I have a warning because the version of urllib is too old for requests, doesn't work with higher versions of urllib3)
  • urllib3 1.22 with requests 2.12.5 works (doesn't work with higher versions of requests).
  • Using gevent.fork or os.fork instead of gipc but it doesn't change the error
  • If I do patch_all(select=False) it works but without concurrency
  • Uploading on s3 with boto3 instead works.
  • Just doing a simple requests.get("https://google.fr") instead of creating a google_storage & uploading in the parent works. Note that if I only instantiate the google storage client, without uploading, it works.

For now I will go with the version 1.19.1 but it seems that something has been broken since ? Or is it in the requests module ?

Thank you very much,

@haikuginger
Copy link
Contributor

haikuginger commented Sep 14, 2017

@maingoh, can you test with an application design that imports and calls the gevent monkey patch method before importing anything else? It looks like we're enumerating the selector options in select when first imported (probably as part of the from google.cloud import storage call), but post-monkeypatch, some of those selectors aren't available, causing the error you're seeing.

@maingoh
Copy link
Author

maingoh commented Sep 14, 2017

If I do :

    gipc.start_process(start_process)
    cloud_storage = CloudStorage("bucket-test")
    cloud_storage.upload("test")

=> It works

But then If I fork again after, then it fails. I just figured out that since my process is forked, then all modules are also imported in the child ..

Also If I monkey patch directly in the parent, at the beginning it works, but I don't want the parent to be monkey patched.

The problem is that I am making tests and need to fork between each tests. Would deleting all sys.modules in start_process() work ? It's a bit hacky but I don't find another way

@maingoh
Copy link
Author

maingoh commented Sep 14, 2017

It seem to fail only with https (replace the url by http://google.com and it works)

@maingoh
Copy link
Author

maingoh commented Sep 14, 2017

I managed to make it working by creating my child process using subprocess.Popen which is a little less convenient as I need to pass variables to the shell. But at least I have a fresh new process without module imported before.

It would be better If the first method was working though. Not sure if you can do something about it.

@haikuginger
Copy link
Contributor

haikuginger commented Sep 14, 2017

Yeah, this is tricky. We've run into this before with gevent patching; essentially the selector environment we end up working with can be different than the selector environment we initialize in. Generally, we've recommended ensuring that importing urllib3 (including through a module that uses us) happens after any monkeypatching.

Pinging @SethMichaelLarson for his thoughts, since he's worked a lot in that area of the project. Is this something we can or should change? I'm wondering if we can guard this line with a try/except looking for an AttributeError, and then re-run the logic to find a default selector. Might look like this?

def DefaultSelector():
    """ This function serves as a first call for DefaultSelector to
    detect if the select module is being monkey-patched incorrectly
    by eventlet, greenlet, and preserve proper behavior. """
    global _DEFAULT_SELECTOR
    if _DEFAULT_SELECTOR is not None:
        try:
            return _DEFAULT_SELECTOR()
        except AttributeError:
            _DEFAULT_SELECTOR = None
    if _can_allocate('kqueue'):
        _DEFAULT_SELECTOR = KqueueSelector
    elif _can_allocate('epoll'):
        _DEFAULT_SELECTOR = EpollSelector
    elif _can_allocate('poll'):
        _DEFAULT_SELECTOR = PollSelector
    elif hasattr(select, 'select'):
        _DEFAULT_SELECTOR = SelectSelector
    else:  # Platform-specific: AppEngine
        raise ValueError('Platform does not have a selector')
    return _DEFAULT_SELECTOR()

@sethmlarson
Copy link
Member

sethmlarson commented Sep 14, 2017

I personally don't have an issue with that change. If I'm understanding correctly (haven't had time to actually process everything in this thread) the part that's tripping up our current logic is the fact that it's being forked? Because the original reason we moved to this logic was so users could run:

import urllib3
from gevent import monkey
monkey.patch_all()

# Use urllib3 here...

and the world wouldn't end. Is it the forking what is causing this issue or something else?

@haikuginger
Copy link
Contributor

@SethMichaelLarson, I'm not sure if the issue is related to the fork or not - I think that's a red herring. It looks like it's that we're making a call (in cloud_storage.upload()) that uses a selector (which causes _DefaultSelector to be set to the best available choice from the stdlib) prior to the monkey patch; a choice that might not be valid after the monkeypatch. Basically, the above is just a way to recover from the case where the selector environment gets pulled out from under us.

@haikuginger
Copy link
Contributor

haikuginger commented Sep 14, 2017

Hrmm, I suppose that since we're declaring our wrapper classes conditionally, the above could run into issues if the monkey-patched select module had more selector options available than the stdlib module.

@sethmlarson
Copy link
Member

sethmlarson commented Sep 14, 2017

I think that we're trying to do too much if we try to solve that issue as well. Either monkey patch or don't. Monkey-patching away features of a module in the middle of a script is bound to break more things than urllib3 eventually. I would recommend moving the monkey-patch to the beginning of the script.

ie: I don't agree that urllib3 should account for what amounts to this:

import select
import urllib3

# Use urllib3

delattr(select, 'epoll')

# Use urllib3

@maingoh
Copy link
Author

maingoh commented Sep 15, 2017

So just to explain a little more what I was trying to do. I am making a worker which I want to be monkey patched for network concurrency. But the main app (which can be anything) is not necessarily gevent friendly, so I don't want my worker to break the current app.
I wanted the main app to launch the gevent worker, so I used subprocess.Popen but using multiprocessing.Process/gipc.start_process/fork would be more convenient for the launch and interconnection between the app & worker.
The fix with subprocess is ok for me though (as long as I pass simple arguments to the worker, and don't use Pipes/Queues etc).

@maingoh
Copy link
Author

maingoh commented Sep 15, 2017

It looks like it may be a gevent feature to add more than something to hack in urllib3.
Also not sure if it is technically possible for gevent to monkey patch inside the children only even after some modules are already imported.

@sethmlarson
Copy link
Member

If you want to go down this route you can try doing this:

# Stuff before monkey patching...
monkey.patch_all()
import urllib3.util.selectors
urllib3.util.selectors._DEFAULT_SELECTOR = None

This should reset urllib3s memory of what our default selector is by the time we get there and cause a retrial of all selectors post-patch in the forked process. I haven't tested this code so use at your own risk.

@maingoh
Copy link
Author

maingoh commented Sep 15, 2017

Just tried and it works ! I will stick with subprocess for now as it is safer ;)
Thanks !

@haikuginger
Copy link
Contributor

Okay; I'm going to close this out, as I think we can agree that no action ought to be taken as a result of this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants