Default context on MacOS should be spawn #65

Open
mmckerns opened this issue Sep 27, 2019 · 5 comments

@mmckerns
Member

mmckerns commented Sep 27, 2019

The default context on macOS was changed from 'spawn' back to 'fork' in 7257130, to be consistent with previous versions of Python. The 'spawn' default came from multiprocessing changing its default on macOS, as noted in https://bugs.python.org/issue33725.

The reasoning for switching back to 'fork' is that 'spawn' causes some issues that can be seen in the bundled examples (detailed below).
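
For reference, checking or overriding the default at runtime looks roughly like this. A minimal sketch, assuming multiprocess mirrors the stdlib multiprocessing API for get_start_method and get_context (the Pool call below is purely illustrative):

import multiprocess as mp

if __name__ == '__main__':
    # report the default start method: on macOS with Python 3.8 this is
    # 'fork' after 7257130 and 'spawn' before it
    print(mp.get_start_method())
    # opt into 'spawn' explicitly for a single Pool via a context object
    ctx = mp.get_context('spawn')
    with ctx.Pool(2) as pool:
        print(pool.map(abs, [-1, -2]))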

@mmckerns mmckerns added the bug label Sep 27, 2019
@mmckerns
Member Author

For Python 3.8bX, with multiprocess before 7257130 (i.e. using 'spawn' as the default context) and with dill.settings['recurse'] = False, we see errors like the following:

$ python ex_pool.py
Ordered results using pool.apply_async():
multiprocess.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/Users/mmckerns/lib/python3.8/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "ex_pool.py", line 16, in calculate
    result = func(*args)
  File "ex_pool.py", line 24, in mul
    time.sleep(0.5*random.random())
NameError: name 'time' is not defined
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "ex_pool.py", line 295, in <module>
    test()
  File "ex_pool.py", line 68, in test
    print('\t', r.get())
  File "/Users/mmckerns/lib/python3.8/site-packages/multiprocess/pool.py", line 768, in get
    raise self._value
NameError: name 'time' is not defined

$ python ex_synchronize.py
10 Process Process-1:
Traceback (most recent call last):
  File "/Users/mmckerns/lib/python3.8/site-packages/multiprocess/process.py", line 313, in _bootstrap
    self.run()
  File "/Users/mmckerns/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "ex_synchronize.py", line 17, in value_func
    random.seed()
NameError: name 'random' is not defined

$ python ex_workers.py
Unordered results:
Process Process-1:
Traceback (most recent call last):
  File "/Users/mmckerns/lib/python3.8/site-packages/multiprocess/process.py", line 313, in _bootstrap
    self.run()
  File "/Users/mmckerns/lib/python3.8/site-packages/multiprocess/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "ex_workers.py", line 23, in worker
    result = calculate(func, args)
NameError: name 'calculate' is not defined

@mmckerns
Member Author

However, using fork as the default context -- or using recurse=True -- removes these errors. I believe spawn is also responsible for needing shutdown in pathos (uqfoundation/pathos@640e84e). At least in 3.8 under the conditions noted above (i.e. on a Mac with recurse=False), the use of spawn causes some object to be pickled upon __init__ of a Pool. This was traced with pdb to somewhere under the __init__ method of popen_fork.Popen (called from __init__ in popen_spawn_posix.Popen), due to calling the start() method of a Process within _repopulate_pool_static.

Instead of giving up and using fork, which is known to be unstable on macOS, the behavior should be fixed so that spawn works in the relevant cases.
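
For anyone hitting this before a fix lands, here is a minimal sketch of the workaround mentioned above: keep the 'spawn' context but set dill.settings['recurse'] = True. The mul function is modeled loosely on ex_pool.py and is otherwise hypothetical:

import time
import random

import dill
import multiprocess as mp

# per the comment above, recursively tracing globals avoids the NameErrors
dill.settings['recurse'] = True

def mul(a, b):
    time.sleep(0.5 * random.random())  # relies on module-level imports
    return a * b

if __name__ == '__main__':
    ctx = mp.get_context('spawn')  # keep 'spawn' rather than reverting to 'fork'
    with ctx.Pool(2) as pool:
        print(pool.starmap(mul, [(2, 3), (4, 5)]))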

@nickodell

Hi, I'm running into a similar issue with multiprocess. We're using p_tqdm, which uses pathos internally, which in turn uses this library.

I have a minimized code sample that shows the issue. It imports pandas, then uses p_tqdm to run two HTTP requests in parallel.

import requests
import pandas
from p_tqdm import p_map


def run(arg):
    r = requests.get(arg)
    return r.status_code


def main():
    # import multiprocess.context as ctx
    # ctx._force_start_method('spawn')
    urls = ['http://google.com'] * 2
    status_codes = p_map(run, urls)
    print(status_codes)


if __name__ == '__main__':
    main()

Running this code sample on OS X gives the following output:

objc[74572]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[74572]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
objc[74573]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called.
objc[74573]: +[__NSCFConstantString initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
  0%|                                                                                             | 0/2 [00:00<?, ?it/s]^C

It then hangs forever.

The import pandas part is not a mistake - it's necessary to trigger the bug, even though pandas is never used. Something about importing pandas, forking, and then making any kind of network request causes the error.

I've also done the following experiments:

  1. If I uncomment the code to set the start method to spawn, it no longer hangs.

  2. If I edit multiprocess/context.py and change this line:

    _default_context = DefaultContext(_concrete_contexts['fork']) #FIXME: spawn

    from "fork" to "spawn", it no longer hangs.

  3. Changing the Pandas version to 1.3.0, 1.2.0, 1.1.0 or 1.0.0 does not help.

Questions:

  1. Is there a better way of changing the start method than _force_start_method()? I got that piece of code from this SO answer. Given the leading underscore, I assume it's not intended as part of the public API? (See the sketch after these questions.)
  2. Is there anything I can do to help expedite a fix here, either in terms of development, or testing an existing solution?
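
On question 1: the public counterpart of _force_start_method() in the stdlib API is set_start_method(). A minimal sketch, assuming multiprocess mirrors that API (it is untested here whether this also avoids the hang):

import multiprocess as mp

def work(x):
    return x * x

if __name__ == '__main__':
    # must be called before any Pool or Process is created;
    # force=True allows overriding a previously set method
    mp.set_start_method('spawn', force=True)
    with mp.Pool(2) as pool:
        print(pool.map(work, range(4)))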

@mmckerns
Member Author

Changes were made to Python's multiprocessing module that were detrimental to pickling in a spawn context on a Mac. So, the "easy fix" was to switch to a fork context on macOS until the pickling can be improved under spawn. Unfortunately, fork can hang in certain cases, as seen above. This needs to be fixed in multiprocess, or possibly in dill.

@ephe-meral

I had the same issue as reported by @nickodell but using the Pool class.

Since I didn't want to edit an automatically installed package, I switched to the standard library multiprocessing for now, since the interface works the same for my use case:

#from multiprocess import Pool
from multiprocessing import Pool

#...
with Pool(10) as p:
    result = list(p.imap(test_fun, fun_inputs)) 
#...

(I realize that this might not help someone who needs this fork specifically, but it seems to work fine with the above in a Jupyter notebook)
