
Multiprocessing and tqdm in jupyter notebooks. #485

Open · vyasr opened this issue Dec 6, 2017 · 11 comments

Labels: help wanted 🙏 We need you (discussion or implementation) · p2-bug-warning ⚠ Visual output bad · submodule-notebook 📓 Much web such IDE · synchronisation ⇶ Multi-thread/processing

@vyasr commented Dec 6, 2017

I'm trying to use tqdm along with multiprocessing.Pool in a notebook, and it doesn't quite seem to render correctly. The general problem appears to be well documented in Issue #407 and Issue #329, but neither of the fixes appears to have percolated to the notebook code. In particular, the "canonical example" in Issue #407 works for me on the command line, but when I move to a jupyter notebook and replace from tqdm import tqdm with from tqdm import tqdm_notebook as tqdm I get something like the following.

[screenshot: only some of the progress bars render in the notebook output]

Changing the number of workers in the Pool yields different results, but the results are consistent across runs (e.g. running with Pool(4) always shows progressers 1, 6, and 7).

During times when no progress bar is updating, I see the following error message repeatedly appearing on the terminal where the jupyter notebook is running.

[IPKernelApp] WARNING | WARNING: attempted to send message from fork
{'metadata': {}, 'msg_id': 'b58be6b1-80db-4a2d-9699-b3a4f3ddae8e', 'header': {'username': 'vramasub', 'version': '5.0', 'session': '667381c2-d872-4fc2-b2b5-06e33957b17bb', 'msg_id': 'b58be6b1-80db-4a2d-9699-b3a4f3ddae8e', 'msg_type': 'comm_msg', 'date': datetime.datetime(2017, 12, 6, 15, 44, 13, 453842)}, 'content': {'data': {'state': {'value': '100% 5000/5000 [00:05<00:00, 877.55it/s]'}, 'buffers': [], 'method': 'update'}, 'comm_id': '486f6112cd1c496b95c4b25670f73786'}, 'msg_type': 'comm_msg', 'parent_header': {'username': 'username', 'msg_id': 'FF1406FB5DC044BC98FFEDFF244B9E13', 'session': 'E79AAA8C69DA4475BAACFAD1D0EC9DAF', 'version': '5.0', 'msg_type': 'execute_request', 'date': datetime.datetime(2017, 12, 6, 15, 43, 39, 135203)}}

For reference, this is the same issue noted here in a different project that uses tqdm internally. The warnings stop appearing in the terminal once the progress bars render successfully in the notebook.

I have tested with both tqdm version 4.19.4 (the current version on pip) and the current master (installed using pip install -e git+https://github.com/tqdm/tqdm.git@master#egg=tqdm). I have tested on both Linux (4.9.34-gentoo) and OS X (High Sierra 10.13.1). My jupyter version is 4.4.0 both on Linux and on OS X.

@casperdcl added the p2-bug-warning ⚠ Visual output bad and submodule-notebook 📓 Much web such IDE labels on Dec 8, 2017
@lrq3000 (Member) commented Jan 3, 2018

I'm not sure what the culprit is, but parallel bars are quite tricky. On Linux it is usually transparent because tqdm can provide a lock by default, but that's not the case on Windows, where the user must define one in the parent app and then provide it to tqdm. In your case, since you are using Jupyter instead of the plain Python interpreter, maybe the "default lock" does not exist.

What you can try is to define a lock in your Jupyter cell and provide it to tqdm; if this fixes the issue, that would confirm the hypothesis above.
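
For concreteness, a minimal sketch of what I mean, assuming the default fork start method on Linux so a worker defined in the cell can be pickled (the worker body, pool size, and counts are arbitrary placeholders):

from time import sleep
from multiprocessing import Pool, RLock
from tqdm import tqdm_notebook

def worker(n):
    for _ in tqdm_notebook(range(1000), desc="#{}".format(n), position=n):
        sleep(0.001)

if __name__ == '__main__':
    lock = RLock()                # lock defined explicitly in the notebook cell
    tqdm_notebook.set_lock(lock)  # used by any bars created in the parent
    p = Pool(4, initializer=tqdm_notebook.set_lock,
             initargs=(lock,))    # hand the same lock to every worker
    p.map(worker, range(4))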

@wmayner commented Jan 24, 2019

Can confirm that the same problem arises when explicitly providing a lock, using the Windows-supported code from tqdm/examples/parallel_bars.py:

from __future__ import print_function
from time import sleep
from tqdm import tqdm, tqdm_notebook
from multiprocessing import Pool, freeze_support, RLock


L = list(range(9))


def progresser(n):
    interval = 0.001 / (len(L) - n + 2)
    total = 5000
    text = "#{}, est. {:<04.2}s".format(n, interval * total)
    # NB: ensure position>0 to prevent printing '\n' on completion.
    # `tqdm` can't automate this since this thread
    # may not know about other bars in other threads #477.
    for _ in tqdm_notebook(range(total), desc=text, position=n + 1):
        sleep(interval)


if __name__ == '__main__':
    freeze_support()  # for Windows support
    p = Pool(len(L),
             initializer=tqdm.set_lock,
             initargs=(RLock(),))  # Provide a lock explicitly
    p.map(progresser, L)
    print('\n' * len(L))

@casperdcl (Sponsor Member) commented

@wmayner commented Jan 25, 2019

I don't think so; I ran the above code on Linux (in a Jupyter notebook).

@casperdcl added the help wanted 🙏 We need you (discussion or implementation) label on Feb 26, 2019
@nalepae commented Mar 15, 2019

I found a strange hack to work around your issue:
Just add a print statement in the progresser function.

Example here:

from time import sleep
from tqdm import tqdm_notebook as tqdm
from multiprocessing import Pool, freeze_support

def progresser(n):
    # This line is the strange hack
    print(' ', end='', flush=True)

    text = "progresser #{}".format(n)
    for i in tqdm(range(5000), desc=text, position=n):
        sleep(0.001)

if __name__ == '__main__':
    freeze_support()
    L = list(range(10))
    print()
    Pool(2).map(progresser, L)

Actually, you can print anything you want, but you have to print something non-empty:
print() or print('') won't work.

The line I added is the one I found that works while modifying the output as little as possible.

And here is my output: [screen recording: Peek 2019-03-15 16-44]

My configuration: Ubuntu 16.04 Linux, running in Chrome.

@codejedi365 commented

I can't believe that hack works, but it does for Jupyter! Wish I had found this thread 6 hours ago!
Thanks @nalepae!

@nunesgh commented Nov 1, 2019

I'm sorry if I'm off topic here, but is it possible to have just one progress bar that accounts for the execution of all of the workers?

For example, my pool would have more than 65000 workers and I would like to have my progress bar updated every time a worker finishes executing. Unfortunately, each worker creates its own progress bar on top of the others, and what I see is a progress bar stuck at 1/65000.

@casperdcl added the synchronisation ⇶ Multi-thread/processing label on Nov 22, 2019
@casperdcl (Sponsor Member) commented

btw check out tqdm.contrib.concurrent (process_map and thread_map)
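
For the single aggregate bar asked about above, a minimal sketch of that suggestion (the work function, worker count, and chunksize are placeholders; requires a tqdm version that ships tqdm.contrib.concurrent):

from time import sleep
from tqdm.contrib.concurrent import process_map

def work(n):
    sleep(0.001)
    return n * n

if __name__ == '__main__':
    # One bar tracking all 65000 tasks; chunksize keeps the
    # inter-process overhead low when the individual tasks are small.
    results = process_map(work, range(65000), max_workers=8, chunksize=100)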

@imneonizer commented

I'm sorry if I'm off topic here, but is it possible to have just one progress bar that accounts for the execution of all of the workers?

For example, my pool would have more than 65000 workers and I would like to have my progress bar updated every time a worker finishes executing. Unfortunately, each worker creates its own progress bar on top of the others, and what I see is a progress bar stuck at 1/65000.

Use interprocess communication to collect progress from all processes and show it in the main progress bar.
I achieved this using redis: each process computes a hash of its input data and sends its progress to redis.
On the progress-bar side, I compute the same hash for each input, collect the current statuses, and sum them up to form the overall progress.
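
A rough sketch of that approach (the key naming, totals, and polling interval are illustrative choices, not exact code from the comment above; assumes a local redis server and the redis Python package):

import hashlib
import time
import redis
from tqdm import tqdm_notebook

r = redis.Redis()

def key_for(item):
    # Same hash computed by the workers and by the monitoring side.
    return "progress:" + hashlib.md5(repr(item).encode()).hexdigest()

def worker(item):
    total = 5000
    for i in range(total):
        time.sleep(0.001)
        r.set(key_for(item), i + 1)  # publish this worker's current progress

def monitor(items, total_per_item=5000):
    # Single bar in the main process, summing progress across all workers.
    grand_total = total_per_item * len(items)
    with tqdm_notebook(total=grand_total) as bar:
        done = 0
        while done < grand_total:
            done = sum(int(r.get(key_for(it)) or 0) for it in items)
            bar.n = done
            bar.refresh()
            time.sleep(0.5)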

@Berowne commented Jul 8, 2021

I found a strange hack to work around your issue:
Just add a print statement in the progresser function.

...

def progresser(n):
    # This line is the strange hack
    print(' ', end='', flush=True)
    ...

Excellent hack, thanks!

@chengs (Contributor) commented Nov 17, 2021

It seems position is ignored now; see #1133.

horizon-blue added a commit to horizon-blue/beanmachine that referenced this issue Mar 24, 2022
Summary:
When implementing parallel inference in D34574082 (facebookresearch@dc066af), we added a hack to fix the issue where [Jupyter fails to render progress bar from a subprocess](tqdm/tqdm#485) by flushing `stdout` with a space for each chain of inference.

Thinking that printing an extra space wouldn't be too bad in general, I didn't set a condition on when to run the snippet. However, it turns out that when using a non-standard stdout (e.g. within the VSCode Jupyter plugin), this single line of `print` can lead to [a ton of empty output](https://app.reviewnb.com/facebookresearch/beanmachine/pull/1376/discussion/).

While I haven't figured out what causes the issue and whether there's a better alternative to fix the progress bar for Jupyter, one thing we can do for now is to run the hacky snippet only when necessary -- i.e. when a chain is being run in a subprocess, and within a Jupyter notebook.

Differential Revision: D34841233

fbshipit-source-id: 9aec4d4f6e5dcb213b9d0ed47275932e7710f7bc
facebook-github-bot pushed a commit to facebookresearch/beanmachine that referenced this issue Mar 24, 2022
Summary:
Pull Request resolved: #1383

When implementing parallel inference in D34574082 (dc066af), we added a hack to fix the issue where [Jupyter fails to render progress bar from a subprocess](tqdm/tqdm#485) by flushing `stdout` with a space for each chain of inference.

Thinking that printing an extra space wouldn't be too bad in general, I didn't set a condition on when to run the snippet. However, it turns out that when using a non-standard stdout (e.g. within the VSCode Jupyter plugin), this single line of `print` can lead to [a ton of empty output](https://app.reviewnb.com/facebookresearch/beanmachine/pull/1376/discussion/).

While I haven't figured out what causes the issue and whether there's a better alternative to fix the progress bar for Jupyter, one thing we can do for now is to run the hacky snippet only when necessary -- i.e. when a chain is being run in a subprocess, and within a Jupyter notebook.

Reviewed By: jpchen

Differential Revision: D34841233

fbshipit-source-id: 5b97cef298f7a451ac117c51a21ceb7eadcaa84d
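
For reference, a minimal sketch of the conditional approach described in these commit messages (the helper names and the Jupyter-detection heuristic are my own, and multiprocessing.parent_process() requires Python 3.8+):

import multiprocessing
import sys

def in_jupyter_kernel():
    # Heuristic: an IPython kernel will have imported ipykernel.
    return "ipykernel" in sys.modules

def in_subprocess():
    # parent_process() returns None only in the main process (Python 3.8+).
    return multiprocessing.parent_process() is not None

def maybe_flush_stdout():
    # Apply the print hack only where it is needed: a worker process whose
    # output is being forwarded to a Jupyter kernel.
    if in_jupyter_kernel() and in_subprocess():
        print(' ', end='', flush=True)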