
Fix parallel bars printing race issue by using locks #291

Closed
wants to merge 1 commit into from

Conversation

lrq3000
Member

@lrq3000 lrq3000 commented Oct 16, 2016

Should fix #285 by using locks: there is one lock for multiprocessing and one for threading, so this should fix the issue in either case.
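The two-lock idea could be sketched as a small wrapper that acquires both kinds of locks (DualLock is a hypothetical name for illustration, not this PR's actual implementation):

```python
import multiprocessing
import threading

class DualLock(object):
    """Hypothetical wrapper: acquire both a threading lock and a
    multiprocessing lock, so writes stay safe in either scenario."""
    def __init__(self):
        self.th_lock = threading.Lock()
        self.mp_lock = multiprocessing.Lock()

    def __enter__(self):
        self.th_lock.acquire()
        self.mp_lock.acquire()
        return self

    def __exit__(self, *exc):
        self.mp_lock.release()
        self.th_lock.release()

lock = DualLock()
with lock:
    print("guarded write")  # any bar output would go here
```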

Canonical example:

from time import sleep
from tqdm import tqdm
from multiprocessing import Pool, freeze_support

def progresser(n):
    text = "progresser #{}".format(n)
    for i in tqdm(range(5000), desc=text, position=n):
        sleep(0.001)

if __name__ == '__main__':
    freeze_support()  # for Windows support
    L = list(range(10))
    Pool(len(L)).map(progresser, L)

Note, however, that this doesn't fully fix the issue on Windows when using too many bars at once: with 3 bars it works, but with 10 it doesn't, because the Windows console has a tendency to insert spurious newlines when too many lines are printed at once (see also this PR, which suffers from the same issue). Maybe I can also try on Windows using MSYS2, as suggested here, to fix the issue?

TODO:

  • Unit test (using squash_ctrl and a fake IO; just check that there are 10 lines at the end, each with a different number, just like the example provided).
  • Flake8

Signed-off-by: Stephen L. <lrq3000@gmail.com>
@codecov-io

codecov-io commented Oct 16, 2016

Current coverage is 90.74% (diff: 100%)

Merging #291 into master will decrease coverage by 1.03%

@@             master       #291   diff @@
==========================================
  Files             7          7          
  Lines           535        497    -38   
  Methods           0          0          
  Messages          0          0          
  Branches         97         88     -9   
==========================================
- Hits            491        451    -40   
- Misses           43         44     +1   
- Partials          1          2     +1   

Powered by Codecov. Last update a65e347...1d397a1

Contributor

@CrazyPython CrazyPython left a comment

I'm probably not qualified to do this, but it looks OK to me.

@@ -734,6 +752,10 @@ def __iter__(self):
else smoothing * delta_t / delta_it + \
(1 - smoothing) * avg_time

# Acquire locks if parallel bars
Contributor

755-758 & 848-851 duplicate code

Member Author

Can't work around this, @CrazyPython: __iter__() duplicates update() to maximize performance.

Contributor

@lrq3000 what about __iter__ = update, or the other way around?

Member Author

@lrq3000 lrq3000 Oct 16, 2016

We tried everything we could imagine, including what you propose, and it drastically lowers the performance of __iter__().

I think there is no way around it: __iter__() is super efficient because, once launched, it runs as its own tight routine with minimal calls to external functions (see how most self attributes are stored in local variables before the for loop), so any attempt to unify __iter__ and update is bound to fail IMO (except maybe if you're a Python wizard and you dynamically rewrite the method on first call depending on whether __iter__ was called, but then I think it would be far harder to maintain than simple code duplication).
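The locals-hoisting trick described above can be illustrated with a toy class (hypothetical names, not tqdm's actual code):

```python
class MiniBar(object):
    """Toy illustration of hoisting instance attributes into locals
    before a hot loop, the way tqdm's __iter__ does."""
    def __init__(self, iterable):
        self.iterable = iterable
        self.n = 0

    def __iter__(self):
        # Hoist instance attributes into fast local variables once,
        # instead of paying an attribute lookup on every iteration.
        iterable = self.iterable
        n = self.n
        for obj in iterable:
            yield obj
            n += 1
        self.n = n  # write the final count back once, after the loop

bar = MiniBar(range(5))
print(list(bar), bar.n)  # -> [0, 1, 2, 3, 4] 5
```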

Contributor

@lrq3000

virtual routine with minimal calls to external functions

Mind blown. Where'd you learn this much about Python internals?

Contributor

@lrq3000 I believe you. But what specific places can I personally learn more?

Member Author

@CrazyPython Ah, this is not easy to tell you... My current knowledge is a mix of several things, but mainly three sources: my own desire to program tools that can be useful to me; the internet (with great resources such as StackOverflow!) to help get past issues and learn more tricks (blog posts are also good); and university courses for theoretical stuff (like concurrent programming, which is not something you can usually learn by yourself on a fun spare-time project). And I am still learning almost every day.

But really, the most important thing, I think, is just to try to make your own projects that are useful to you. This is the only way to be willing to spend hours and hours fixing problems and hitting walls: the reward is software that will save you time later on, or enable you to do something that was impossible to do manually. This is nowadays called the project-based approach to learning, but it's really not new and is the same as the old saying "practice makes perfect".

Member Author

So just find a purpose to make a program of your own, and continue. The goal is not to learn to program, but simply to achieve a program that works for your needs. In the end, you will progress a lot. And this works at any proficiency level; I am still learning every day in the languages I master the most.

Contributor

@CrazyPython CrazyPython Oct 21, 2016

@lrq3000 Ah yes, I follow this myself. Project-based learning is what I tell others. Nice to see someone else shares my opinion. :)

But I'm asking specifically: how did you learn about Python internals, specifically? trial and error? profiling?

Member Author

Ah, then it's also the same way: I had university courses on compilers, which apply to pretty much any language even if Python generates bytecode instead of a binary; then I also did my own projects, like profiling as you said (see my easy-profilers repo) and benchmarking for tqdm (see the bytecode analysis PR); and finally a lot of SO and blog posts, the most useful being the PyPy blog, which even details a tutorial for making your own Python compiler. There were also a few very interesting pure-Python optimizers with tutorials; I can dig them out if you are interested.

@lrq3000
Member Author

lrq3000 commented Oct 16, 2016

@CrazyPython Thank you, but indeed @casperdcl enabled mandatory reviews, and only repo admins can do an official review. That is a good thing, I just hope @casperdcl will have some spare time soon because issues and PRs are piling up!

@LankyCyril

LankyCyril commented Oct 19, 2016

I might be doing something wrong, but nothing has changed for me.
Here I show all the steps I'm taking in case I'm making a stupid mistake somewhere.

$ pyvenv-3.5 tqdm_test
$ source tqdm_test/bin/activate
(tqdm_test) $ pip install --upgrade pip
Collecting pip
  Using cached pip-8.1.2-py2.py3-none-any.whl
Installing collected packages: pip
  Found existing installation: pip 7.1.2
    Uninstalling pip-7.1.2:
      Successfully uninstalled pip-7.1.2
Successfully installed pip-8.1.2
(tqdm_test) $ pip install -e git+https://github.com/tqdm/tqdm@1d397a118ae91242e03f44d8a0b012afe5fa21e7#egg=tqdm
Obtaining tqdm from git+https://github.com/tqdm/tqdm@1d397a118ae91242e03f44d8a0b012afe5fa21e7#egg=tqdm
  Cloning https://github.com/tqdm/tqdm (to 1d397a118ae91242e03f44d8a0b012afe5fa21e7) to ./tqdm_test/src/tqdm
  Could not find a tag or branch '1d397a118ae91242e03f44d8a0b012afe5fa21e7', assuming commit.
Installing collected packages: tqdm
  Running setup.py develop for tqdm
Successfully installed tqdm
(tqdm_test) $ python parallels.py                                 
progresser #0:   0%|                     | 0/5000 [00:00<?, ?it/s]
                                                                 (
tqdm_test) $                                                      


progresser #2:   0%|                     | 0/5000 [00:00<?, ?it/s]




progresser #0: 100%|█████████| 5000/5000 [00:05<00:00, 921.46it/s]
progresser #3: 100%|█████████| 5000/5000 [00:05<00:00, 920.24it/s]

The only thing that I have modified from the code in #285 is that I reduced the length of L to 5.

Edit: this time on CentOS 6.6, using Python 3.5.1.

@lrq3000
Member Author

lrq3000 commented Oct 19, 2016

That's bad news. Can you try with 3 bars?

@lrq3000
Member Author

lrq3000 commented Oct 19, 2016

I ask that because either this is still a race condition and the locks do not work (which would be VERY weird), or the console is not able to manage more than 3 bars at once with our system (without curses), which would be very bad...

@lrq3000
Member Author

lrq3000 commented Oct 19, 2016

OK, I just checked whether tqdm can support lots of bars simultaneously, and yes it does:

from time import sleep
from tqdm import trange

tlist = []
for i in range(20):
    tlist.append(trange(5000, desc='bar'+str(i), position=i, leave=True))
for i in range(5000):
    for bar in tlist:
        bar.update()
    sleep(0.001)
for bar in tlist:
    bar.close()

So this means it's really the locking mechanism that does not work. I have no idea why; this is a very weird issue...

@LankyCyril

Still the same issue, with 3, and with 2.

I'm not that confident in using pip to install from pull requests -- maybe I'm not installing it correctly? Although I see the appropriate code changes in tqdm_test/src/tqdm/tqdm/_tqdm.py

@lrq3000
Member Author

lrq3000 commented Oct 19, 2016

I think I found the issue: the lock is meant to be created before instantiating the pool of parallel processes, and supplied to the pool as an argument. If it's created inside the class, it won't work.

That's bad, because we don't want the user to need to supply the lock; it should be totally automated and transparent. We need to know how other projects managed this, if any did.

@lrq3000
Member Author

lrq3000 commented Oct 19, 2016

But from Google's own projects, it seems it's not possible:

import threading

class AtomicDict(object):
  """Thread-safe (and optionally process-safe) dictionary protected by a lock.
  If a multiprocessing.Manager is supplied on init, the dictionary is
  both process- and thread-safe. Otherwise, it is only thread-safe.
  """

  def __init__(self, manager=None):
    """Initializes the dict.
    Args:
      manager: multiprocessing.Manager instance (required for process safety).
    """
    if manager:
      self.lock = manager.Lock()
      self.dict = manager.dict()
    else:
      self.lock = threading.Lock()
      self.dict = {}

So two things:

  1. We should also make the cls._instances dict thread- and multiprocess-safe.
  2. The user needs to provide a manager or a lock for multiprocessing safety; thread safety can be managed by the tqdm class directly.

So for 2, maybe we should provide a helper function to ease multiprocessing use?
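Point 1 (a lock-protected cls._instances registry) could be sketched like this (toy class, not tqdm's actual code):

```python
import threading

class RegBar(object):
    """Toy sketch of guarding a class-level instance registry,
    like tqdm's cls._instances, with a lock."""
    _lock = threading.Lock()
    _instances = set()

    def __init__(self):
        with RegBar._lock:  # serialize mutations of the shared registry
            RegBar._instances.add(self)

    def close(self):
        with RegBar._lock:
            RegBar._instances.discard(self)

a, b = RegBar(), RegBar()
print(len(RegBar._instances))  # -> 2
a.close()
print(len(RegBar._instances))  # -> 1
```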

@lrq3000
Member Author

lrq3000 commented Oct 19, 2016

An alternative appears in the pdfrankenstein project, which is very close to what we are trying to achieve (a progress bar printing on stdout).

Basically, what they do is instantiate a multiprocessing lock as a global variable. This seems to work. But is it clean in our case?

PS: And good to know, Queue is not multiprocess-safe because of its implementation design.

Also good to know: flock can be used for cross-language locking, but flock is only advisory, so programs can bypass the locks!

And this implementation detail can be handy later; saving it here for reference (differences between the Linux and Windows implementations of the multiprocessing module).

And a whole host of other examples here for multiprocessing.Lock.

@lrq3000
Member Author

lrq3000 commented Oct 25, 2016

Ah, and the official Python docs and PEPs are useful too for precise details (but I rarely use them as my first source when I need the bigger picture, because they often miss context and require prerequisite knowledge of previous CPython implementation details).

@casperdcl casperdcl force-pushed the master branch 4 times, most recently from 8cade97 to a65e347 Compare October 31, 2016 02:34
@lrq3000 lrq3000 added to-fix ⌛ In progress and removed need-feedback 📢 We need your response (question) labels Oct 31, 2016
@lrq3000
Member Author

lrq3000 commented Oct 31, 2016

Some clever SO folk (Alexey Smirnov) found the culprit: seemingly buggy management of global locks on Windows.

So I don't know what to do now. If there is no way around it, we can provide a new kwarg for the user to supply a lock.

@lrq3000 lrq3000 added this to the v5.0.0 milestone Nov 14, 2016
@casperdcl casperdcl added p2-bug-warning ⚠ Visual output bad and removed p0-bug-critical ☢ Exception rasing labels Dec 6, 2016
@lrq3000
Member Author

lrq3000 commented Dec 26, 2016

So there is no way around passing the lock from the parent to the children, simply because on Windows there is no fork, only spawn, for multiprocessing, so the children cannot access the parent's data. However, on Linux a global lock initialized in the parent would work; but since we aim for cross-platform compatibility, we need to find a workaround...

So what I propose:

  • provide a new kwarg write_lock=None so that the user can supply a thread or multiprocessing lock to tqdm at the pool's initialization
  • for Linux, provide a global lock: if write_lock is None: write_lock = global_write_lock. The only issue is that there are two kinds of locks, thread and multiprocessing... but maybe we can create a custom lock that merges both.
  • add to the documentation how to use tqdm with locks on Windows (it should be transparent on Linux)
  • add the sample code as a unit test with mp.set_start_method('spawn') to force mimicking Windows' no-fork spawning of processes; otherwise it's impossible to test on Linux.

@lrq3000
Member Author

lrq3000 commented Dec 28, 2016

Continued in #329.

@lrq3000 lrq3000 closed this Dec 28, 2016
@lrq3000 lrq3000 deleted the parallel-print-fix branch December 28, 2016 19:55
casperdcl added a commit that referenced this pull request Sep 26, 2017
fixes #285 -> #291 -> #329
fixes #422
fixes #439

fixes #323
fixes #324
fixes #334
fixes #407
fixes #418

related to:
- #97
- #143
- #331
- #361
- #384
- #385
- #417
Labels
p2-bug-warning ⚠ Visual output bad to-fix ⌛ In progress

Successfully merging this pull request may close these issues.

Progress bars jump around with multiprocessing when using the position kwarg
5 participants