
[Regression] Tqdm freezes when iteration speed changes drastically #54

Closed
BertrandBordage opened this issue Nov 16, 2015 · 42 comments

@BertrandBordage

That’s a regression that happened between versions 1.0 and 2.0.
Try this minimal example with several tqdm versions:

from tqdm import tqdm
from time import sleep

for i in tqdm(range(10000)):
    if i > 3000:
        sleep(1)

As you’ll see, it works great with tqdm 1.0, but it takes almost forever to update with versions 2.0 and 3.0. In fact, it takes about 3000 seconds to update: the first interval saw 3000 iterations in one second, so tqdm assumes it’s OK to wait 3000 more iterations before refreshing. But that’s only acceptable behaviour for loops with a fairly constant iteration speed. The smoothing argument was developed for varying speeds (see #48, great work on this by the way!), but the current miniters/mininterval behaviour contradicts it.

Of course, specifying miniters=1 produces the expected behaviour, so I’m wondering: why not make that the default?
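
For reference, the workaround only adds one argument to the example above (same loop, shown for illustration):

from tqdm import tqdm
from time import sleep

# miniters=1 forces tqdm to check mininterval on every iteration,
# so the display refreshes promptly even after the speed change.
for i in tqdm(range(10000), miniters=1):
    if i > 3000:
        sleep(1)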

In tqdm 1.0, miniters was set to 1 and mininterval was set to 0.5, which meant: "we update the display at every iteration whose duration exceeds 0.5 seconds; otherwise we wait several iterations, enough to span at least a 0.5 second interval".

Since tqdm 2.0, miniters is set to None and mininterval is set to 0.1. From what I understand, it means "we update the display after waiting for several iterations, enough to make at least a 0.1 second interval".

Unfortunately, as the example above shows, tqdm doesn’t respect this rule, since we don’t get an update every 0.1 second. The behaviour is more complex now: it tries to predict how long it should wait instead of simply counting when it should update its display, as tqdm 1.0 did.
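
To make the 1.0 rule concrete, here is a rough sketch of that check as I understand it (illustrative names, not tqdm’s actual code):

from time import time

def should_display(n, last_print_n, last_print_t,
                   miniters=1, mininterval=0.5):
    # With miniters=1 the time is checked on every iteration;
    # the display refreshes only if mininterval seconds elapsed.
    if n - last_print_n >= miniters:
        now = time()
        if now - last_print_t >= mininterval:
            return True, now
    return False, last_print_t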

@casperdcl
Sponsor Member

Yes, I'm aware of this and working on a fix. It's good to still have a dynamic miniters (we don't want to check mininterval every iteration, because that would imply a call to time() every iteration, which is rubbish for overhead). My idea is to decrease miniters dynamically (as well as increase it dynamically, like the current version does).
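
Something along these lines (a hypothetical sketch of the idea, not actual tqdm code):

def adjust_miniters(delta_it, delta_t, mininterval=0.1):
    # Re-estimate how many iterations fit into one mininterval;
    # unlike the current behaviour, this can shrink as well as grow.
    if delta_t <= 0:
        return max(1, delta_it)
    return max(1, int(delta_it * mininterval / delta_t))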

@BertrandBordage
Author

Additional thoughts:

  • Why not remove miniters? This parameter seems useless; I really can’t imagine a case where one would use it instead of mininterval. Even mininterval is not all that useful: it’s nice to have and we should keep it, but it’s unlikely that someone wants a slower or faster update rate than 0.1 s.
  • mininterval, ncols (and miniters, if we keep it) are not PEP 8 names, and could be easier to understand. I suggest refresh_time for mininterval and width for ncols (the number of columns being the unit, not what it means to users).

@casperdcl
Sponsor Member

I never really intended miniters or mininterval for end-users to make use of... like I said, they are really internal parameters for the sake of optimisation

@BertrandBordage
Author

Ah, I didn’t realize it was a way to avoid calling time() too often.
That makes the problem more complex, indeed.
So if the underlying problem is optimization, the solution is simply to write tqdm in Cython instead of Python. It would be way faster than it currently is, and getting the time directly from C takes a ridiculously small amount of time. Function and method calls in Python are way slower than getting a timestamp in C.

If needed, I can help on this part, as I’ve been using Cython for years and I’m specialized in Python/Cython/algorithm optimisation.

@casperdcl
Sponsor Member

We've tested... getting the timestamp is by far the slowest thing. I only use Cython for my local installs; it doesn't help tqdm significantly.

@casperdcl
Sponsor Member

Unless you're talking about a different way of getting the time that is only possible in pure Cython/C?

@lrq3000
Member

lrq3000 commented Nov 16, 2015

About mininterval being an internal parameter, I disagree. I modify it for time-sensitive applications to significantly reduce tqdm's overhead, for example by setting it to 1 s or even more. This is very useful for keeping a progress bar displayed while having virtually no overhead. It's also a very simple parameter, easy for the end user to change, with an easy-to-understand impact (actually, I think it's currently the easiest way to minimize tqdm's overhead).
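
For example (an illustrative use; the loop body is just a placeholder):

from tqdm import tqdm

# Refresh the display at most once per second, which throttles
# tqdm's own work on tight, time-sensitive loops.
for i in tqdm(range(10**7), mininterval=1):
    pass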

About a Cython implementation, I think that's useless, for one reason: PyPy. tqdm is fully compatible with PyPy, and PyPy usually gets as fast as Cython, or even faster in some applications. In tqdm's case, I can't see what a Cython port could optimize that PyPy can't already optimize automatically. Cython is nowadays mainly useful for vectorized operations since it can work on numpy arrays, but since we don't do any fancy mathematical operations, I think PyPy already optimizes tqdm's operations as much as possible.

About the present issue, I think the previous tqdm behavior should be restored; it was easier to understand, even if it incurs a performance hit. However, if @casperdcl can find a better fix, that would of course be better :)

@lrq3000 lrq3000 mentioned this issue Nov 17, 2015
@BertrandBordage
Author

@lrq3000 I agree with you, the previous behaviour should be restored. tqdm will become a bit slower (for a total of about 120 to 200 ns per non-display iteration), but that’s only a small performance issue that won’t really affect most cases. In the rare case of a real performance problem, people should just not use tqdm.

But I don’t totally agree with you about PyPy. Yes, PyPy could make this problem disappear, but maybe not as much as a Cython implementation. For me, there are two advantages to using Cython: it produces a library that works on CPython as well as PyPy, and it allows direct access to C functions. PyPy can optimise by guessing types and compiling on the fly, but it can’t optimise Python’s built-in functions (like time.time) AFAIK. And that’s what will mainly block us.

Now, is it really worth writing tqdm in Cython? I don’t think so. Even with Cython, system calls like getting a timestamp, writing to stderr, or getting the terminal width will still take the vast majority of the time.

We could also run a poll to see whether users would mind a 120 ns cost per iteration instead of 60 ns.

@lrq3000
Member

lrq3000 commented Nov 21, 2015

About Cython vs PyPy, I doubt Cython will do better at type inference and the like, so that's not where we would gain performance. About system calls, if Cython gives us faster access, then yes, there may be a significant gain, but I'm not sure about that: I guess Python already performs system calls such as time in C, so Cython would probably give us no gain at all.

About reverting to the previous behavior, you can check the alternative fix proposed by @casperdcl, which (theoretically) preserves the current performance: #55

I will review it soon (sorry about that, I should have reviewed it by now, but I'm a bit too busy; it will be done by Monday evening at the latest :S)

@casperdcl
Sponsor Member

Just tried PyPy. Wonderful stuff. I can confirm tqdm's pure Python code runs faster under PyPy than Cython-compiled (this may not hold for a pure Cython re-implementation). Anyway, I think we're all agreed that we're definitely going to stick with pure Python for this project.

lrq3000 added a commit that referenced this issue Nov 23, 2015
@lrq3000
Member

lrq3000 commented Nov 23, 2015

@BertrandBordage please check whether the latest version fixes your issue; you can play with the smoothing parameter to your liking. If not, we'll work something else out :)

@casperdcl
Sponsor Member

Interesting that it says I closed this issue even though I didn't :)

@BertrandBordage
Author

Sorry to say it, but it still freezes on the minimal example from the original post (using tqdm 3.1.0 and Python 3.5).
Without changing miniters, the only workaround is to set smoothing to 0 or a really small value, like 1e-5.
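
For reference, the workaround looks like this (the original example, with one added argument):

from tqdm import tqdm
from time import sleep

# smoothing=0 (or a tiny value) stops the initial burst of fast
# iterations from dominating the dynamic miniters estimate.
for i in tqdm(range(10000), smoothing=0):
    if i > 3000:
        sleep(1)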

@casperdcl Of course, the idea of a Cython implementation is not to just compile the Python source ;) That only wins you about 30% performance on average. If you carefully code the Cython implementation, a lot of programs become several orders of magnitude faster, depending on what your program does (I even saw a 5000× speedup on a real use case once!). In tqdm’s case, the speedup would be small, maybe 2-3 times faster but nothing groundbreaking, because most of the work is system calls.

@casperdcl
Sponsor Member

The example you gave (3k near-instantaneous iterations then a long pause) is so wildly odd that it should require you to specify a lower miniters manually (set it to 1 to recover the original behaviour). I've noted in the documentation that erratic rates (e.g. downloading over the internet) should use miniters=1.

@BertrandBordage
Author

That’s right, but even with a realistic change of speed, the problem stays the same:

from tqdm import tqdm
from time import sleep

for i in tqdm(range(10000)):
    if i < 3000:
        sleep(1e-3)
    else:
        sleep(1)

Waiting 1 ms vs 1 s is realistic: when you check whether an image exists and convert it otherwise, you get waits of this kind.

That’s great of you to have added a note in the documentation, but to me that’s still a regression from 1.0. Sorry to bother you :s

@casperdcl
Sponsor Member

Not a bother, it's just that if we make miniters=1 the default (restoring the old behaviour), the performance hit for a bunch of other cases becomes noticeable. Of course, if enough people think the use case you describe is more common than quick iterations, we can revert.

@BertrandBordage
Author

OK!

@lrq3000
Member

lrq3000 commented Nov 24, 2015

I'm reopening the issue because there's a bug: the exponential moving average is not behaving as it should. Normally, miniters should converge towards 1, whatever the initial big step of iterations was. For example, using this code:

from tqdm import tqdm
from time import sleep

t = tqdm(total=10000)
t.update(1000)
for i in range(10000-1000):
    sleep(0.01)
    t.update()
t.close()

What we get is this sequence of iteration updates: [0, 1010, 1313, 1616, 1919, etc]. You can see that after the first smoothing step reduces miniters from 1000 to 303, it then does nothing at all, while it should keep reducing miniters over time...

So this clearly needs fixing. Plus, I have an idea to fix this kind of issue for good: adding a maxinterval argument to automatically reduce miniters whenever it is so high that updating the bar takes longer than maxinterval.

@lrq3000 lrq3000 reopened this Nov 24, 2015
@lrq3000
Member

lrq3000 commented Nov 24, 2015

Ah, and I forgot to say that I agree with @BertrandBordage that this use case should be fixed right now: as I've read and referenced in #48, it's quite easy to make progress bars for constant-rate tasks, but it becomes interesting when you face variable-rate tasks, and those are quite common in lots of applications (the best example being downloading over the internet). So yes, we should fix that, but @casperdcl's patch should do the trick; it seems there's only a small glitch somewhere to fix.

@casperdcl
Sponsor Member

Right. All fixed and merged and tagged and... ready to push v3.1.1?

@casperdcl
Sponsor Member

btw the sequence of updates in the case you described @lrq3000 is now [0, 1010, 1308, 1520, 1671, 1780, 1859, 1962, 2079, 2163, ...] on my machine...

@lrq3000
Member

lrq3000 commented Nov 25, 2015

Wow @casperdcl, that was fast! I intended to try to fix it myself today, but your fix is a lot better and more elegant than what I had in mind 👍

I'm trying this now.

About maxinterval, my idea was to use it to do the following:

if delta_t > maxinterval:
    miniters = maxinterval * miniters / delta_t

After one overly long update, this should effectively prevent the time between updates from ever exceeding maxinterval again. In addition, it wouldn't cost much more than an additional if statement (see the sketch below). Do you think I should add this @casperdcl ?
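
A slightly fuller sketch of that clamp (an illustrative function wrapping the snippet above, not final code):

def clamp_miniters(miniters, delta_t, maxinterval):
    # If the last update took longer than maxinterval, shrink
    # miniters proportionally so the next update should fit.
    if delta_t > maxinterval:
        return max(1, int(maxinterval * miniters / delta_t))
    return miniters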

@casperdcl
Sponsor Member

Yeah, @lrq3000 sounds good. Maybe with a default value of 1 second or something...

@lrq3000
Member

lrq3000 commented Nov 25, 2015

@casperdcl I thought about a default value of 60 seconds; we don't want to force the bar to update too often by default, I guess.

About your latest code change, could you please explain why you used mininterval / delta_t? I know it's there to weight delta_it's influence on miniters depending on the time spent on the last update, but why this ratio? The issue I can see is when delta_t << mininterval: in this case, delta_it's influence will be multiplied by a factor >> 1. Was this on purpose?

@lrq3000
Member

lrq3000 commented Nov 25, 2015

Finally, I think I'll put the default at 10 seconds; I guess it's a good compromise.

@casperdcl
Sponsor Member

@lrq3000 yes to both your comments... when dynamic_miniters is true, my current implementation uses an EMA to set miniters to a value that corresponds as closely as possible to mininterval... this includes both increasing and decreasing it.
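
In other words, something along these lines (a sketch of the described behaviour, not the exact source):

def ema_miniters(miniters, delta_it, delta_t,
                 mininterval=0.1, smoothing=0.3):
    # Estimate how many iterations span one mininterval, then blend
    # that estimate into miniters with an exponential moving average,
    # so miniters can both grow and shrink over time.
    target = delta_it * (mininterval / delta_t) if delta_t else delta_it
    return smoothing * target + (1 - smoothing) * miniters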

About maxinterval, I don't think 1 s is too much, as the overhead would be negligible for such loops - but 10 s is OK too, I suppose.

@lrq3000
Member

lrq3000 commented Nov 25, 2015

@casperdcl about maxinterval, I know; the issue isn't the overhead, which would be negligible as you say, but rather that I think we should not constrain the user too much: with a default of 1, if the user wants to set a mininterval higher than 1 s, he will also have to raise maxinterval. I guess most users won't want to set a mininterval higher than 10 s, so that's why I propose 10 s. Anyway, we can set this for now and see in the future if there's any issue :)
I'm now working on how to test this new feature lol...

About smoothing dynamic_miniters, OK, that's what I thought: you tried to make miniters converge towards mininterval. That's a very nice idea :) I've done some more tests and everything's OK. The way it works may seem a bit counterintuitive at first, but it behaves correctly, as we expect. Great job 👍

@casperdcl
Sponsor Member

np. thanks @BertrandBordage for the suggestions!

@lrq3000
Member

lrq3000 commented Nov 28, 2015

Ok, done with maxinterval, and I added all the relevant unit tests (I hope they will be stable enough; since we need time.sleep() to test them, designing those tests was quite tricky...).

@lrq3000
Member

lrq3000 commented Nov 28, 2015

Well, the test test_smoothed_dynamic_min_iters_with_min_interval() sometimes fails on Travis on PyPy for the iteration-based tqdm test... So I have commented out the asserts; we'll need to find a more reliable way to do that test...

@casperdcl
Sponsor Member

@lrq3000 this really should've been done on a branch and PR, not directly on master. Now the commit history looks ugly and includes tagged releases which may be broken... And since they're tagged, it's not a good idea to rebase and remove them.

As a general rule of thumb, I only push to master if it's a small one-commit change that isn't likely to break anything, and I don't tag until it definitely passes on Travis and all my systems. Also, I tend to merge (not rebase) PRs into master, then amend to include the version bump in the merge commit. I think this is best for tracking which branches things came from etc... let me know if you guys don't agree.

@casperdcl
Sponsor Member

Thanks for the implementation and trying to write the tests, though!

@lrq3000
Member

lrq3000 commented Nov 28, 2015

@casperdcl Yes, sorry, you're totally right. I don't know why, but I thought this would be a one-commit change and just a small change from the last PR. I hadn't foreseen it would be such a mess to get working right with unit tests...

I will now be a lot more conservative: I will branch for anything but very small "cosmetic" changes.

About merging PRs into master, yes, it's very nice indeed to see where they come from. How do you do that? Do you use GitHub's interface or the command line?

@lrq3000
Member

lrq3000 commented Nov 28, 2015

@casperdcl Also, can I ask what "rules" you usually use to decide when to merge which commits together? (This will help me manage my commits better; I'm not experienced working in teams with git :) )

@lrq3000
Member

lrq3000 commented Nov 28, 2015

Ok, I fixed the issue with the iteration-based tqdm test; all should be OK now. I have uploaded the new version 3.1.4 to PyPI.

@casperdcl
Sponsor Member

Well, I never use GitHub's automated merge, because then I'd have to pull locally, amend and then force-push, which is not great...

  1. git-checkout master (locally)
  2. git-fetch --all
  3. git-pull
  4. git-merge new_feature_branch
  5. Update _version.py
  6. git-add _version.py
  7. git-commit --amend (add 'bump version' to the 'merge' message)
  8. Test
  9. git-push
  10. Wait for Travis to be safe
  11. If fails, fix locally (preferably amending commits rather than making new ones... or if you commit new ones do a little rebase) then continue from step (8)... force-push at step (9)...
  12. git-tag vx.x.x
  13. push --tags

Optional:
  14. delete branch from the PR page
  15. git-branch -D new_feature_branch (locally delete branch)

General rule: once you push a tag to git (or create the tag online) you can no longer alter/mess with/rebase anything from that tag and older.

If all this is too complex just wait for someone else to do it. It's good practice not to merge your own code anyway (because that ensures someone else looks at it). This was a longer message than I thought it was going to be.

Sent from my mobile device

@casperdcl
Sponsor Member

Like I said, I'm happy to do all of the above if it's too complex. Can I ask what steps you follow for uploading to PyPI? I know there are rst issues (links etc.?) and dependencies for building Windows binaries...

@lrq3000
Member

lrq3000 commented Nov 28, 2015

Oh, thank you very much @casperdcl for detailing all your steps to merge a branch 👍 I know, I won't merge my own branches myself (I did for the latest branch, but that's because it was a simple bugfix), but it's interesting to know how it's done :)

About uploading to PyPI, you just need to run python setup.py make buildupload; it will run the readme checks, compile, and then ask you for the credentials to upload to PyPI (it will ask you twice: first to upload the readme, and then to upload the builds).

For Windows, there's no dependency except that you need to be running Windows :/ But anyway, it's not a big deal if the win build is not done: since this is a very simple library with no dependencies, there's no issue installing from the .zip.

@lrq3000
Member

lrq3000 commented Nov 28, 2015

PS: a tip when building: you always need to remove the folders dist, build and tqdm.egg-info before launching a build (this is done in the Makefile), else your build may not include the latest file changes (there is a record inside the egg) and you may upload older versions of tqdm along with the new one (because when you upload the builds using twine, it will upload everything inside the dist folder!).

@casperdcl
Sponsor Member

I think it's nice to upload to PyPI with Windows binaries... are you saying this is only possible while running Windows? No cross-compilation possible?

@lrq3000
Member

lrq3000 commented Nov 28, 2015

To my knowledge, no, it's not possible. To compile Windows binaries, you need a working win32 Python interpreter, which is compiled with Visual Studio, so you cannot make it work on a Linux platform. From what I've read, people who do cross-compilation just install Windows in a VM and make the builds inside it (and repeat the process if you want to compile Mac OS X builds...).

If Python someday switches to MinGW or Cygwin GCC for compilation, maybe it will be possible to build win32 Python apps on Linux, but for now it seems that's not possible.

@lrq3000
Member

lrq3000 commented Nov 28, 2015

But anyway, you can try it for yourself if you're running Linux; you just need to run this command:

python setup.py sdist bdist_wininst

On Windows, it generates a win32 installer without requiring anything but native Python.
