
[Regression] Tqdm freezes when iteration speed changes drastically #54

Closed
BertrandBordage opened this issue Nov 16, 2015 · 42 comments

@BertrandBordage

That’s a regression that happened between versions 1.0 and 2.0.
Try this minimal example with several tqdm versions:

from tqdm import tqdm
from time import sleep

for i in tqdm(range(10000)):
    if i > 3000:
        sleep(1)

As you’ll see, it works great with tqdm 1.0, but it takes almost forever to update with versions 2.0 and 3.0. In fact, it takes about 3000 seconds to update: the first interval saw 3000 iterations in one second, so tqdm assumes it’s OK to wait 3000 more iterations before refreshing. But that’s only acceptable behaviour for loops with a fairly constant iteration speed. The smoothing argument was developed for varying speeds (see #48, great work on this by the way!), but the current miniters/mininterval behaviour contradicts it.

Of course, specifying miniters=1 produces the expected behaviour, so I’m wondering: why not make that the default?
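
For reference, the workaround only adds one argument to the example above (same loop, shown for illustration):

from tqdm import tqdm
from time import sleep

# miniters=1 forces tqdm to check mininterval on every iteration,
# so the display refreshes promptly even after the speed change.
for i in tqdm(range(10000), miniters=1):
    if i > 3000:
        sleep(1)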

In tqdm 1.0, miniters was set to 1 and mininterval was set to 0.5, which meant: "we update the display at every iteration whose duration exceeds 0.5 seconds; otherwise we wait several iterations, enough to span at least a 0.5 second interval".

Since tqdm 2.0, miniters is set to None and mininterval is set to 0.1. From what I understand, it means "we update the display after waiting for several iterations, enough to make at least a 0.1 second interval".

Unfortunately, as the example above shows, tqdm doesn’t respect this rule, since we don’t get an update every 0.1 second. The behaviour is more complex now: it tries to predict how long it should wait instead of simply counting when it should update its display, as tqdm 1.0 did.
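
To make the 1.0 rule concrete, here is a rough sketch of that check as I understand it (illustrative names, not tqdm’s actual code):

from time import time

def should_display(n, last_print_n, last_print_t,
                   miniters=1, mininterval=0.5):
    # With miniters=1 the time is checked on every iteration;
    # the display refreshes only if mininterval seconds elapsed.
    if n - last_print_n >= miniters:
        now = time()
        if now - last_print_t >= mininterval:
            return True, now
    return False, last_print_t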

@casperdcl
Sponsor Member

Yes, I'm aware of this and working on a fix. It's good to still have a dynamic miniters (we don't want to check mininterval every iteration, because that would imply a call to time() every iteration, which is rubbish for overhead). My idea is to decrease miniters dynamically (as well as increase it dynamically, like the current version does).
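
Something along these lines (a hypothetical sketch of the idea, not actual tqdm code):

def adjust_miniters(delta_it, delta_t, mininterval=0.1):
    # Re-estimate how many iterations fit into one mininterval;
    # unlike the current behaviour, this can shrink as well as grow.
    if delta_t <= 0:
        return max(1, delta_it)
    return max(1, int(delta_it * mininterval / delta_t))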

@BertrandBordage
Author

Additional thoughts:

  • Why not remove miniters? This parameter seems useless; I really can’t imagine a case where one would use it instead of mininterval. Even mininterval is not all that useful: it’s nice to have and we should keep it, but it’s unlikely that someone wants a slower or faster update rate than 0.1 s.
  • mininterval, ncols (and miniters, if we keep it) are not PEP 8 names, and could be easier to understand. I suggest refresh_time for mininterval and width for ncols (the number of columns being the unit, not what it means to users).

@casperdcl
Sponsor Member

I never really intended miniters or mininterval for end-users to make use of... like I said, they are really internal parameters for the sake of optimisation

@BertrandBordage
Author

Ah, I didn’t realize it was a way to avoid calling time() too often.
That makes the problem more complex, indeed.
So if the underlying problem is optimization, the solution is simply to write tqdm in Cython instead of Python. It would be way faster than it currently is, and getting the time directly from C takes a ridiculously small amount of time. Function and method calls in Python are way slower than getting a timestamp in C.

If needed, I can help on this part, as I’ve been using Cython for years and I’m specialized in Python/Cython/algorithm optimisation.

@casperdcl
Sponsor Member

We've tested... getting the timestamp is by far the slowest thing. I only use Cython for my local installs; it doesn't help tqdm significantly.

@casperdcl
Sponsor Member

Unless you're talking about a different way of getting the time that is only possible in pure Cython/C?

@lrq3000
Member

lrq3000 commented Nov 16, 2015

About mininterval being an internal parameter, I disagree. I modify it for time-sensitive applications to significantly reduce tqdm's overhead, for example by setting it to 1 s or even more. This is very useful for keeping a progress bar displayed while having virtually no overhead. It's also a very simple parameter, easy for the end user to change, with an easy-to-understand impact (actually, I think it's currently the easiest way to minimize tqdm's overhead).
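
For example (an illustrative use; the loop body is just a placeholder):

from tqdm import tqdm

# Refresh the display at most once per second, which throttles
# tqdm's own work on tight, time-sensitive loops.
for i in tqdm(range(10**7), mininterval=1):
    pass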

About a Cython implementation, I think that's useless, for one reason: PyPy. tqdm is fully compatible with PyPy, and PyPy usually gets as fast as Cython, or even faster in some applications. In tqdm's case, I can't see what a Cython port could optimize that PyPy can't already optimize automatically. Cython is nowadays mainly useful for vectorized operations since it can work on numpy arrays, but since we don't do any fancy mathematical operations, I think PyPy already optimizes tqdm's operations as much as possible.

About the present issue, I think the previous tqdm behavior should be restored; it was easier to understand, even if it incurs a performance hit. However, if @casperdcl can find a better fix, that would of course be better :)

@lrq3000 lrq3000 mentioned this issue Nov 17, 2015
@BertrandBordage
Author

@lrq3000 I agree with you, the previous behaviour should be restored. tqdm will become a bit slower (for a total of about 120 to 200 ns per non-display iteration), but that’s only a small performance issue that won’t really affect most cases. In the rare case of a real performance problem, people should just not use tqdm.

But I don’t totally agree with you about PyPy. Yes, PyPy could make this problem disappear, but maybe not as much as a Cython implementation. For me, there are two advantages to using Cython: it produces a library that works on CPython as well as PyPy, and it allows direct access to C functions. PyPy can optimise by guessing types and compiling on the fly, but it can’t optimise Python’s built-in functions (like time.time) AFAIK. And that’s what will mainly block us.

Now, is it really worth writing tqdm in Cython? I don’t think so. Even with Cython, system calls like getting a timestamp, writing to stderr, or getting the terminal width will still take the vast majority of the time.

We could also run a poll to see whether users would mind a 120 ns cost per iteration instead of 60 ns.

@lrq3000
Member

lrq3000 commented Nov 21, 2015

About Cython vs PyPy, I doubt Cython will do better at type inference and the like, so that's not where we would gain performance. About system calls, if Cython gives us faster access, then yes, there may be a significant gain, but I'm not sure about that: I guess Python already performs system calls such as time in C, so Cython would probably give us no gain at all.

About reverting to the previous behavior, you can check the alternative fix proposed by @casperdcl, which (theoretically) preserves the current performance: #55

I will review it soon (sorry about that, I should have reviewed it by now, but I'm a bit too busy; it will be done by Monday evening at the latest :S)

@casperdcl
Sponsor Member

Just tried PyPy. Wonderful stuff. I can confirm tqdm's pure Python code runs faster under PyPy than Cython-compiled (this may not hold for a pure Cython re-implementation). Anyway, I think we're all agreed that we're definitely going to stick with pure Python for this project.

lrq3000 added a commit that referenced this issue Nov 23, 2015
@lrq3000
Member

lrq3000 commented Nov 23, 2015

@BertrandBordage please check whether the latest version fixes your issue; you can play with the smoothing parameter to your liking. If not, we'll work something else out :)

@casperdcl
Sponsor Member

Interesting that it says I closed this issue even though I didn't :)

@BertrandBordage
Author

Sorry to say it, but it still freezes on the minimal example from the original post (using tqdm 3.1.0 and Python 3.5).
Without changing miniters, the only workaround is to set smoothing to 0 or a really small value, like 1e-5.
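
For reference, the workaround looks like this (the original example, with one added argument):

from tqdm import tqdm
from time import sleep

# smoothing=0 (or a tiny value) stops the initial burst of fast
# iterations from dominating the dynamic miniters estimate.
for i in tqdm(range(10000), smoothing=0):
    if i > 3000:
        sleep(1)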

@casperdcl Of course, the idea of a Cython implementation is not to just compile the Python source ;) That only wins you about 30% performance on average. If you carefully code the Cython implementation, a lot of programs become several orders of magnitude faster, depending on what your program does (I even saw a 5000× speedup on a real use case once!). In tqdm’s case, the speedup would be small, maybe 2-3 times faster but nothing groundbreaking, because most of the work is system calls.

@casperdcl
Sponsor Member

The example you gave (3k near-instantaneous iterations then a long pause) is so wildly odd that it should require you to specify a lower miniters manually (set it to 1 to recover the original behaviour). I've noted in the documentation that erratic rates (e.g. downloading over the internet) should use miniters=1.

@BertrandBordage
Author

That’s right, but even with a realistic change of speed, the problem stays the same:

from tqdm import tqdm
from time import sleep

for i in tqdm(range(10000)):
    if i < 3000:
        sleep(1e-3)
    else:
        sleep(1)

Waiting 1 ms vs 1 s is realistic: when you check whether an image exists and convert it otherwise, you get waits of this kind.

That’s great of you to have added a note in the documentation, but to me that’s still a regression from 1.0. Sorry to bother you :s

@casperdcl
Sponsor Member

Not a bother, it's just that if we make miniters=1 the default (restoring the old behaviour), the performance hit for a bunch of other cases becomes noticeable. Of course, if enough people think the use case you describe is more common than quick iterations, we can revert.

@BertrandBordage
Author

OK!

@lrq3000
Member

lrq3000 commented Nov 24, 2015

I'm reopening the issue because there's a bug: the exponential moving average is not behaving as it should. Normally, miniters should converge towards 1, whatever the initial big step of iterations was. For example, using this code:

from tqdm import tqdm
from time import sleep

t = tqdm(total=10000)
t.update(1000)
for i in range(10000-1000):
    sleep(0.01)
    t.update()
t.close()

What we get is this sequence of iteration updates: [0, 1010, 1313, 1616, 1919, etc]. You can see that after the first smoothing step reduces miniters from 1000 to 303, it then does nothing at all, while it should keep reducing miniters over time...

So this clearly needs fixing. Plus, I have an idea to fix this kind of issue for good: adding a maxinterval argument to automatically reduce miniters whenever it is so high that updating the bar takes longer than maxinterval.

@lrq3000 lrq3000 reopened this Nov 24, 2015
@lrq3000
Member

lrq3000 commented Nov 24, 2015

Ah, and I forgot to say that I agree with @BertrandBordage that this use case should be fixed right now: as I've read and referenced in #48, it's quite easy to make progress bars for constant-rate tasks, but it becomes interesting when you face variable-rate tasks, and those are quite common in lots of applications (the best example being downloading over the internet). So yes, we should fix that, but @casperdcl's patch should do the trick; it seems there's only a small glitch somewhere to fix.

@casperdcl
Sponsor Member

Right. All fixed and merged and tagged and... ready to push v3.1.1?

@casperdcl
Sponsor Member

btw the sequence of updates in the case you described @lrq3000 is now [0, 1010, 1308, 1520, 1671, 1780, 1859, 1962, 2079, 2163, ...] on my machine...

@lrq3000
Member

lrq3000 commented Nov 25, 2015

Wow @casperdcl, that was fast! I intended to try to fix it myself today, but your fix is a lot better and more elegant than what I had in mind 👍

I'm trying this now.

About maxinterval, my idea was to use it to do the following:

if delta_t > maxinterval:
    miniters = maxinterval * miniters / delta_t

After one overly long update, this should effectively prevent the time between updates from ever exceeding maxinterval again. In addition, it wouldn't cost much more than an additional if statement (see the sketch below). Do you think I should add this @casperdcl ?
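
A slightly fuller sketch of that clamp (an illustrative function wrapping the snippet above, not final code):

def clamp_miniters(miniters, delta_t, maxinterval):
    # If the last update took longer than maxinterval, shrink
    # miniters proportionally so the next update should fit.
    if delta_t > maxinterval:
        return max(1, int(maxinterval * miniters / delta_t))
    return miniters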

@casperdcl
Sponsor Member

Yeah, @lrq3000 sounds good. Maybe with a default value of 1 second or something...

@lrq3000
Member

lrq3000 commented Nov 25, 2015

@casperdcl I thought about a default value of 60 seconds; we don't want to force the bar to update too often by default, I guess.

About your latest code change, could you please explain why you used mininterval / delta_t? I know it's there to weight delta_it's influence on miniters depending on the time spent on the last update, but why this ratio? The issue I can see is when delta_t << mininterval: in this case, delta_it's influence will be multiplied by a factor >> 1. Was this on purpose?

@lrq3000
Member

lrq3000 commented Nov 25, 2015

Finally, I think I'll put the default at 10 seconds; I guess it's a good compromise.

@casperdcl
Sponsor Member

@lrq3000 yes to both your comments... when dynamic_miniters is true, my current implementation uses an EMA to set miniters to a value that corresponds as closely as possible to mininterval... this includes both increasing and decreasing it.
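
In other words, something along these lines (a sketch of the described behaviour, not the exact source):

def ema_miniters(miniters, delta_it, delta_t,
                 mininterval=0.1, smoothing=0.3):
    # Estimate how many iterations span one mininterval, then blend
    # that estimate into miniters with an exponential moving average,
    # so miniters can both grow and shrink over time.
    target = delta_it * (mininterval / delta_t) if delta_t else delta_it
    return smoothing * target + (1 - smoothing) * miniters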

About maxinterval, I don't think 1 s is too much, as the overhead would be negligible for such loops - but 10 s is OK too, I suppose.

@lrq3000
Member

lrq3000 commented Nov 25, 2015

@casperdcl about maxinterval, I know; the issue isn't the overhead, which would be negligible as you say, but rather that I think we should not constrain the user too much: with a default of 1, if the user wants to set a mininterval higher than 1 s, he will also have to raise maxinterval. I guess most users won't want to set a mininterval higher than 10 s, so that's why I propose 10 s. Anyway, we can set this for now and see in the future if there's any issue :)
I'm now working on how to test this new feature lol...

About smoothing dynamic_miniters, OK, that's what I thought: you tried to make miniters converge towards mininterval. That's a very nice idea :) I've done some more tests and everything's OK. The way it works may seem a bit counterintuitive at first, but it behaves correctly, as we expect. Great job 👍

@casperdcl
Sponsor Member

np. thanks @BertrandBordage for the suggestions!

@lrq3000
Member

lrq3000 commented Nov 28, 2015

Ok, done with maxinterval, and I added all the relevant unit tests (I hope they will be stable enough; since we need time.sleep() to test them, designing those tests was quite tricky...).

@lrq3000
Member

lrq3000 commented Nov 28, 2015

Well, the test test_smoothed_dynamic_min_iters_with_min_interval() sometimes fails on Travis on PyPy for the iteration-based tqdm test... So I have commented out the asserts; we'll need to find a more reliable way to do that test...

@casperdcl
Sponsor Member

@lrq3000 this really should've been done on a branch and PR, not directly on master. Now the commit history looks ugly and includes tagged releases which may be broken... And since they're tagged, it's not a good idea to rebase and remove them.

As a general rule of thumb, I only push to master if it's a small one-commit change that isn't likely to break anything, and I don't tag until it definitely passes on Travis and all my systems. Also, I tend to merge (not rebase) PRs into master, then amend to include the version bump in the merge commit. I think this is best for tracking which branches things came from etc... let me know if you guys don't agree.

@casperdcl
Sponsor Member

Thanks for the implementation and trying to write the tests, though!

@lrq3000
Member

lrq3000 commented Nov 28, 2015

@casperdcl Yes, sorry, you're totally right. I don't know why, but I thought this would be a one-commit change and just a small change from the last PR. I hadn't foreseen it would be such a mess to get working right with unit tests...

I will now be a lot more conservative: I will branch for anything but very small "cosmetic" changes.

About merging PRs into master, yes, it's very nice indeed to see where they come from. How do you do that? Do you use GitHub's interface or the command line?

@lrq3000
Member

lrq3000 commented Nov 28, 2015

@casperdcl Also, can I ask what "rules" you usually use to decide when to merge which commits together? (This will help me manage my commits better; I'm not experienced working in teams with git :) )

@lrq3000
Member

lrq3000 commented Nov 28, 2015

Ok, I fixed the issue with the iteration-based tqdm test; all should be OK now. I have uploaded the new version 3.1.4 to PyPI.

@casperdcl
Sponsor Member

Well, I never use GitHub's automated merge, because then I'd have to pull locally, amend and then force-push, which is not great...

  1. git-checkout master (locally)
  2. git-fetch --all
  3. git-pull
  4. git-merge new_feature_branch
  5. Update _version.py
  6. git-add _version.py
  7. git-commit --amend (add 'bump version' to the 'merge' message)
  8. Test
  9. git-push
  10. Wait for Travis to be safe
  11. If fails, fix locally (preferably amending commits rather than making new ones... or if you commit new ones do a little rebase) then continue from step (8)... force-push at step (9)...
  12. git-tag vx.x.x
  13. push --tags

Optional:
  14. delete branch from the PR page
  15. git-branch -D new_feature_branch (locally delete branch)

General rule: once you push a tag to git (or create the tag online) you can no longer alter/mess with/rebase anything from that tag and older.

If all this is too complex just wait for someone else to do it. It's good practice not to merge your own code anyway (because that ensures someone else looks at it). This was a longer message than I thought it was going to be.

Sent from my mobile device

@casperdcl
Sponsor Member

Like I said, I'm happy to do all of the above if it's too complex. Can I ask what steps you follow for uploading to PyPI? I know there are rst issues (links etc.?) and dependencies for building Windows binaries...

@lrq3000
Member

lrq3000 commented Nov 28, 2015

Oh, thank you very much @casperdcl for detailing all your steps to merge a branch 👍 I know, I won't merge my own branches myself (I did for the latest branch, but that's because it was a simple bugfix), but it's interesting to know how it's done :)

About uploading to PyPI, you just need to run python setup.py make buildupload; it will run the readme checks, compile, and then ask you for the credentials to upload to PyPI (it will ask you twice: first to upload the readme, and then to upload the builds).

For Windows, there's no dependency except that you need to be running Windows :/ But anyway, it's not a big deal if the win build is not done: since this is a very simple library with no dependencies, there's no issue installing from the .zip.

@lrq3000
Member

lrq3000 commented Nov 28, 2015

PS: a tip when building: you always need to remove the folders dist, build and tqdm.egg-info before launching a build (this is done in the Makefile), else your build may not include the latest file changes (there is a record inside the egg) and you may upload older versions of tqdm along with the new one (because when you upload the builds using twine, it will upload everything inside the dist folder!).

@casperdcl
Sponsor Member

I think it's nice to upload to PyPI with Windows binaries... are you saying this is only possible while running Windows? No cross-compilation possible?

@lrq3000
Member

lrq3000 commented Nov 28, 2015

To my knowledge, no, it's not possible. To compile Windows binaries, you need a working win32 Python interpreter, which is compiled with Visual Studio, so you cannot make it work on a Linux platform. From what I've read, people who do cross-compilation just install Windows in a VM and make the builds inside it (and repeat the process if you want to compile Mac OS X builds...).

If Python someday switches to MinGW or Cygwin GCC for compilation, maybe it will be possible to build win32 Python apps on Linux, but for now it seems that's not possible.

@lrq3000
Member

lrq3000 commented Nov 28, 2015

But anyway, you can try it for yourself if you're running Linux; you just need to run this command:

python setup.py sdist bdist_wininst

On Windows, it generates a win32 installer without requiring anything but native Python.
