Commit 0346a4a
```diff
@@ -723,10 +723,10 @@ def __iter__(self):
                     n += 1
                     # check the counter first (avoid calls to time())
                     if n - last_print_n >= miniters:
-                        delta_it = n - last_print_n
-                        cur_t = _time()
-                        delta_t = cur_t - last_print_t
+                        delta_t = _time() - last_print_t
                         if delta_t >= mininterval:
+                            cur_t = _time()
+                            delta_it = n - last_print_n
                             elapsed = cur_t - start_t
                             # EMA (not just overall average)
                             if smoothing and delta_t:
```
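The reordering above can be read outside the diff: the cheap integer comparison runs on every iteration, the first `time()` call is deferred until the counter threshold passes, and `cur_t`/`delta_it` are only computed once a print is actually due. A minimal standalone sketch of that structure (hypothetical names and signature, not tqdm's real API):

```python
from time import time as _time

def lazy_timed_loop(iterable, miniters=10, mininterval=0.0, printed=None):
    """Sketch of the reordered hot loop: integer compare first, one
    time() call second, full bookkeeping only on the print path.
    `printed` (a list, if given) records n at each simulated print."""
    n = last_print_n = 0
    start_t = last_print_t = _time()
    for obj in iterable:
        yield obj
        n += 1
        # check the counter first (avoid calls to time())
        if n - last_print_n >= miniters:
            delta_t = _time() - last_print_t
            if delta_t >= mininterval:
                # only now pay for the remaining bookkeeping
                cur_t = _time()
                delta_it = n - last_print_n
                elapsed = cur_t - start_t  # would feed the rate display
                if printed is not None:
                    printed.append(n)
                last_print_n = n
                last_print_t = cur_t
```

With `mininterval=0`, the time check always passes, so the sketch "prints" exactly every `miniters` iterations.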
```diff
@@ -762,7 +762,7 @@ def __iter__(self):
                                     / delta_t + (1 - smoothing) * miniters
                             else:
                                 miniters = smoothing * delta_it + \
-                                           (1 - smoothing) * miniters
+                                    (1 - smoothing) * miniters

                     # Store old values for next call
                     self.n = self.last_print_n = last_print_n = n
```
```diff
@@ -801,12 +801,12 @@ def update(self, n=1):
             raise ValueError("n ({0}) cannot be negative".format(n))
         self.n += n

-        delta_it = self.n - self.last_print_n  # should be n?
-        if delta_it >= self.miniters:
+        if self.n - self.last_print_n >= self.miniters:
             # We check the counter first, to reduce the overhead of time()
-            cur_t = self._time()
-            delta_t = cur_t - self.last_print_t
+            delta_t = self._time() - self.last_print_t
             if delta_t >= self.mininterval:
+                cur_t = self._time()
+                delta_it = self.n - self.last_print_n  # should be n?
                 elapsed = cur_t - self.start_t
                 # EMA (not just overall average)
                 if self.smoothing and delta_t:
```
```diff
@@ -843,14 +843,14 @@ def update(self, n=1):
                 if self.dynamic_miniters:
                     if self.maxinterval and delta_t > self.maxinterval:
                         self.miniters = self.miniters * self.maxinterval \
-                                        / delta_t
+                            / delta_t
                     elif self.mininterval and delta_t:
                         self.miniters = self.smoothing * delta_it \
-                                        * self.mininterval / delta_t + \
-                                        (1 - self.smoothing) * self.miniters
+                            * self.mininterval / delta_t + \
+                            (1 - self.smoothing) * self.miniters
                     else:
                         self.miniters = self.smoothing * delta_it + \
-                                       (1 - self.smoothing) * self.miniters
+                            (1 - self.smoothing) * self.miniters

                 # Store old values for next call
                 self.last_print_n = self.n
```
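The `dynamic_miniters` logic above is an exponential moving average over recent print intervals. Pulled out as a plain function (a sketch with hypothetical defaults, not tqdm's actual code), the three branches are:

```python
def ema_miniters(miniters, delta_it, delta_t, smoothing=0.3,
                 mininterval=0.1, maxinterval=10.0):
    """EMA update of the iteration threshold, mirroring the three
    branches in the diff above (illustration only)."""
    if maxinterval and delta_t > maxinterval:
        # too long since the last print: scale the threshold down
        return miniters * maxinterval / delta_t
    elif mininterval and delta_t:
        # aim for roughly one print per mininterval, smoothed
        return (smoothing * delta_it * mininterval / delta_t
                + (1 - smoothing) * miniters)
    else:
        # no timing info: smooth towards the last iteration delta
        return smoothing * delta_it + (1 - smoothing) * miniters
```

The first branch shrinks `miniters` when prints have become too rare; the second nudges it towards the rate that would yield one print per `mininterval`; the last falls back to smoothing on iteration counts alone.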
18 comments on commit 0346a4a
---
@casperdcl this commit shows that tqdm is nowhere near optimal performance - a single commit can make it 14% faster. maybe you should have a "performance bash" sometime and encourage people from outside to contribute?
---
I'm not sure about the precision of the 14% figure; it's quite hard to quantify at this level (this was discussed in some ancient issue). But for sure we gain a few opcodes, because the storage happens later in the code. That's micro-optimization, though; of course we can do that nearly everywhere else, but then IMO we need to keep a balance between readability and performance.
---
@lrq3000 Hm, there should be an automated test that accurately spits out a ns/iter number we can use to measure performance. Yes, the time would be system-dependent, but it's a pretty useful figure.
---
They are here: https://github.com/tqdm/tqdm/blob/master/examples/simple_examples.py
But yeah, we cannot use them for performance checking, so we have other techniques for that in tqdm_perf.py. But then it's not just system-dependent but also dependent on other running processes, so it's quite hard to know the real performance.
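One way to get a number that is at least comparable across commits on the same machine is to measure the wrapper's overhead relative to a bare loop rather than as an absolute ns/iter. A rough sketch of the idea (not the tqdm_perf.py code; `wrap` stands for any iterable wrapper, e.g. tqdm or a no-op):

```python
from timeit import timeit

def per_iter_overhead_ns(wrap, n=100_000, repeat=3):
    """Rough per-iteration overhead of an iterable wrapper, in ns.
    Takes the min over `repeat` runs to damp scheduler noise; the
    absolute value is still machine-dependent, but the ratio between
    two commits on the same machine is the meaningful figure."""
    bare = min(timeit(lambda: sum(0 for _ in range(n)), number=1)
               for _ in range(repeat))
    wrapped = min(timeit(lambda: sum(0 for _ in wrap(range(n))), number=1)
                  for _ in range(repeat))
    return (wrapped - bare) / n * 1e9
```

Taking the minimum rather than the mean discards the runs that were slowed by other processes, which is exactly the interference discussed here.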
---
@lrq3000 run it on a Travis CI instance or some other cloud instance, e.g. AWS, or an isolated Raspberry Pi with nothing else running.
---
Same issue: that's what we did initially, but it's totally unreliable; the values change from one instance to another.
---
@lrq3000 Hm, why? They should provide a static amount of CPU resources.
---
@casperdcl Just checked the change of cur_t; it shouldn't have any bad consequences as far as I can see (it can only be bigger than the value used during the check, which still meets the condition and doesn't change anything for the later calculations).
---
@CrazyPython not really: they virtualize the processes, so it depends on the other processes running at the same time (not just our repo, but any other repo's builds); see #59 and #101. That's also why our performance tests can still fail sometimes, even though we mostly fixed that by using large bounds and virtual timers (see #51, #52, #59, #60 for the historical log).
The ultimate way to reliably analyze tqdm performance would be to count all opcodes generated at runtime. I implemented something that works in Py2.7 in #90, but it doesn't work in Py3 due to a stdlib API change, and I'm not advanced enough to write my own bytecode introspector, so the PR is postponed until then :(
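As an aside, Python 3.7 later added `frame.f_trace_opcodes`, which makes runtime opcode counting possible in Py3 as well. A sketch of the idea (an illustration only, not the #90 implementation):

```python
import sys

def count_opcodes(func, *args, **kwargs):
    """Count bytecode instructions executed while running func.
    Relies on frame.f_trace_opcodes (Python 3.7+): a deterministic
    cost metric, immune to CPU load and other running processes."""
    count = [0]

    def tracer(frame, event, arg):
        frame.f_trace_opcodes = True  # request per-opcode events
        if event == "opcode":
            count[0] += 1
        return tracer  # keep tracing nested frames too

    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(None)  # always restore the default tracer
    return result, count[0]
```

The count only covers Python frames (C-level builtins execute no bytecode), but for comparing two versions of a pure-Python hot loop that is exactly what's wanted.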
---
@lrq3000 Raspberry Pi solution? $5, no extra wifi stuff needed. See, this is why you helpless devs need donations.
But Life™ had to be ruined by Lawyers™®.
---
Yeah, but then we would have to maintain a clean Raspberry Pi setup at all times and hope that the absolute values won't change with various uncontrollable parameters like cosmic rays or just bit rot. I'm really convinced we can't rely on absolute performance metrics, but relative perf is OK, because at least it quantifies the overhead bounds (currently at most 2x compared to a bare minimalist progress bar).
---
@lrq3000 on that note - we should have a bare minimalist progress bar version somewhere.
---
But we have one in the perf test! It's just not publicly shown because there's little utility (someone wanting that much micro speed benefit will probably use another language anyway, or disable the progress bar altogether).
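For reference, a bare minimalist bar of the kind used as the perf baseline looks roughly like this (a sketch of the concept, not the actual tqdm_perf.py code):

```python
import sys

def simple_progress(iterable, total=None, file=sys.stderr):
    """Lower-bound baseline: one counter and one write per item,
    with none of tqdm's rate/ETA/miniters machinery."""
    if total is None:
        total = len(iterable)
    for i, obj in enumerate(iterable, 1):
        yield obj
        file.write("\r{0}/{1}".format(i, total))
    file.write("\n")
```

Since it writes on every single iteration, it is only a fair baseline for loops where the display cost dominates; tqdm's miniters/mininterval machinery exists precisely to avoid that write.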
---
@lrq3000 2x speed * inner loop = huge perf boost
---
Yes, but that's not what you usually get with micro-optimizations. If we want to be serious about it, we should profile the code, but I doubt there is much room left for big optimizations.
---
@lrq3000 An optimization bug bash would be nice. Really nice.
---
Incidentally, the 14% figure was the result of several long-running empty loops plus line profiling on my machine. It may vary by about a percent. It refers purely to time spent "doing tqdm stuff" before and after the commit.
---
Good catch @casperdcl! But about cur_t: are you sure it's more efficient to store it after? Because _time() now gets called twice on the print path (and cur_t can also differ slightly from the value that went into delta_t; this can make the calculations a bit off, but I didn't dig into the exact potential consequences).