14% overhead reduction, flake8 fixes
casperdcl committed Aug 16, 2016
1 parent c893d90 commit 0346a4a
Showing 1 changed file with 12 additions and 12 deletions.
24 changes: 12 additions & 12 deletions tqdm/_tqdm.py
@@ -723,10 +723,10 @@ def __iter__(self):
                n += 1
                # check the counter first (avoid calls to time())
                if n - last_print_n >= miniters:
-                   delta_it = n - last_print_n
-                   cur_t = _time()
-                   delta_t = cur_t - last_print_t
+                   delta_t = _time() - last_print_t
                    if delta_t >= mininterval:
+                       cur_t = _time()
+                       delta_it = n - last_print_n

@lrq3000 (Member) commented on Aug 17, 2016:
Good catch @casperdcl !

But about cur_t, are you sure it's more efficient to store it afterwards? Because _time() now gets called twice (and cur_t can also be a bit different from what gets stored in delta_t; this can make the calculations a bit off, but I didn't dig into the exact potential consequences).

@casperdcl (Author, Member) replied on Aug 17, 2016:

Yes, I checked. There'll be a minor time discrepancy, but the cost of the assignments times the number of executions greatly exceeds the cost of the second call in the inner branch, which rarely executes.
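To make that trade-off concrete, here is a rough micro-benchmark sketch of the before/after hot path (variable names mirror the locals in __iter__; the loop size, the 0.1 s mininterval and the timeit harness are illustrative assumptions, not tqdm's own tests):

```python
from time import time as _time
from timeit import timeit

def before(iters=10**6, miniters=1, mininterval=0.1):
    n = last_print_n = 0
    last_print_t = _time()
    for _ in range(iters):
        n += 1
        if n - last_print_n >= miniters:
            delta_it = n - last_print_n       # paid on every counter hit
            cur_t = _time()
            delta_t = cur_t - last_print_t
            if delta_t >= mininterval:        # rarely true
                last_print_n, last_print_t = n, cur_t

def after(iters=10**6, miniters=1, mininterval=0.1):
    n = last_print_n = 0
    last_print_t = _time()
    for _ in range(iters):
        n += 1
        if n - last_print_n >= miniters:
            delta_t = _time() - last_print_t  # only the cheap check up front
            if delta_t >= mininterval:        # rarely true
                cur_t = _time()               # second call, on the rare path
                delta_it = n - last_print_n
                last_print_n, last_print_t = n, cur_t

print("before:", timeit(before, number=5))
print("after: ", timeit(after, number=5))
```

With a hot outer branch and a cold inner one, before pays for delta_it, cur_t and an extra subtraction on every counter hit, while after only pays the extra _time() call and assignments when a redraw is actually due.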

                        elapsed = cur_t - start_t
                        # EMA (not just overall average)
                        if smoothing and delta_t:
@@ -762,7 +762,7 @@ def __iter__(self):
                                    / delta_t + (1 - smoothing) * miniters
                            else:
                                miniters = smoothing * delta_it + \
-                                   (1 - smoothing) * miniters
+                                   (1 - smoothing) * miniters

                        # Store old values for next call
                        self.n = self.last_print_n = last_print_n = n
@@ -801,12 +801,12 @@ def update(self, n=1):
            raise ValueError("n ({0}) cannot be negative".format(n))
        self.n += n

-       delta_it = self.n - self.last_print_n # should be n?
-       if delta_it >= self.miniters:
+       if self.n - self.last_print_n >= self.miniters:
            # We check the counter first, to reduce the overhead of time()
-           cur_t = self._time()
-           delta_t = cur_t - self.last_print_t
+           delta_t = self._time() - self.last_print_t
            if delta_t >= self.mininterval:
+               cur_t = self._time()
+               delta_it = self.n - self.last_print_n # should be n?
                elapsed = cur_t - self.start_t
                # EMA (not just overall average)
                if self.smoothing and delta_t:
@@ -843,14 +843,14 @@ def update(self, n=1):
                if self.dynamic_miniters:
                    if self.maxinterval and delta_t > self.maxinterval:
                        self.miniters = self.miniters * self.maxinterval \
-                           / delta_t
+                           / delta_t
                    elif self.mininterval and delta_t:
                        self.miniters = self.smoothing * delta_it \
-                           * self.mininterval / delta_t + \
-                           (1 - self.smoothing) * self.miniters
+                           * self.mininterval / delta_t + \
+                           (1 - self.smoothing) * self.miniters
                    else:
                        self.miniters = self.smoothing * delta_it + \
-                           (1 - self.smoothing) * self.miniters
+                           (1 - self.smoothing) * self.miniters

                # Store old values for next call
                self.last_print_n = self.n

18 comments on commit 0346a4a

@CrazyPython (Contributor) commented:

@casperdcl this commit shows that tqdm is nowhere near optimal performance: a single commit can make it 14% faster. Maybe you should have a "performance bash" sometime and encourage people from outside to contribute?

@lrq3000 (Member) commented:

I'm not sure about the precision of the 14% figure; it's quite hard to quantify at this level (this was discussed in some ancient issue), but for sure we gain a few opcodes because the storage happens later in the code. But that's micro-optimization: of course we can do that nearly everywhere else, but then IMO we need to strike a balance between readability and performance.

@CrazyPython (Contributor) commented:

@lrq3000 Hm, there should be an automated test that accurately spits out a ns/iter number we can use to measure performance. Yes, the time would be system-dependent, but it's a pretty useful figure.
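For illustration, the kind of figure being asked for could be produced with something like the sketch below; the list-comprehension harness and loop size are assumptions rather than an existing tqdm test:

```python
from timeit import timeit
from tqdm import tqdm

N = 10**6  # enough iterations to amortise start-up cost
bare = timeit(lambda: [None for _ in range(N)], number=1)
wrapped = timeit(lambda: [None for _ in tqdm(range(N), leave=False)], number=1)

# per-iteration overhead attributable to tqdm, in nanoseconds
print("tqdm overhead: {0:.1f} ns/iter".format((wrapped - bare) / N * 1e9))
```

As the next replies point out, the absolute number depends heavily on the host and on concurrent load.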

@lrq3000 (Member) commented:

They are here: https://github.com/tqdm/tqdm/blob/master/examples/simple_examples.py

But yeah, we cannot use them for performance checking, so we have other techniques for that in tqdm_perf.py. But then it's not just system-dependent but also depends on other processes, so it's quite hard to know the real performance.

@CrazyPython (Contributor) commented on Aug 17, 2016:

@lrq3000 run it on a Travis CI instance or some other cloud instance, e.g. AWS.

Or an isolated Raspberry Pi with nothing else running.

@lrq3000 (Member) commented:

Same issue: that's what we did initially, but it's totally unreliable; the values change from one instance to another.

@CrazyPython (Contributor) commented:

@lrq3000 Hm, why? They should provide a static amount of CPU resources.

@lrq3000 (Member) commented:

@casperdcl Just checked the change of cur_t; it shouldn't have any bad consequences as far as I can see (it can only be bigger than the time measured during the check, which still meets the condition and doesn't change anything for the later calculations).

@lrq3000 (Member) commented on Aug 17, 2016:

@CrazyPython Not really: they virtualize the processes, so it depends on the other processes running at the same time (not just our repo but any other repo); see #59 and #101. That's also why our performance tests can still fail sometimes, even though we mostly fixed that by using large bounds and virtual timers (see #51, #52, #59 and #60 for the historical log).

The ultimate way to reliably analyze tqdm performance would be to count all opcodes generated at runtime. I implemented something that works in Py2.7 in #90, but it doesn't work in Py3 due to some stdlib API change, and I'm not advanced enough to make my own bytecode introspector, so the PR is postponed until then :(
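For reference, later CPython versions (3.7+) expose frame.f_trace_opcodes, which makes a deterministic opcode counter possible without a custom bytecode introspector; a minimal sketch of the idea (not the approach used in #90):

```python
import sys

def count_opcodes(func, *args, **kwargs):
    """Count bytecode instructions executed while running func (CPython 3.7+)."""
    count = 0

    def tracer(frame, event, arg):
        nonlocal count
        frame.f_trace_opcodes = True  # request per-opcode trace events
        if event == 'opcode':
            count += 1
        return tracer                 # keep tracing nested frames

    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(None)
    return count

# e.g. compare two revisions of the same hot loop:
# print(count_opcodes(lambda: sum(i for i in range(1000))))
```

Counting executed opcodes sidesteps the noise from other processes, at the cost of ignoring that different opcodes have very different runtime costs.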

@CrazyPython (Contributor) commented on Aug 17, 2016:

@lrq3000 Raspberry Pi solution? $5, no extra wifi stuff needed. See, this is why you helpless devs need donations.

But Life™ had to be ruined by Lawyers™®.

@lrq3000 (Member) commented:

Yeah, but then we would have to maintain a clean Raspberry Pi setup at all times and hope that the absolute values won't change with various uncontrollable parameters like cosmic rays or just bit rot. I'm really convinced we can't rely on absolute performance metrics, but relative perf is OK because at least it quantifies the overhead bounds (currently at most 2x compared to a bare minimalist progress bar).

@CrazyPython (Contributor) commented:

@lrq3000 on that note - we should have a bare minimalist progress bar version somewhere.

@lrq3000 (Member) commented:

But we have one in the perf test! It's just not publicly shown because there's little utility (someone wanting that much micro speed benefit will probably use another language anyway, or disable the progress bar altogether).
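A hedged sketch of what such a baseline comparison can look like (the real baseline lives in tqdm's performance tests; the minimal_bar helper, its redraw interval and the loop size here are illustrative assumptions):

```python
import sys
from timeit import timeit
from tqdm import tqdm

def minimal_bar(iterable, total):
    """Bare-bones baseline: a counter redrawn in place, no rate/ETA/formatting."""
    for i, obj in enumerate(iterable):
        if i % 10000 == 0:
            sys.stderr.write("\r{0}/{1}".format(i, total))
        yield obj
    sys.stderr.write("\n")

N = 10**6
t_baseline = timeit(lambda: [None for _ in minimal_bar(range(N), N)], number=1)
t_tqdm = timeit(lambda: [None for _ in tqdm(range(N), leave=False)], number=1)

# relative overhead; the claim above is that this stays roughly below 2x
print("tqdm / minimal bar: {0:.2f}x".format(t_tqdm / t_baseline))
```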

@CrazyPython (Contributor) commented:

@lrq3000 2x speed * inner loop = huge perf boost

@lrq3000 (Member) commented:

Yes but that's not what you get with micro-optimizations usually. If we want to be serious about it, we should profile the code, but I doubt there is much room left for big optimization.

@CrazyPython (Contributor) commented:

@lrq3000 An optimization bug bash would be nice. Really nice.

@casperdcl (Author, Member) commented:

Incidentally, the 14% figure was the result of several long-running empty loops plus line profiling on my machine; it may vary by about a percent. It refers purely to the time spent "doing tqdm stuff" before and after the commit.
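For anyone wanting to reproduce that kind of measurement, a sketch using the third-party line_profiler package (the loop size and harness are assumptions; this is not the exact script behind the 14% figure):

```python
# pip install line_profiler
from line_profiler import LineProfiler
from tqdm import tqdm

lp = LineProfiler()
lp.add_function(tqdm.__iter__)  # per-line timings inside the hot loop

@lp
def empty_loop():
    for _ in tqdm(range(10**7), leave=False):
        pass  # the loop body does nothing, so tqdm dominates the cost

empty_loop()
lp.print_stats()  # shows how much time each line of __iter__ accounts for
```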

@lrq3000 (Member) commented on Aug 18, 2016 via email
