Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix the problem #299 in terms of using tqdm in pandas #524

Merged
merged 7 commits into from Apr 3, 2018

Conversation

chengs
Copy link
Contributor

@chengs chengs commented Mar 17, 2018

#299 partially achieved its goal. With forbidding *args, one can get correct axis now, and use it to get the total (as number of iterations in progress_apply)

However, pandas sometimes runs several more iterations to optimise itself. For example, in dataframe.progress_apply(func).

  • if func is a numpy.ufunc or is very slow, then it will be executed total times.
  • but if it is very quick or in some other cases, it will be executed total+1 or total+shape[axis0] times.

I think there is no need to adjust such particular pandas implementation, so in the wrapper function, I write

def wrapper(*args, **kwargs):
                    # update tbar correctly
                    # it seems pandas apply calls func twice on the first column/row 
                    # to decide whether it can take a fast or slow code path.
                    # so stop when t.total==t.n
                    t.update(n=1 if t.total and t.n < t.total else 0)
                    return func(*args, **kwargs)

if total is achieved, tbar will not be updated.

@codecov-io
Copy link

codecov-io commented Mar 17, 2018

Codecov Report

Merging #524 into master will increase coverage by 0.3%.
The diff coverage is 100%.

@@           Coverage Diff            @@
##           master    #524     +/-   ##
========================================
+ Coverage    99.1%   99.4%   +0.3%     
========================================
  Files           8       8             
  Lines         668     674      +6     
  Branches      118     118             
========================================
+ Hits          662     670      +8     
+ Misses          4       3      -1     
+ Partials        2       1      -1

@chengs
Copy link
Contributor Author

chengs commented Mar 17, 2018

I add new tests, so now, pandas.Series, pandas.Dataframe, and groupby cases are all covered. Not cover pandas.Panel because it is going to be removed. http://pandas.pydata.org/pandas-docs/version/0.20/whatsnew.html#whatsnew-0200-api-breaking-deprecate-panel

Transmitted to `df.apply()`.
"""

# Precompute total iterations
total = getattr(df, 'ngroups', None)
Copy link
Contributor Author

@chengs chengs Mar 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this ngroups is an inner attribute of pandas. It may be changed in future, but seems not happen in near future or never. So let's keep it so far.

For perfection, I will suggest to use one of pandas public apis to get group size in future. So far, it is fine.

@casperdcl casperdcl self-requested a review March 19, 2018 20:30
@casperdcl casperdcl added to-review 🔍 Awaiting final confirmation submodule ⊂ Periphery/subclasses labels Mar 19, 2018
- suppress `RuntimeError: Set changed size during iteration` (tqdm#481)
- partially re-add 64f5e73

Happy Easter!
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
submodule ⊂ Periphery/subclasses to-review 🔍 Awaiting final confirmation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants