Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG(string dtype): Empty sum produces incorrect result #60936

Merged
merged 2 commits into from
Mar 10, 2025

Conversation

rhshadrach
Copy link
Member

@rhshadrach rhshadrach added Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Strings String extension data type and string data labels Feb 15, 2025
@rhshadrach
Copy link
Member Author

Friendly ping @WillAyd @jorisvandenbossche

@rhshadrach rhshadrach added this to the 2.3 milestone Feb 19, 2025
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good - minor question

if op.how == "sum":
# https://github.com/pandas-dev/pandas/issues/60229
# All NA should result in the empty string.
assert "skipna" in kwargs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the need for assert here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should always be adding skipna to kwargs by the time we get here. If that doesn't happen for some reason (e.g. a future refactor), it can help debugging as an assert failing indicates a clear violation of an assumption, key error perhaps not. If it did somehow end up in a release with user code raising, an assert also indicates to the user "this is clearly a bug in pandas".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@WillAyd - friendly ping

@WillAyd WillAyd merged commit dab1b88 into pandas-dev:main Mar 10, 2025
42 checks passed
@WillAyd
Copy link
Member

WillAyd commented Mar 10, 2025

Great thanks @rhshadrach

Copy link

lumberbot-app bot commented Mar 10, 2025

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
git checkout 2.3.x
git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
git cherry-pick -x -m1 dab1b881da2d7d49200808f45942868f1953effa
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
git commit -am 'Backport PR #60936: BUG(string dtype): Empty sum produces incorrect result'
  1. Push to a named branch:
git push YOURFORK 2.3.x:auto-backport-of-pr-60936-on-2.3.x
  1. Create a PR against branch 2.3.x, I would have named this PR:

"Backport PR #60936 on branch 2.3.x (BUG(string dtype): Empty sum produces incorrect result)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Groupby Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Still Needs Manual Backport Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG/API: sum of a string column with all-NaN or empty
2 participants