ENH: Consistent NA handling in unique(), and nunique() #61209

Open
1 of 3 tasks
olek-osikowicz opened this issue Mar 31, 2025 · 2 comments
Labels: Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)


olek-osikowicz commented Mar 31, 2025

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

Currently Series.nunique has a default parameter dropna=True.
However, Series.unique does not accept a dropna parameter.

This can cause unexpected behaviour: s.nunique() is not necessarily equal to len(s.unique()).
See the example below:

>>> import pandas as pd
>>> s = pd.Series([pd.NA, 1, pd.NA])
>>> s.unique()
array([<NA>, 1], dtype=object)
>>> len(s.unique())
2
>>> s.nunique()
1

I believe it should be addressed to avoid implicit behaviour.
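For reference, the mismatch disappears today if dropna=False is passed explicitly. This is only a small sketch of the current behaviour using the same Series as above; the variable names are just for illustration.

```python
import pandas as pd

s = pd.Series([pd.NA, 1, pd.NA])

# Default behaviour: missing values are dropped before counting.
n_default = s.nunique()

# Opting out of dropping: the missing value counts as one distinct value.
n_with_na = s.nunique(dropna=False)

# unique() always keeps the missing value, so it only agrees with
# nunique() when dropna=False is passed.
n_from_unique = len(s.unique())

print(n_default, n_with_na, n_from_unique)
```

So the two methods can be reconciled, but only if the caller knows to override the default.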

Feature Description

The simplest way to address it would be to change the default parameter of Series.nunique to dropna=False.
The same default would then apply to DataFrame.nunique.

This would be consistent with the current summary of the method:
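As a rough illustration of what the changed default would mean for DataFrame.nunique, the sketch below compares the current default with an explicit dropna=False. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical frame: column "a" contains a missing value, column "b" does not.
df = pd.DataFrame({"a": [1.0, float("nan"), 1.0], "b": ["x", "y", "y"]})

# Current default (dropna=True): NaN is not counted in column "a".
counts_default = df.nunique()

# What the proposed default would return: NaN counts as a distinct value.
counts_with_na = df.nunique(dropna=False)

print(counts_default.to_dict())
print(counts_with_na.to_dict())
```

Only columns that actually contain missing values would see their counts change.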

Count number of distinct elements in specified axis.
Return Series with number of distinct elements. Can ignore NaN values.

"Can ignore NaN values." hints that this should be an optional behaviour, not one enabled by default.

Alternative Solutions

Another approach to force consistent NaN handling by default would be to adapt Series.unique to accept dropna and set it to True by default.

Although possible, this is more laborious and a more impactful change to the pandas API.
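To make the alternative concrete, here is a minimal sketch that emulates it with a standalone helper. Note that unique_dropna is a hypothetical name used only for illustration; Series.unique itself has no dropna parameter today.

```python
import pandas as pd


def unique_dropna(s: pd.Series, dropna: bool = True):
    """Hypothetical helper: emulate Series.unique with a dropna option."""
    # Drop missing values first when requested, then take the unique values.
    values = s.dropna() if dropna else s
    return values.unique()


s = pd.Series([pd.NA, 1, pd.NA])

# With dropna=True the helper agrees with the default of nunique().
uniques_no_na = unique_dropna(s)

# With dropna=False it behaves like the current Series.unique().
uniques_with_na = unique_dropna(s, dropna=False)

print(len(uniques_no_na), len(uniques_with_na))
```

With this behaviour as the default, len(s.unique()) and s.nunique() would match without any extra arguments.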

Additional Context

No response

EDIT: Typos

olek-osikowicz added the Enhancement and Needs Triage labels Mar 31, 2025

HoqueUM commented Apr 1, 2025

take

snitish (Member) commented Apr 4, 2025

I think it should be dropna=True by default, so your alternative solution, i.e. adding dropna to Series.unique (with the default set to True), makes more sense to me. cc: @rhshadrach
