BUG: Series.unique() terminates strings prematurely on null Bytes #53720

Closed
3 tasks done
Nadrons opened this issue Jun 19, 2023 · 2 comments
Labels: Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, Strings (string extension data type and string data)

Comments


Nadrons commented Jun 19, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

outp = pd.Series(['A\x00B', 'A\x00C']).unique()

print(outp)

# prints:
# ['A\x00B']

Issue Description

Series.unique() fails to distinguish strings that differ only after a null byte. In the example above, two distinct strings go in and only one comes back.

As per this question and this issue, this appears to be another case of strings being passed to a Cython function and terminating early at the first null byte.
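
To illustrate the suspected mechanism, here is a minimal pure-Python sketch. It is not the pandas implementation: `c_string_view` and `unique_by_c_view` are hypothetical helpers that only simulate what happens when a deduplication routine compares strings as NUL-terminated C strings instead of by their full length.

```python
def c_string_view(s: str) -> str:
    """Simulate reading a Python str as a NUL-terminated C string.

    Hypothetical helper: everything from the first '\\x00' onward is dropped,
    which is what a C-level routine relying on a terminator would see.
    """
    return s.split("\x00", 1)[0]


def unique_by_c_view(values):
    """Order-preserving unique that (wrongly) keys on the truncated view."""
    seen = set()
    result = []
    for v in values:
        key = c_string_view(v)  # 'A\x00B' and 'A\x00C' both become 'A'
        if key not in seen:
            seen.add(key)
            result.append(v)
    return result


values = ["A\x00B", "A\x00C"]

# Keying on the truncated view collapses the two values into one,
# matching the output reported in this issue.
truncating = unique_by_c_view(values)

# Keying on the full Python string keeps both values.
full_length = list(dict.fromkeys(values))

print(truncating)   # ['A\x00B']
print(full_length)  # ['A\x00B', 'A\x00C']
```

If this is indeed the cause, both strings hash and compare as `'A'`, so the second one is treated as a duplicate of the first.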

Expected Behavior

Should return ['A\x00B' 'A\x00C']

Installed Versions

pandas : 2.0.2

Nadrons added the Bug and Needs Triage labels on Jun 19, 2023

sudoWin commented Jun 20, 2023

take

mroeschke added the Algos and Strings labels and removed the Needs Triage label on Jul 17, 2024
rhshadrach (Member) commented

Closing as a duplicate of #34551
