BUG: astype transforms NA to "NA" #61141

Closed
3 tasks done
latot opened this issue Mar 17, 2025 · 6 comments
Labels
Bug, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), Strings (String extension data type and string data)

Comments

@latot

latot commented Mar 17, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas

a = pandas.Series([pandas.NA], dtype="str")

# This is right
print(type(a[0]))
# <class 'pandas._libs.missing.NAType'>

# This is not: the missing value has become a plain string
print(type(a.astype("str")[0]))
# <class 'str'>

Issue Description

When we work with missing data and convert a Series containing NA with astype("str"), it does not keep the NA value; instead it returns the string "NA".

Expected Behavior

Return NA instead of "NA"

Installed Versions

INSTALLED VERSIONS

commit : 0691c5c
python : 3.12.9
python-bits : 64
OS : Linux
OS-release : 6.12.16-gentoo-x86_64
Version : #1 SMP PREEMPT_DYNAMIC Tue Feb 25 08:36:23 -03 2025
machine : x86_64
processor : AMD Ryzen 7 5800H with Radeon Graphics
byteorder : little
LC_ALL : None
LANG : es_CL.utf8
LOCALE : es_CL.UTF-8

pandas : 2.2.3
numpy : 2.2.3
pytz : 2025.1
dateutil : 2.9.0.post0
pip : 25.0.1
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
psycopg2 : None
pymysql : None
pyarrow : None
pyreadstat : None
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlsxwriter : None
zstandard : None
tzdata : 2025.1
qtpy : None
pyqt5 : None

latot added the Bug and Needs Triage labels on Mar 17, 2025
latot changed the title from 'BUG: astype ransforms NA to "NA"' to 'BUG: astype transforms NA to "NA"' on Mar 17, 2025
@kpvenkat47

I think you could use fillna("NA") to handle this. Here's a suggestion:

import pandas as pd
a = pd.Series([pd.NA], dtype="str").fillna("NA")

@latot
Author

latot commented Mar 18, 2025

Hi, thanks for the suggestion. Sadly, that becomes hard to handle if the word "NA" already appears in the Series: whatever "code" we choose would also need to be checked and handled manually, so that it is never mixed up with real data or written back as real data.

I really like the pandas.NA, finally a way to have the db NULL!
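
To illustrate the collision described above, here is a minimal sketch. It assumes the "string" dtype so that the missing value survives construction; the example values are made up for illustration.

```python
import pandas as pd

# One genuine "NA" string, one missing value, one ordinary string.
s = pd.Series(["NA", pd.NA, "x"], dtype="string")

# Before filling, exactly one entry is missing.
n_missing_before = int(s.isna().sum())

# After filling with the sentinel "NA", the genuine "NA" string and the
# former missing value can no longer be told apart.
filled = s.fillna("NA")
n_na_strings_after = int((filled == "NA").sum())

print(n_missing_before, n_na_strings_after)
```

Once filled, nothing in the Series records which of the "NA" entries was originally missing.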

@kpvenkat47

If my understanding is correct, you can use dtype=pd.StringDtype().

import pandas as pd

a = pd.Series([pd.NA, '5', '10'], dtype=pd.StringDtype())
print(a)

# Output:
# 0    <NA>
# 1       5
# 2      10
# dtype: string
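
As a rough check of this suggestion against the original report, the sketch below casts a Series to "string" (the alias for pd.StringDtype()) with astype and confirms the missing entry is still pd.NA. The variable names are only illustrative.

```python
import pandas as pd

a = pd.Series([pd.NA, "5", "10"], dtype="string")

# Casting to the same nullable string dtype keeps the missing value.
b = a.astype("string")
first_is_na = b[0] is pd.NA

# Casting from object dtype to "string" also keeps it missing.
c = pd.Series([pd.NA, "5"], dtype=object).astype("string")
c_mask = c.isna().tolist()

print(first_is_na, c_mask)
```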

@rhshadrach
Member

Thanks for the report. In the future, the missing value will be preserved, but it will be np.nan by default.

import pandas as pd

# Opt in to the future string dtype behavior
pd.set_option("future.infer_string", True)
a = pd.Series([pd.NA], dtype="str")

print(a)
# 0    NaN
# dtype: str

print(type(a[0]))
# <class 'float'>

print(type(a.astype("str")[0]))
# <class 'float'>

If you want pd.NA as the NA value, you can use "string" instead of "str". See PDEP-14 for more details.
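
Because the missing-value sentinel differs between dtypes, code that has to work with more than one of them could test for missingness with isna() instead of inspecting the element type. The following is only a sketch; the helper name `missing_mask` is hypothetical and not part of pandas.

```python
import pandas as pd


def missing_mask(values, dtype):
    """Build a Series with the given dtype and report which entries are missing."""
    s = pd.Series(values, dtype=dtype)
    return s.isna().tolist()


# isna() gives the same answer whichever sentinel the dtype stores.
mask_string = missing_mask([pd.NA, "x"], "string")
mask_object = missing_mask([pd.NA, "x"], object)

print(mask_string, mask_object)
```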

Closing.

rhshadrach added the Strings and Missing-data labels and removed the Needs Triage label on Mar 19, 2025
@latot
Author

latot commented Mar 19, 2025

@rhshadrach thanks for the answer! Why np.nan? Using it instead of pandas.NA feels like breaking the types.

I think it would be good to have a table of dtypes and how each one behaves; right now, working it out case by case seems very hard.

@rhshadrach
Member

I believe this is explained in the PDEP-14 I linked to.


3 participants