Description
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
series = pd.Series([None,1,2,None,3,4,None])
series.mask(series <= 2, -99)
"""
0 NaN
1 -99.0
2 -99.0
3 NaN
4 3.0
5 4.0
6 NaN
dtype: float64
"""
series = series.convert_dtypes()
series.mask(series <= 2, -99)
"""
0 -99
1 -99
2 -99
3 -99
4 3
5 4
6 -99
dtype: Int64
"""
series = series.convert_dtypes(dtype_backend='pyarrow')
series.mask(series <= 2, -99)
"""
0 -99
1 -99
2 -99
3 -99
4 3
5 4
6 -99
dtype: int64[pyarrow]
"""
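The Int64 result above can be reproduced by hand. This is a minimal sketch that rests on an assumption not stated in the report: mask(cond, other) is treated like where(~cond, other), and missing entries in the condition are filled with False before use. The fix commits listed further down, which add a fillna on cond, are consistent with this but do not confirm it. The variable names cond, keep and manual are illustrative only.

```python
import pandas as pd

series = pd.Series([None, 1, 2, None, 3, 4, None]).convert_dtypes()  # Int64

# Comparing a nullable Int64 Series yields a nullable boolean mask:
# rows that hold <NA> produce <NA> rather than False.
cond = series <= 2

# Assumption: mask(cond, other) behaves like where(~cond, other), and
# missing entries in the inverted condition are filled with False.
keep = (~cond).fillna(False)

# Every position where keep is False receives -99, including the <NA> rows.
manual = series.where(keep, -99)
print(manual.tolist())
```

If the assumption holds, manual matches the Int64 output shown above, with -99 at positions 0, 3 and 6.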
Issue Description
When Series.mask is used on a Series with a NumPy dtype, np.nan values are not replaced. With pandas nullable dtypes (e.g. Int64) or PyArrow dtypes, however, pd.NA values are replaced by the other value (-99 above). This inconsistency makes the outcome difficult to predict.
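One way to see where the two paths diverge is to look at the condition itself. This short sketch compares the boolean mask produced by series <= 2 for the NumPy float64 Series and for its Int64 counterpart. With NumPy semantics, NaN <= 2 is simply False, so mask leaves those rows alone. With the nullable dtype, the comparison yields <NA>, so how those rows are handled depends on how mask resolves a missing condition.

```python
import pandas as pd

values = [None, 1, 2, None, 3, 4, None]

# NumPy-backed float64: None becomes NaN, and NaN <= 2 evaluates to False.
float_series = pd.Series(values)
float_cond = float_series <= 2
print(float_cond.tolist())  # [False, True, True, False, False, False, False]

# Nullable Int64: None becomes <NA>, and <NA> <= 2 evaluates to <NA>.
int_series = float_series.convert_dtypes()
int_cond = int_series <= 2
print(int_cond.isna().tolist())  # [True, False, False, True, False, False, True]
```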
Expected Behavior
import pandas as pd
series = pd.Series([None,1,2,None,3,4,None], dtype='int64[pyarrow]')
series.mask(series <= 2, -99)
"""
0 <NA>
1 -99
2 -99
3 <NA>
4 3
5 4
6 <NA>
dtype: int64[pyarrow]
"""
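A possible workaround, which is not taken from the thread: fill the missing comparison results with False before calling mask, so rows holding <NA> count as "do not mask" and keep their value. The sketch uses the nullable Int64 dtype. The same idiom is assumed, but not verified here, to work for int64[pyarrow].

```python
import pandas as pd

series = pd.Series([None, 1, 2, None, 3, 4, None]).convert_dtypes()  # Int64

# Resolve unknown comparisons explicitly: a missing result means "do not mask".
cond = (series <= 2).fillna(False)

# cond no longer contains <NA>, so mask only replaces rows where series <= 2.
result = series.mask(cond, -99)
print(result)
```

Because the condition is fully resolved before mask sees it, the result does not depend on how mask handles a missing condition internally. The <NA> rows at positions 0, 3 and 6 are preserved, matching the expected output above.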
Installed Versions
INSTALLED VERSIONS
commit : 0691c5c
python : 3.9.18
python-bits : 64
OS : Linux
OS-release : 3.10.0-1160.15.2.el7.x86_64
Version : #1 SMP Wed Feb 3 15:06:38 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.2.3
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
pip : 24.3.1
Cython : None
sphinx : None
IPython : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
blosc : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : 2024.6.1
html5lib : None
hypothesis : None
gcsfs : None
jinja2 : None
lxml.etree : 5.2.2
matplotlib : 3.9.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
psycopg2 : 2.9.9
pymysql : None
pyarrow : 19.0.0
pyreadstat : 1.2.8
pytest : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : 2.0.1
xlsxwriter : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
Activity
sanggon6107 commented on Jan 18, 2025
Hi @kartoria,
I think this is because:
I'll do further investigation and make a PR if there's a nice way to fix this.
sanggon6107 commented on Jan 18, 2025
take
rhshadrach commented on Jan 22, 2025
Thanks for the report! Agreed with the OP's expectation. cc @jorisvandenbossche @WillAyd to double check.
WillAyd commented on Jan 22, 2025
Thanks for the report. We should not be filling the pd.NA values; those should propagate through.
BUG: Add fillna so that cond doesnt contain NA at the beginning of _w…
TST: Add tests for mask with NA. (pandas-dev#60729)
BUG: Fix _where to make np.ndarray mutable. (pandas-dev#60729)
DOC: Add documentation regarding the bug (pandas-dev#60729)
BUG: Fix a bug in test_mask_na() (pandas-dev#60729)
alherrera-cs commented on Feb 28, 2025
Hi, I'd like to work on this issue. I can investigate the behavior of Series.mask() regarding pd.NA and propose a solution to standardize its behavior across different dtypes. Please let me know if I can be assigned to this issue.
sanggon6107 commented on Mar 1, 2025
Hi @alherrera-cs,
Sure! Since my PR hasn't been approved, I think you can take it if you have a good idea.
frbelotto commented on May 27, 2025
I ran into the same issue. I had reported it as Issue 61505 but closed that one after finding this one. Any updates?