Open
Description
pandas generally tries to coerce values to fit the column dtype, or upcasts the dtype to fit.
For a setting operation this is convenient and, I think, what a user would expect:
In [35]: df = DataFrame({'A' : Series(dtype='M8[ns]'), 'B' : Series([np.nan],dtype='object'), 'C' : ['foo'], 'D' : [1]})
In [36]: df
Out[36]:
A B C D
0 NaT NaN foo 1
In [37]: df.dtypes
Out[37]:
A datetime64[ns]
B object
C object
D int64
dtype: object
In [38]: df.loc[0,'D'] = 1.0
In [39]: df.dtypes
Out[39]:
A datetime64[ns]
B object
C object
D float64
dtype: object
However, for a .fillna (or .replace) operation this might be a bit unexpected. So A was coerced to object dtype, even though it was datetime64[ns].
In [40]: df.fillna('')
Out[40]:
A B C D
0 foo 1
In [41]: df.fillna('').dtypes
Out[41]:
A object
B object
C object
D float64
dtype: object
So a possibility is to add a keyword errors='raise'|'coerce'|'ignore'. The behavior shown last (upcasting to object) would be the equivalent of errors='ignore', while skipping this column would be done with errors='coerce' (and of course 'raise' would raise).
Ideally the default would be 'coerce', I think (to skip for non-compatible values). Any thoughts on this?
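To make the proposal concrete, here is a minimal sketch of the three modes as a standalone helper. Nothing in it is pandas API: `fillna_with_errors` and `_is_compatible` are hypothetical names, the compatibility rule is a deliberately crude assumption, and the mode mapping follows the description above ('ignore' upcasts to object, 'coerce' skips the column, 'raise' raises).

```python
from datetime import datetime

import numpy as np
import pandas as pd
from pandas.api import types as ptypes


def _is_compatible(series, value):
    """Assumed, simplified rule: does `value` fit `series` without a dtype change?"""
    if ptypes.is_datetime64_any_dtype(series.dtype):
        # pd.Timestamp is a datetime subclass, so it is covered here too.
        return isinstance(value, (datetime, np.datetime64))
    if ptypes.is_bool_dtype(series.dtype):
        return isinstance(value, (bool, np.bool_))
    if ptypes.is_numeric_dtype(series.dtype):
        return isinstance(value, (int, float, np.number)) and not isinstance(value, bool)
    return True  # object and anything else: accept any fill value


def fillna_with_errors(df, value, errors="coerce"):
    """Hypothetical helper sketching the proposed `errors` keyword."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"unknown errors mode: {errors!r}")

    out = df.copy()
    for name in df.columns:
        col = df[name]
        missing = col.isna()
        if not missing.any():
            continue  # nothing to fill in this column
        if _is_compatible(col, value):
            out[name] = col.fillna(value)
        elif errors == "raise":
            raise TypeError(f"cannot fill column {name!r} ({col.dtype}) with {value!r}")
        elif errors == "ignore":
            # Upcast to object first, then overwrite the missing slots.
            out[name] = col.astype(object).mask(missing, value)
        # errors == "coerce": leave the incompatible column untouched
    return out


df = pd.DataFrame({"A": pd.Series([pd.NaT], dtype="datetime64[ns]"), "D": [np.nan]})
print(fillna_with_errors(df, "", errors="coerce").dtypes)
print(fillna_with_errors(df, "", errors="ignore").dtypes)
```

The 'ignore' branch goes through `astype(object)` and `mask` rather than `fillna`, so the sketch does not depend on how `fillna` itself handles an incompatible value in any particular pandas version.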
Activity
jreback commented on Jan 4, 2016
cc @ywang007
ResidentMario commented on Mar 8, 2017
xref. #15533
@jreback I think this keyword would be a 👍. This would be a way of harmonizing the for/against positions on validating forcefully versus weakly that are under discussion at PR#15587. Once that PR is added, this behavior could presumably be added as a single
if errors == 'raise': validate_fill_value(obj, value)
call.
I think it's worth considering adding similar behavior to methods implementing fill_value. I'm not sure I like that idea, it feels like a lot of API overhead, but it's worth considering.
mroeschke commented on Apr 21, 2021
This behavior no longer coerces to object. I suppose it could use a test orthogonal to the enhancement request.
mroeschke commented on May 12, 2021
Actually I think this is a bug and the original behavior was correct. NaT is a "na value" that wasn't replaced by the empty string.
eirkkr commented on Sep 21, 2021
Hello, just to add to this thread. I encountered this bug when upgrading pandas from 1.2.5 to 1.3.3 (it looks like this bug was introduced in version 1.3.0).
When using fillna or replace on a datetime series, converting to the empty string "" will not work. However, using another string, e.g. "hello", will work and coerce the series to object dtype. Also, interestingly, df.replace({pd.NaT: ""}) behaves differently from df.replace(pd.NaT, "").
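Since the outcome reportedly depends on the pandas version, a small probe script can help compare the four variants mentioned above on a given installation. This is only a sketch: `describe`, `make_series`, and `variants` are illustrative names, and no particular output is claimed, because the results are exactly what differs between versions.

```python
import pandas as pd


def describe(operation):
    """Run `operation` and summarise the result, or name the exception it raised."""
    try:
        result = operation()
    except Exception as exc:  # behaviour varies by version, so record instead of failing
        return f"raised {type(exc).__name__}"
    return f"dtype={result.dtype}, values={result.tolist()!r}"


def make_series():
    # A fresh datetime64 series with one missing value for every variant.
    return pd.Series([pd.Timestamp("2021-09-21"), pd.NaT])


variants = {
    'fillna("")': lambda: make_series().fillna(""),
    'fillna("hello")': lambda: make_series().fillna("hello"),
    'replace({pd.NaT: ""})': lambda: make_series().replace({pd.NaT: ""}),
    'replace(pd.NaT, "")': lambda: make_series().replace(pd.NaT, ""),
}

report = {label: describe(op) for label, op in variants.items()}

print("pandas", pd.__version__)
for label, outcome in report.items():
    print(f"{label:24} -> {outcome}")
```

Posting the printed report together with the version number makes it easier to tell which of the variants are affected in a given release.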
AvivAvital2 commented on Oct 26, 2021
Also reproduced on 1.3.4
yeyeric commented on May 2, 2022
same here on latest 1.4.2, pd.fillna('') doesn't work with NaT (pd.isnull() gives True though)
pd.fillna('something') works...
Very surprising it has been here since 2016?
evelynegroen commented on Jul 19, 2022
Same on version 1.4.3: with df = pd.DataFrame({"A": [pd.NaT]}), df.fillna("") will do nothing, while df.fillna(" ") will replace NaT with a blank space.
Supertramplee commented on Aug 10, 2022
same here, NaT still shows when filling NA with an empty string: df.fillna('')
mroeschke commented on Aug 10, 2022
The core issue here appears to be specifically that the Timestamp constructor interprets the empty string as pd.NaT, and therefore the datetime64 dtype is not upcast to object.
If the behavior of Out[8] was deprecated to not return NaT, then this behavior would probably be fixed.
Masumi-M commented on Dec 8, 2022
This might be the temporary measure 👍
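For illustration, the Timestamp constructor behavior described above can be checked directly, and one possible stopgap is to move the column to object dtype before overwriting the missing slots. The workaround below is an assumption of this sketch, not necessarily the measure referred to in this comment.

```python
import pandas as pd

# The empty string is parsed as a missing timestamp rather than rejected,
# which is why "" looks like a valid fill value for a datetime64 column.
empty = pd.Timestamp("")
print(pd.isna(empty))

s = pd.Series([pd.Timestamp("2022-12-08"), pd.NaT])

# Assumed workaround: cast to object first, then replace the missing
# entries with mask(), so "" is never interpreted as a datetime.
filled = s.astype(object).mask(s.isna(), "")
print(filled.tolist())
```

The cost of this approach is that the column stops being datetime64, so datetime accessors such as `.dt` no longer apply to the result.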
ciscorucinski commented on May 27, 2023
I'm a novice, but it seems to still be present in 2.0.1
msingh0101 commented on Oct 7, 2023
still present
baptiste-pasquier commented on Oct 25, 2023
There is also a bug when replacing with the string "NAN":
Pesec1 commented on Dec 12, 2024
df.fillna(pd.NA) was not filling np.nan and pd.NaT, but when I checked those values for missingness, the check came back as True. This fixed it for me: df.fillna(pd.NA, axis=1).
I have pandas version 2.2.3 and Python 3.12.7.
hewliyang commented on Jan 7, 2025
In contrast, filling on axis=0 still does not work (it should), for the same reason (dtype conflicts).
Annam679 commented on Feb 4, 2025
So for some reason this worked:
df.fillna('')
df.fillna(pd.NaT, axis=1)
df.fillna('')
Literally, without the second fillna it's not working, but with all three it's working.