BUG: fillna('') does not replace NaT #11953

Open

Description

@jreback
Contributor

pandas generally tries to coerce values to fit the column dtype, or upcasts the dtype to fit.

For a setting operation this is convenient and, I think, what a user expects:

In [35]: df = DataFrame({'A' : Series(dtype='M8[ns]'), 'B' : Series([np.nan],dtype='object'), 'C' : ['foo'], 'D' : [1]})

In [36]: df
Out[36]:
    A    B    C  D
0 NaT  NaN  foo  1

In [37]: df.dtypes
Out[37]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [38]: df.loc[0,'D'] = 1.0

In [39]: df.dtypes
Out[39]:
A    datetime64[ns]
B            object
C            object
D           float64
dtype: object

However, for a .fillna (or .replace) operation this might be a bit unexpected. Below, A is coerced to object dtype, even though it was datetime64[ns]:

In [40]: df.fillna('')
Out[40]:
  A B    C  D
0      foo  1

In [41]: df.fillna('').dtypes
Out[41]:
A     object
B     object
C     object
D    float64
dtype: object

So a possibility is to add a keyword errors='raise'|'coerce'|'ignore'. This last behavior would be the equivalent of errors='coerce', while skipping this column would be done with errors='coerce' (and of course raise would raise).

Ideally the default would be coerce, I think (to skip non-compatible values). Any thoughts on this?
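A minimal sketch of how such a keyword could behave, written as a standalone helper rather than as pandas API. The names `fillna_checked` and `_fits` are hypothetical, the compatibility check is deliberately rough, and the mapping of keyword values to behaviours is an assumption of this sketch: here 'coerce' leaves an incompatible column unfilled, 'ignore' upcasts it to object, and 'raise' raises.

```python
import numbers
from datetime import datetime

import numpy as np
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype, is_numeric_dtype


def _fits(dtype, value):
    """Rough check (an assumption of this sketch) that value suits dtype."""
    if is_datetime64_any_dtype(dtype):
        # pd.Timestamp is a datetime subclass, so it is covered here too.
        return isinstance(value, (datetime, np.datetime64))
    if is_numeric_dtype(dtype):
        return isinstance(value, numbers.Number) and not isinstance(value, bool)
    return True  # object and other dtypes accept anything in this sketch


def fillna_checked(df, value, errors="coerce"):
    """Hypothetical helper: fill missing values column by column."""
    if errors not in ("raise", "coerce", "ignore"):
        raise ValueError(f"unknown errors value: {errors!r}")
    out = {}
    for name, col in df.items():
        if _fits(col.dtype, value):
            out[name] = col.fillna(value)
        elif errors == "raise":
            raise TypeError(f"{value!r} does not fit column {name!r} ({col.dtype})")
        elif errors == "coerce":
            out[name] = col  # keep the dtype, leave the column unfilled
        else:
            # 'ignore': give up the dtype and upcast to object before filling
            out[name] = col.astype(object).where(col.notna(), value)
    return pd.DataFrame(out, index=df.index)
```

With a frame like the one above, `fillna_checked(df, '', errors='coerce')` would leave the datetime64 column as NaT, while `errors='ignore'` would return an object column holding the empty string.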

Activity

jreback commented on Jan 4, 2016

@jreback
Contributor, Author


ResidentMario commented on Mar 8, 2017

@ResidentMario
Contributor

xref. #15533

@jreback I think this keyword would be a 👍. It would be a way of harmonizing the arguments for and against forceful versus weak validation that are under discussion at PR #15587. Once that PR is merged, this behavior could presumably be added as a single if errors == 'raise': validate_fill_value(obj, value) call.

I think it's worth considering adding similar behavior to methods implementing fill_value. I'm not sure I like that idea, it feels like a lot of API overhead, but, worth considering.

mroeschke

mroeschke commented on Apr 21, 2021

@mroeschke
Member

This behavior no longer coerces to object. I suppose it could use a test, orthogonal to the enhancement request.

In [34]: df = DataFrame({'A' : Series(dtype='M8[ns]'), 'B' : Series([np.nan],dtype='object'), 'C' : ['foo'], 'D' : [1]})

In [35]: df.loc[0,'D'] = 1.0

In [36]: df.dtypes
Out[36]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [37]: df.fillna('')
Out[37]:
    A B    C  D
0 NaT    foo  1

In [38]: df.fillna('').dtypes
Out[38]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [39]: pd.__version__
Out[39]: '1.3.0.dev0+1383.g855696cde0'

Added label Needs Tests (unit test(s) needed to prevent regressions) and removed label Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate) on Apr 21, 2021

mroeschke

mroeschke commented on May 12, 2021

@mroeschke
Member

Actually, I think this is a bug and the original behavior was correct: NaT is an "na value", yet it wasn't replaced by the empty string.

In [1]: df = DataFrame({'A': Series(dtype='M8[ns]'), 'B': Series([np.nan], dtype='object'), 'C': ['foo'], 'D': [1]})

In [2]: df.fillna('')
Out[2]:
    A B    C  D
0 NaT    foo  1

In [3]: df.fillna('').dtypes
Out[3]:
A    datetime64[ns]
B            object
C            object
D             int64
dtype: object

In [4]: df.fillna(2).dtypes
Out[4]:
A     int64
B     int64
C    object
D     int64
dtype: object

In [5]: df.fillna(2)
Out[5]:
   A  B    C  D
0  2  2  foo  1

Added label Datetime (datetime data dtype) and removed label Needs Tests (unit test(s) needed to prevent regressions) on May 12, 2021


eirkkr

eirkkr commented on Sep 21, 2021

@eirkkr

Hello, just to add to this thread. I have encountered this bug when upgrading pandas from 1.2.5 to 1.3.3 (it looks like this bug was introduced in version 1.3.0).

When using fillna or replace on a datetime series, filling with the empty string "" does not work. However, another string such as "hello" does work and coerces the series to object dtype. Interestingly, df.replace({pd.NaT: ""}) also behaves differently from df.replace(pd.NaT, ""):

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [pd.NaT]})

In [3]: df.fillna("")
Out[3]:
    A
0 NaT

In [4]: df.fillna("hello")
Out[4]:
       A
0  hello

In [5]: df.replace(pd.NaT, "")
Out[5]:
    A
0 NaT

In [6]: df.replace(pd.NaT, "hello")
Out[6]:
       A
0  hello

In [7]: df.replace({pd.NaT, ""})
Out[7]:
    A
0 NaT

In [8]: df.replace({pd.NaT, "hello"})
Out[8]:
    A
0 NaT
AvivAvital2

AvivAvital2 commented on Oct 26, 2021

@AvivAvital2

Also reproduced on 1.3.4

yeyeric

yeyeric commented on May 2, 2022

@yeyeric

Same here on the latest 1.4.2: df.fillna('') doesn't work with NaT (pd.isnull() gives True, though).

df.fillna('something') works...

Very surprising that this has been open since 2016.

evelynegroen

evelynegroen commented on Jul 19, 2022

@evelynegroen

Same on version 1.4.3: with df = pd.DataFrame({"A": [pd.NaT]}), df.fillna("") does nothing, while df.fillna(" ") replaces NaT with a blank space.

Supertramplee

Supertramplee commented on Aug 10, 2022

@Supertramplee

Same here: NaT still shows after filling with the empty string via df.fillna('').

mroeschke

mroeschke commented on Aug 10, 2022

@mroeschke
Member

The core issue here appears to be that the Timestamp constructor interprets the empty string as pd.NaT, and therefore the datetime64 dtype is not upcast to object:

In [8]: pd.Timestamp("")
Out[8]: NaT

In [9]: pd.Timestamp(" ")
ValueError: could not convert string to Timestamp

If the behavior of Out[8] were deprecated so that it no longer returns NaT, then this behavior would probably be fixed.
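The explanation above suggests a quick way to probe which fill strings are at risk: check whether the Timestamp constructor turns them into NaT. The snippet below is only a sketch; `parses_as_missing` is a hypothetical helper name, and the results may differ between pandas versions, so no expected output is shown.

```python
import pandas as pd


def parses_as_missing(value):
    """True if pd.Timestamp(value) yields NaT instead of a date or an error."""
    try:
        return pd.Timestamp(value) is pd.NaT
    except (ValueError, TypeError):
        # Strings that cannot be parsed at all are not swallowed as NaT.
        return False


# Fill values mentioned in this thread; print how each one is interpreted.
for candidate in ["", " ", "hello", "NAN", "NAN_"]:
    print(repr(candidate), parses_as_missing(candidate))
```

If the diagnosis is right, the strings for which this prints True should be the ones that fillna silently fails to apply to a datetime64 column.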

Masumi-M

Masumi-M commented on Dec 8, 2022

@Masumi-M

This might work as a temporary measure 👍

# 1. Convert datetime to string (NaT becomes a missing value)
df["target"] = df["target"].dt.strftime('%Y-%m-%d %H:%M:%S')

# 2. Fill the missing entries with a placeholder datetime string
replace_datetime_in_str = "2023-01-01 00:00:00"
df["target"] = df["target"].fillna(replace_datetime_in_str)

# 3. Convert string back to datetime
df["target"] = pd.to_datetime(df["target"])
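
If the goal is only to display or export blanks instead of NaT, a variant of the steps above is to stop after the string conversion and fill with the empty string there, since the column is no longer datetime64 at that point. This is a sketch under that assumption; the sample data and the date format are made up for illustration.

```python
import pandas as pd

# Example column with one real date and one missing value.
s = pd.Series(pd.to_datetime(["2023-01-01", None]))

# strftime turns NaT into a missing string entry, which fillna("") can
# replace because no datetime parsing of the fill value is involved.
as_text = s.dt.strftime("%Y-%m-%d").fillna("")

print(as_text.tolist())  # ['2023-01-01', '']
```

The result is a column of strings, so it should not be converted back with pd.to_datetime if the blanks need to be kept.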
ciscorucinski

ciscorucinski commented on May 27, 2023

@ciscorucinski

I'm a novice, but it seems to still be present in 2.0.1

msingh0101

msingh0101 commented on Oct 7, 2023

@msingh0101

> I'm a novice, but it seems to still be present in 2.0.1

still present

baptiste-pasquier

baptiste-pasquier commented on Oct 25, 2023

@baptiste-pasquier

There is also a bug when replacing with the string "NAN" :

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"A": [pd.NaT]})

In [3]: df.fillna("")
Out[3]: 
    A
0 NaT

In [4]: df.fillna("hello")
Out[4]: 
       A
0  hello

In [5]: df.fillna("NAN")
Out[5]: 
    A
0 NaT

In [6]: df.fillna("NAN_")
Out[6]: 
      A
0  NAN_
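
One workaround that sidesteps the problem for any fill string, including "" and "NAN", is to cast to object before filling so that the fill value is never interpreted as a datetime. This is a sketch only; `fill_missing_as_object` is a hypothetical helper name, and the result gives up the datetime64 dtype.

```python
import pandas as pd


def fill_missing_as_object(df, fill):
    """Cast every column to object, then put `fill` wherever df is missing."""
    # where() keeps entries where the condition is True and uses `fill`
    # elsewhere, so only missing cells are replaced.
    return df.astype(object).where(df.notna(), fill)


df = pd.DataFrame({"A": [pd.Timestamp("2021-01-01"), pd.NaT]})
filled = fill_missing_as_object(df, "NAN")
print(filled)
```

Existing timestamps are kept as Timestamp objects inside the object column; only the missing entries become the fill string.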
Pesec1

Pesec1 commented on Dec 12, 2024

@Pesec1

df.fillna(pd.NA) was not filling np.nan and pd.NaT, even though checking those values for missingness came back True.
This fixed it for me: df.fillna(pd.NA, axis=1).
I have pandas version 2.2.3 and Python version 3.12.7.

hewliyang

hewliyang commented on Jan 7, 2025

@hewliyang

> df.fillna(pd.NA) was not filling np.nan and pd.NaT, even though checking those values for missingness came back True. This fixed it for me: df.fillna(pd.NA, axis=1). I have pandas version 2.2.3 and Python version 3.12.7.

In contrast, filling on axis=0 still does not work (it should), for the same reason (dtype conflicts).

Annam679

Annam679 commented on Feb 4, 2025

@Annam679

> df.fillna(pd.NA) was not filling np.nan and pd.NaT, even though checking those values for missingness came back True. This fixed it for me: df.fillna(pd.NA, axis=1). I have pandas version 2.2.3 and Python version 3.12.7.
>
> In contrast, filling on axis=0 still does not work (it should), for the same reason (dtype conflicts).

So for some reason this worked:
df.fillna('')
df.fillna(pd.NaT, axis=1)
df.fillna('')

Literally, without the second fillna it's not working, but with all three it's working.


Metadata

Assignees: no one assigned
Labels: Bug; Datetime (datetime data dtype); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)
Participants: @jreback, @ResidentMario, @ciscorucinski, @Supertramplee, @mroeschke