Series dtype changes when a new row is added #21501

Closed
asdf8601 opened this issue Jun 15, 2018 · 7 comments
Labels
Bug; Dtype Conversions (unexpected or buggy dtype conversions); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate); setitem-with-expansion

Comments

@asdf8601
Contributor

The problem

If we create a Series with a defined dtype and then add a new row to it, the dtype changes. An example follows:

Example

import pandas as pd
import numpy as np

pd.__version__  # '0.23.1'

# with Series
s = pd.Series([1,2], dtype=np.float64)
print(s.dtype)  # -> float64
s[3] = None
print(s.dtype)  # -> object

# with DataFrames
d = pd.DataFrame([1,2], dtype=np.float64)
print(d.dtypes)  # -> float64
d.loc[3, 0] = None
print(d.dtypes)  # -> float64

However, this doesn't happen when the row is already present:

In[12]: s = pd.Series([1,2]).astype(np.float64)
In[13]: s[3] = None
In[14]: s
Out[14]: 
0       1
1       2
3    None
dtype: object

In[15]: s = s.astype(np.float64)
In[16]: s
Out[16]: 
0    1.0
1    2.0
3    NaN
dtype: float64

# row 3 (position 2) is already present in s
In[18]: s.iloc[2] = None
In[19]: s
Out[19]: 
0    1.0
1    2.0
3    NaN
dtype: float64

In[20]: s.loc[3] = None
In[21]: s
Out[21]: 
0    1.0
1    2.0
3    NaN
dtype: float64
@WillAyd
Member

WillAyd commented Jun 15, 2018

None is not a valid float64 value, hence the coercion to object. If you want to preserve that dtype, you should insert np.nan instead.
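
A minimal sketch of that suggestion, assuming the same two-element float64 Series as in the report. It uses `.loc` rather than plain `s[3]` so that the enlargement is explicitly label-based; that choice is mine, not the reporter's.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2], dtype=np.float64)

# Label 3 does not exist yet, so this assignment enlarges the Series.
# np.nan is itself a float, so no dtype change is needed to store it.
s.loc[3] = np.nan

print(s.dtype)  # float64
```

Because the inserted value is already a float, the result does not depend on how pandas chooses to interpret None.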

WillAyd closed this as completed Jun 15, 2018
WillAyd reopened this Jun 15, 2018
@WillAyd
Member

WillAyd commented Jun 15, 2018

Sorry, I misunderstood this at first glance. I see now that appending None coerces to object, but assigning None to an existing value preserves the dtype, automatically converting None to np.nan.

Investigation and PRs into the behavior are certainly welcome
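
The second half of that asymmetry, assignment to a label that already exists, can be sketched as follows. The Series contents here are illustrative rather than taken from the report; the index 0, 1, 3 mirrors the one in the original example.

```python
import pandas as pd

# A float64 Series whose labels 0, 1 and 3 all exist already.
s = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 3])

s.iloc[2] = None  # positional assignment to the existing row labelled 3
s.loc[0] = None   # label-based assignment to the existing row labelled 0

print(s.dtype)
print(s.isna().tolist())
```

Neither assignment adds a row, so pandas only has to store a missing value in an existing float slot, which it does as NaN.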

WillAyd added the Missing-data and Dtype Conversions labels Jun 15, 2018
@asdf8601
Contributor Author

@WillAyd Don't worry, my fault, I should have gone into more detail.

What would be the expected behavior?

My opinion:

On the one hand, DataFrame and Series behave differently for the same operation. In the first example above, inserting None into a new row preserves the dtype in the DataFrame but not in the Series.

On the other hand, if the row already exists, Series and DataFrame behave the same: None becomes NaN and dtype=float64 is preserved.

From your first answer, I understand that the first behavior is expected and the second is not. That is, if a row exists and its value is changed to None, should the dtype change to object?

@WillAyd
Member

WillAyd commented Jun 16, 2018

I think any insert or assignment of None should coerce to object. If the user wants to maintain float64, they should use np.nan instead, so that it's not up to pandas to make that inference.

cc @TomAugspurger @jreback if they disagree

@jreback
Contributor

jreback commented Jun 16, 2018

There is an issue about this already, IIRC.

@asdf8601
Contributor Author

asdf8601 commented Jun 19, 2018

Probably this issue: #20442 (comment)
I found it relevant.

@rhshadrach
Member

All of these now return float64 on main. There has been recent work in this regard (e.g. PDEP-6), so I don't believe we need a test. Closing.
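
For anyone who wants to check their own installation, here is a small sketch that reruns the two enlargement cases from the report and records the resulting dtypes. The function name `dtypes_after_enlarging_with_none` is hypothetical, and `.loc` is used for the Series case instead of the reporter's `s[3]`. No expected output is shown because, per this thread, the result depends on the pandas version.

```python
import numpy as np
import pandas as pd


def dtypes_after_enlarging_with_none():
    """Return the dtypes observed after adding a new row holding None."""
    s = pd.Series([1, 2], dtype=np.float64)
    s.loc[3] = None  # new label in a Series

    d = pd.DataFrame([1, 2], dtype=np.float64)
    d.loc[3, 0] = None  # new row in a DataFrame, column 0

    return {
        "pandas": pd.__version__,
        "series": str(s.dtype),
        "dataframe": str(d[0].dtype),
    }


print(dtypes_after_enlarging_with_none())
```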
