
BUG: fillna with DataFrame input should preserve dtype when possible #61742


Status: Open (wants to merge 1 commit into base: main)

iabhi4 (Contributor) commented Jun 29, 2025

When filling a DataFrame with another DataFrame using fillna, columns with matching dtypes were unnecessarily cast to object because of the use of np.where.

This PR updates the logic to use pandas’ Series.where, which is dtype-safe and respects extension and datetime types.
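A minimal sketch of the difference, assuming a nullable Int64 column as a stand-in for an extension dtype (this example is illustrative and not taken from the PR). np.where always returns a plain NumPy array, so an extension dtype cannot survive it; the exact NumPy dtype it produces (object or float64) may vary by pandas version, so the sketch only prints it.

```python
import numpy as np
import pandas as pd

# Nullable integer columns (an extension dtype); lhs has one missing value.
lhs = pd.Series([1, None, 3], dtype="Int64")
rhs = pd.Series([10, 20, 30], dtype="Int64")

# np.where converts both sides to NumPy arrays first, so the result is
# always a numpy.ndarray and the Int64 extension dtype is lost.
np_filled = np.where(lhs.notna(), lhs, rhs)

# Series.where dispatches to the extension array, so Int64 is kept.
pd_filled = lhs.where(lhs.notna(), rhs)

print(type(np_filled).__name__, np_filled.dtype)
print(pd_filled.dtype, pd_filled.tolist())
```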

iabhi4 (Contributor, Author) commented Jun 29, 2025

Since we now operate column-wise and use Series.where instead of np.where, dtype safety is preserved, as suggested by @jbrockmendel.
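Roughly, the column-wise approach can be sketched against the public pandas API as below. `fillna_columnwise` is a hypothetical helper for illustration only; it is not the code in this PR, which changes fillna itself. It assumes unique column labels and leaves columns that are absent from `other` unchanged.

```python
import pandas as pd


def fillna_columnwise(df: pd.DataFrame, other: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill missing values in `df` from `other`, column by column.

    Assumes unique column labels; columns missing from `other` are returned as-is.
    """
    filled = {}
    for col in df.columns:
        lhs = df[col]
        if col not in other.columns:
            filled[col] = lhs
            continue
        # Align rows with df; rows absent from `other` become missing.
        rhs = other[col].reindex(df.index)
        # When rhs shares lhs's dtype, Series.where returns that same dtype.
        filled[col] = lhs.where(lhs.notna(), rhs)
    return pd.DataFrame(filled, index=df.index)
```

Because each column goes through Series.where, the extension-array logic for that column's dtype decides the result dtype, rather than a NumPy conversion of the whole frame.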

This also preserves extension dtypes such as string[pyarrow], which were previously cast to object. As a result, test_fillna_dataframe_preserves_dtypes_mixed_columns now fails, since it expects the downgraded object dtype.

Let me know if this behavior change is acceptable; I’m happy to update the test or adjust the logic based on what’s preferred!

@simonjayhawkins added the Bug and Dtype Conversions (Unexpected or buggy dtype conversions) labels Jun 30, 2025

```python
# restore original dtype if fallback to object occurred
if lhs.dtype == rhs.dtype and filled.dtype == object:
    try:
        filled = filled.astype(lhs.dtype)
```
Review comment (Member):

I’d expect this to be handled by Series.where. Is it not?
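One way to check the reviewer’s expectation is to call Series.where directly on pairs that share a dtype. This sketch assumes the dtypes below (Int64, boolean, string, float64) are representative; if each assertion holds, the `filled.dtype == object` restore branch above would never be taken for these cases.

```python
import pandas as pd

# (lhs, rhs) pairs that share a dtype; each lhs contains one missing value.
cases = {
    "Int64": (pd.Series([1, None, 3], dtype="Int64"),
              pd.Series([10, 20, 30], dtype="Int64")),
    "boolean": (pd.Series([True, None, False], dtype="boolean"),
                pd.Series([False, True, True], dtype="boolean")),
    "string": (pd.Series(["a", None, "c"], dtype="string"),
               pd.Series(["x", "y", "z"], dtype="string")),
    "float64": (pd.Series([1.5, None, 3.5]),
                pd.Series([0.0, 2.5, 0.0])),
}

for name, (lhs, rhs) in cases.items():
    filled = lhs.where(lhs.notna(), rhs)
    # With matching input dtypes, no fallback to object should occur.
    assert filled.dtype == lhs.dtype, name
    print(name, filled.dtype)
```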

@@ -7145,7 +7145,24 @@ def fillna(

```python
    else:
        new_data = self._mgr.fillna(value=value, limit=limit, inplace=inplace)
elif isinstance(value, ABCDataFrame) and self.ndim == 2:
    new_data = self.where(self.notna(), value)._mgr
    filled_columns = {}
    for col in self.columns:
```
Review comment (Member):

Doing this column by column will mean a performance hit for non-object cases. I suspect we need to do this at the Block level to avoid that.
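For comparison, a rough public-API analogue of the reviewer’s suggestion is to run one vectorized where per group of same-dtype columns instead of one per column. `fillna_by_dtype_group` is hypothetical: real Block-level handling would live in pandas internals (the BlockManager behind `self._mgr`), which this sketch does not touch, and grouping columns by dtype only approximates how Blocks are laid out. It assumes unique column labels.

```python
import pandas as pd


def fillna_by_dtype_group(df: pd.DataFrame, other: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: fill missing values in `df` from `other`, one dtype group at a time.

    Assumes unique column labels.
    """
    # Align `other` to df's rows and columns; anything absent becomes missing.
    other = other.reindex(index=df.index, columns=df.columns)
    # Collect column labels that share a dtype (a rough stand-in for Blocks).
    groups: dict = {}
    for col, dtype in df.dtypes.items():
        groups.setdefault(dtype, []).append(col)
    pieces = []
    for cols in groups.values():
        sub = df[cols]
        # One vectorized where per dtype group instead of one per column.
        pieces.append(sub.where(sub.notna(), other[cols]))
    # Reassemble the groups and restore the original column order.
    return pd.concat(pieces, axis=1)[df.columns]
```

Whether this recovers the performance of a single frame-wide where is not measured here; it only reduces the number of Python-level iterations when many columns share a dtype.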

Labels: Bug, Dtype Conversions (Unexpected or buggy dtype conversions)
Projects: None yet
Development: successfully merging this pull request may close these issues:

BUG: Inconsistent behavior surrounding pd.fillna

3 participants