Fix #60766: .map, .apply would convert element type for extension array #61396

Open
wants to merge 5 commits into base: main

Conversation

@pedromfdiogo pedromfdiogo commented May 3, 2025

Int32Dtype represents integers with support for missing values (pd.NA). However, when .map(f) or .apply(f) is called on such a Series, the elements passed to f are converted to float64, and pd.NA becomes np.nan.

This happens because .map() and .apply() internally go through NumPy, which converts the data to float64 even when the original dtype is Int32Dtype.

The fix, which simply removes the to_numpy() call, ensures that the elements passed to f by .map() or .apply() keep their original type (Int32, Float64, boolean, etc.), avoiding the unnecessary conversion to float64 and preserving pd.NA.
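
A minimal sketch for observing the behaviour described above. It only records which Python types the callback receives; `seen_types` and `record` are hypothetical helper names, not part of pandas or of this PR, and the printed types depend on the installed pandas version (and on whether this change is applied), so no expected output is shown.

```python
import pandas as pd


def seen_types(series):
    """Return the type names of the elements that .map() passes to its callback."""
    seen = []

    def record(x):
        # Remember the type of each element exactly as the callback receives it.
        seen.append(type(x).__name__)
        return x

    series.map(record)
    return seen


# A nullable-integer Series with one missing value (stored as pd.NA).
s = pd.Series([1, 2, None, 4], dtype="Int32")

types_seen = seen_types(s)
print(s.dtype)      # the Series itself keeps its Int32 dtype
print(types_seen)   # version-dependent: shows whether ints/NA or floats/nan reach the callback
```

If the list contains `float` entries, the callback saw the converted values rather than the original integers and pd.NA.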

@@ -0,0 +1,18 @@
import pandas as pd
Member

Can't we put this in an existing test file?

- Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
- Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
- Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`)
- Bug in various :class:`DataFrame` reductions for pyarrow temporal dtypes returning incorrect dtype when result was null (:issue:`59234`)


Member

Can you remove this new line please?

Comment on lines +5 to +7
for dtype, data, expected_data in [
    ("Int32", [1, 2, None, 4], [2, 3, pd.NA, 5]),
]:
Member

Why loop over a single value?

result = s.map(transform)
expected = pd.Series(expected_data, dtype=result.dtype)

assert result.tolist() == expected.tolist()
Member

Any reason not to compare the Series directly instead of converting them to lists? You can check the other tests; there is a function, assert_series_equal, in case you're not aware of it.
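
A short sketch of why the direct comparison is stricter, assuming the public `pandas.testing.assert_series_equal` (pandas' own test suite typically reaches the same function through its internal testing module). The variable names are illustrative only and are not taken from the PR.

```python
import pandas as pd
from pandas.testing import assert_series_equal

# Two nullable-integer Series with identical values, dtype and index compare equal.
left = pd.Series([2, 3, pd.NA, 5], dtype="Int32")
right = pd.Series([2, 3, pd.NA, 5], dtype="Int32")
assert_series_equal(left, right)

# A list comparison cannot see a dtype difference: 2.0 == 2 in plain Python.
as_float = pd.Series([2.0, 3.0, 5.0], dtype="float64")
as_int = pd.Series([2, 3, 5], dtype="int64")
lists_equal = as_float.tolist() == as_int.tolist()

# assert_series_equal checks the dtype by default, so the same pair is rejected.
try:
    assert_series_equal(as_float, as_int)
    dtype_mismatch_detected = False
except AssertionError:
    dtype_mismatch_detected = True

print(lists_equal, dtype_mismatch_detected)  # True True
```

Since this PR is about preserving element types, a comparison that ignores dtype could let exactly the regression under test slip through.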

Comment on lines +195 to +198
for i in range(len(result)):
    if result[i] is pd.NA:
        result[i] = "nan"
result = result.astype("float64")
Member

Are you changing the result to match the expected value? Why not change the expected value if what you are proposing here is that?

@@ -181,10 +187,15 @@ def test_map(self, data_missing, na_action):
def test_map_na_action_ignore(self, data_missing_for_sorting):
    zero = data_missing_for_sorting[2]
    result = data_missing_for_sorting.map(lambda x: zero, na_action="ignore")

Member

Better to avoid these unrelated changes.

@datapythonista datapythonista added the Bug and Apply (Apply, Aggregate, Transform, Map) labels Jun 3, 2025
Development

Successfully merging this pull request may close these issues.

BUG: .map & .apply would convert element type for extension array.
2 participants