Fix #60766:.map,.apply would convert element type for extension array #61396

pedromfdiogo · 2025-05-03T21:54:02Z

closes BUG: .map & .apply would convert element type for extension array. #60766
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/v3.0.0.rst file if fixing a bug or adding a new feature.

Int32Dtype represents integers with support for missing values (pd.NA). However, when calling .map(f) or .apply(f), the elements passed to f are converted to float64, and pd.NA becomes np.nan.

This happens because .map() and .apply() internally convert the masked array with to_numpy(), which yields float64 data even when the original dtype is Int32Dtype.

The fix (removing the to_numpy() call) ensures that the elements seen by .map() and .apply() retain their original type (Int32, Float64, boolean, etc.), avoiding unnecessary conversion to float64 and keeping pd.NA intact.
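For reviewers who want to reproduce the behaviour locally, here is a minimal sketch. The helper name spy is hypothetical and only exists for this illustration. The to_numpy() call passes dtype and na_value explicitly so that the conversion does not depend on version-specific defaults; what spy records depends on the installed pandas version and on whether this change is applied, so no expected output is given for it.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, None, 4], dtype="Int32")

# Explicit conversion to NumPy: integers become floats and pd.NA becomes np.nan.
as_float = s.to_numpy(dtype="float64", na_value=np.nan)
print(as_float.dtype)  # float64

# Record the Python type of every element that Series.map hands to the callback.
seen = []


def spy(x):
    seen.append(type(x).__name__)
    return x


s.map(spy)
print(seen)  # version dependent: float types before the fix, integer/NA types after
```

Running the snippet before and after checking out this branch shows whether the callback receives float values or the original integer and pd.NA elements.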

datapythonista · 2025-06-03T07:16:15Z

pandas/tests/arrays/masked/test_basemaskedarray_map.py

@@ -0,0 +1,18 @@
+import pandas as pd


Can't we have this in an existing test file?

datapythonista · 2025-06-03T07:16:28Z

doc/source/whatsnew/v3.0.0.rst

 - Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
 - Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`)
 - Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`)
 - Bug in various :class:`DataFrame` reductions for pyarrow temporal dtypes returning incorrect dtype when result was null (:issue:`59234`)

+


Can you remove this new line please?

datapythonista · 2025-06-03T07:16:50Z

pandas/tests/arrays/masked/test_basemaskedarray_map.py

+    for dtype, data, expected_data in [
+        ("Int32", [1, 2, None, 4], [2, 3, pd.NA, 5]),
+    ]:


Why loop over a single value?

datapythonista · 2025-06-03T07:18:29Z

pandas/tests/arrays/masked/test_basemaskedarray_map.py

+        result = s.map(transform)
+        expected = pd.Series(expected_data, dtype=result.dtype)
+
+        assert result.tolist() == expected.tolist()


Is there any reason not to compare the Series directly instead of converting them to lists? You can check other tests; there is a function assert_series_equal, in case you're not aware.
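To illustrate the suggestion, a short sketch using the public pandas.testing.assert_series_equal helper. It uses plain arithmetic rather than .map() so that it does not depend on the behaviour under review; the variable names are made up for this example. The second half shows one thing a list comparison misses: two Series with equal values but different dtypes compare equal as lists, while assert_series_equal checks the dtype by default.

```python
import pandas as pd
from pandas.testing import assert_series_equal

# Direct comparison: values, missing entries and dtype are all checked.
result = pd.Series([1, 2, None, 4], dtype="Int32") + 1
expected = pd.Series([2, 3, pd.NA, 5], dtype="Int32")
assert_series_equal(result, expected)

# A list comparison cannot see a dtype mismatch.
wrong_dtype = pd.Series([2, 3, 5], dtype="Int64")
right_dtype = pd.Series([2, 3, 5], dtype="Int32")
lists_match = wrong_dtype.tolist() == right_dtype.tolist()

try:
    assert_series_equal(wrong_dtype, right_dtype)
    dtype_mismatch_caught = False
except AssertionError:
    dtype_mismatch_caught = True

print(lists_match, dtype_mismatch_caught)
```

Inside the pandas test suite the same helper is usually reached through the internal testing module, so the test in this PR could follow whatever convention the neighbouring tests use.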

datapythonista · 2025-06-03T07:22:38Z

pandas/tests/extension/test_masked.py

+            for i in range(len(result)):
+                if result[i] is pd.NA:
+                    result[i] = "nan"
+            result = result.astype("float64")


Are you changing the result to match the expected value? If that is what you are proposing here, why not change the expected value instead?

datapythonista · 2025-06-03T07:23:05Z

pandas/tests/extension/test_masked.py

@@ -181,10 +187,15 @@ def test_map(self, data_missing, na_action):
    def test_map_na_action_ignore(self, data_missing_for_sorting):
        zero = data_missing_for_sorting[2]
        result = data_missing_for_sorting.map(lambda x: zero, na_action="ignore")
+


Better to avoid these unrelated changes.

pedromfdiogo added 5 commits April 22, 2025 16:03

Update v3.0.0.rst

bf6aaef

fixed test_masked.py

e8edcea

Apply Ruff and Ruff-format auto-fixes

d845306

Merge branch 'main' into bug#60766

ef9812e

datapythonista reviewed Jun 3, 2025

datapythonista added Bug Apply labels Jun 3, 2025
