-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
Fix #60766:.map,.apply would convert element type for extension array #61396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…sion array. The Int32Dtype type allows representing integers with support for null values (pd.NA). However, when using .map(f) or .apply(f), the elements passed to f are converted to float64, and pd.NA is transformed into np.nan. This happens because .map() and .apply() internally use numpy, which automatically converts the data to float64, even when the original type is Int32Dtype. The fix (just remove the method to_numpy()) ensures that when using .map() or .apply(), the elements in the series retain their original type (Int32, Float64, boolean, etc.), preventing unnecessary conversions to float64 and ensuring that pd.NA remains correctly handled.
@@ -0,0 +1,18 @@ | |||
import pandas as pd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we have this in an existing tests file?
- Bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`) | ||
- Bug in comparison between object with :class:`ArrowDtype` and incompatible-dtyped (e.g. string vs bool) incorrectly raising instead of returning all-``False`` (for ``==``) or all-``True`` (for ``!=``) (:issue:`59505`) | ||
- Bug in constructing pandas data structures when passing into ``dtype`` a string of the type followed by ``[pyarrow]`` while PyArrow is not installed would raise ``NameError`` rather than ``ImportError`` (:issue:`57928`) | ||
- Bug in various :class:`DataFrame` reductions for pyarrow temporal dtypes returning incorrect dtype when result was null (:issue:`59234`) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you remove this new line please?
for dtype, data, expected_data in [ | ||
("Int32", [1, 2, None, 4], [2, 3, pd.NA, 5]), | ||
]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why to loop over a single value?
result = s.map(transform) | ||
expected = pd.Series(expected_data, dtype=result.dtype) | ||
|
||
assert result.tolist() == expected.tolist() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason why not to compare the Series directly instead of converting them to lists? You can check other tests, there is a function assert_series_equal in case you're not aware.
for i in range(len(result)): | ||
if result[i] is pd.NA: | ||
result[i] = "nan" | ||
result = result.astype("float64") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you changing the result to match the expected value? Why not change the expected value if what you are proposing here is that?
@@ -181,10 +187,15 @@ def test_map(self, data_missing, na_action): | |||
def test_map_na_action_ignore(self, data_missing_for_sorting): | |||
zero = data_missing_for_sorting[2] | |||
result = data_missing_for_sorting.map(lambda x: zero, na_action="ignore") | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to avoid this unrelated changes
.map
&.apply
would convert element type for extension array. #60766doc/source/whatsnew/v3.0.0.rst
file if fixing a bug or adding a new feature.The Int32Dtype type allows representing integers with support for null values (pd.NA). However, when using .map(f) or .apply(f), the elements passed to f are converted to float64, and pd.NA is transformed into np.nan.
This happens because .map() and .apply() internally use numpy, which automatically converts the data to float64, even when the original type is Int32Dtype.
The fix (just remove the method to_numpy()) ensures that when using .map() or .apply(), the elements in the series retain their original type (Int32, Float64, boolean, etc.), preventing unnecessary conversions to float64 and ensuring that pd.NA remains correctly handled.