BUG: Fix return type of loc/iloc #61054

Open
wants to merge 5 commits into
base: main

sanggon6107

Description of Linked Issue

loc/iloc return inconsistent dtypes for equivalent selections. For example:

>>> import pandas as pd
>>> df = pd.DataFrame([['a', 1., 2.], ['b', 3., 4.]])

>>> df.loc[0, [1, 2]]
1    1.0
2    2.0
Name: 0, dtype: object

>>> df[[1, 2]].loc[0]
1    1.0
2    2.0
Name: 0, dtype: float64

>>> df.loc[[0, 1], 1]
0    1.0
1    3.0
Name: 1, dtype: float64

This behaviour seems to arise from the following sequence:

  1. For axis=0, BlockManager.fast_xs() returns a cross-section of df, determining the dtype as object, since df.loc[0,:] is supposed to include 'a'.
  2. For axis=1, NDFrame._reindex_with_indexers() returns the result, not additionally inferring the dtype of the result.
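
The two steps above can be mimicked by hand with public indexing calls. This is only a sketch of the order of operations, not the internal code path through BlockManager.fast_xs() or NDFrame._reindex_with_indexers(); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([["a", 1.0, 2.0], ["b", 3.0, 4.0]])

# Step 1: taking the row first yields a cross-section that must hold 'a',
# so the resulting Series is object dtype.
row = df.loc[0]

# Step 2: selecting columns 1 and 2 from that Series keeps object dtype;
# nothing re-infers the dtype afterwards.
row_then_cols = row.loc[[1, 2]]

# Selecting the columns first leaves only float64 columns,
# so the row cross-section is float64.
cols_then_row = df[[1, 2]].loc[0]

print(row.dtype, row_then_cols.dtype, cols_then_row.dtype)
```

The values are identical in both results; only the dtype depends on which axis is indexed first.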

Proposed Solution

Based on the above examples, this issue appears only when the row indexer is an int and the column indexer is a list or slice, i.e. loc[int, list/slice].
Therefore, I'd like to propose adding the code below to re-infer the dtype after the column selection.

    @final
    def _getitem_lowerdim(self, tup: tuple):

...

                # This is an elided recursive call to iloc/loc
                out = getattr(section, self.name)[new_key]
                # Re-interpret dtype of out.values for loc/iloc[int, list/slice].
                # GH60600
                if (
                    i == 0
                    and isinstance(key, int)
                    and isinstance(new_key, (list, slice))
                ):
                    out = out.infer_objects()
                return out
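
As a rough illustration of the intended effect on the example from the description, the snippet below applies infer_objects() by hand to the result of the indexing call. It uses only public API and does not go through the patched _getitem_lowerdim.

```python
import pandas as pd

df = pd.DataFrame([["a", 1.0, 2.0], ["b", 3.0, 4.0]])

out = df.loc[0, [1, 2]]

# Applied by hand here; the proposed patch would make this call
# inside _getitem_lowerdim instead.
fixed = out.infer_objects()

print(fixed.dtype)  # float64
```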

Thanks!

@mroeschke requested a review from jbrockmendel March 5, 2025 18:18
Member

@rhshadrach left a comment

Thanks for the PR!

and isinstance(key, int)
and isinstance(new_key, (list, slice))
):
out = out.infer_objects()
Member

This will infer non-object dtype even on data that is object-dtype, no? I do not think this is the right change.

Author

Thanks @rhshadrach!
I've confirmed what you mentioned: with this change, data that is explicitly object-dtype gets inferred as int64.

>>> import pandas as pd
>>> df = pd.DataFrame({1: [11, 22], 2: [33, 44], "a": [55, 66]}, dtype=object)

>>> df.loc[0, [1, 2]]
1    11
2    33
Name: 0, dtype: int64

>>> df[[1, 2]].loc[0]
1    11
2    33
Name: 0, dtype: object

I'll do some further investigation to fix this.

@jbrockmendel
Member

I think the way to do it would be to change _getitem_lowerdim to reverse the order in which we do the indexing (I think the loop on L1069). We would need to try it to see if that breaks anything. My best guess is that MultiIndex cases would be affected (possibly their perf?).

I think a lot of _getitem_lowerdim dates back to when Panel and PanelND existed, so it may be more general than it needs to be.
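
The effect of reversing the indexing order can be sketched with public API. The helper below, loc_columns_first, is hypothetical and is not a change to _getitem_lowerdim; it only shows that indexing columns before rows gives float64 for the mixed frame from the description while leaving explicitly object-dtype data as object.

```python
import pandas as pd


def loc_columns_first(df, row, cols):
    """Hypothetical helper: index the columns first, then take the row."""
    return df.loc[:, cols].loc[row]


# Mixed frame from the PR description: one string column, two float columns.
mixed = pd.DataFrame([["a", 1.0, 2.0], ["b", 3.0, 4.0]])

# Frame from the review thread: every column is explicitly object dtype.
all_object = pd.DataFrame(
    {1: [11, 22], 2: [33, 44], "a": [55, 66]}, dtype=object
)

print(loc_columns_first(mixed, 0, [1, 2]).dtype)
print(loc_columns_first(all_object, 0, [1, 2]).dtype)
```

Because no dtype inference happens in this ordering, the result simply inherits the dtype of the selected columns. Whether the same holds inside _getitem_lowerdim for MultiIndex cases is not covered by this sketch.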


Successfully merging this pull request may close these issues.

BUG: loc[] returns object type instead of float
3 participants