BUG: Can only compare identically-labeled Series objects (string vs. object) #61099

Closed
@wahsmail

Description

Pandas version checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

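import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])  # object-dtype index
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
s2.index = s2.index.astype('string')  # same labels, string-dtype index

s1 < s2  # fails: ValueError: Can only compare identically-labeled Series objects

s1, s2 = s1.align(s2)
s1 < s2  # also fails

s1 = s1.reindex(s2.index)  # s1 now carries s2's index
s1 < s2  # succeeds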

Issue Description

When two Series (or DataFrames) with otherwise identical indexes are compared, but one index has dtype object and the other has dtype string, element-wise comparison fails. In the debugger, it looks like the ExtensionArray method StringArray.equals returns False when comparing against a Python list of strings, which causes Series._indexed_same to return False.
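
A minimal diagnostic sketch of the behaviour described above, using only the public Index.equals method (the private Series._indexed_same is deliberately not called). The variable names are illustrative assumptions, and the value of indexes_equal depends on the installed pandas version, so no output is claimed for it.

```python
import pandas as pd

obj_index = pd.Index(["a", "b", "c"], dtype=object)
str_index = obj_index.astype("string")

# The labels themselves are the same Python strings in both indexes.
labels_match = list(obj_index) == list(str_index)

# Per the report, this was False on the affected versions, which is what
# made the Series comparison raise. Other versions may return True.
indexes_equal = obj_index.equals(str_index)

print(obj_index.dtype, str_index.dtype, labels_match, indexes_equal)
```

If labels_match is True while indexes_equal is False, the mismatch comes from the index dtype alone rather than from the labels.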

Expected Behavior

Ideally, the string and object dtypes would be comparable. This in-between state for pandas dtypes has been quite awkward: some libraries are porting over to the numpy-nullable / pyarrow dtype backends, but the pandas defaults do not use them yet.
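
Until the two dtypes compare cleanly, one possible user-side workaround is to cast both indexes to a common dtype before comparing. The sketch below assumes object dtype as the common ground; less_than_ignoring_index_dtype is a hypothetical helper written for illustration, not a pandas function and not the fix proposed for this issue.

```python
import pandas as pd


def less_than_ignoring_index_dtype(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper: compare two Series after casting both indexes to object."""
    # Work on copies so the callers' Series keep their original index dtype.
    left = left.copy()
    right = right.copy()
    left.index = left.index.astype(object)
    right.index = right.index.astype(object)
    return left < right


s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([4, 5, 6], index=["a", "b", "c"])
s2.index = s2.index.astype("string")

result = less_than_ignoring_index_dtype(s1, s2)
print(result)
```

This follows the same idea as the reindex call in the reproducible example: once both Series carry indexes of the same dtype and labels, the comparison no longer raises.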

Installed Versions

Replace this line with the output of pd.show_versions()

Activity

Needs Triage label added on Mar 10, 2025
Strings and Needs Discussion labels added, Needs Triage label removed on Mar 11, 2025
added this to the 2.3 milestone on Mar 14, 2025

sanggon6107 commented on Mar 17, 2025

Contributor

Hi @wahsmail,
I think this should work, since the Index.equals() docstring states that the dtype is not compared:

The dtype is *not* compared

>>> int64_idx = pd.Index([1, 2, 3], dtype="int64")
>>> int64_idx
Index([1, 2, 3], dtype='int64')
>>> uint64_idx = pd.Index([1, 2, 3], dtype="uint64")
>>> uint64_idx
Index([1, 2, 3], dtype='uint64')
>>> int64_idx.equals(uint64_idx)
True

I also confirmed that the comparison doesn't raise when the Index.equals() call inside Series._indexed_same() returns True.
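
For contrast with the docstring excerpt, here is a small sketch comparing Index.equals with Index.identical, which the pandas documentation describes as also checking object attributes and types. The variable names same_labels and same_everything are illustrative assumptions.

```python
import pandas as pd

int64_idx = pd.Index([1, 2, 3], dtype="int64")
uint64_idx = pd.Index([1, 2, 3], dtype="uint64")

# equals() looks at the labels only, so the differing dtype is ignored.
same_labels = int64_idx.equals(uint64_idx)

# identical() is stricter: attributes and types must match as well,
# so the int64 / uint64 difference is enough to make it fail.
same_everything = int64_idx.identical(uint64_idx)

print(same_labels, same_everything)
```

The expectation argued in this thread is that an object-dtype index and a string-dtype index with the same labels should behave like the equals() case here.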

sanggon6107 commented on Mar 17, 2025

Contributor

take

MayurKishorKumar commented on Mar 29, 2025

take

MayurKishorKumar commented on Apr 1, 2025

Hi @rhshadrach 👋

I’m working on fixing [https://github.com//issues/61099] and ran into a failure in test_mixed_col_index_dtype.

My fix updates Index.equals so that StringDtype and object dtypes are treated as equivalent when comparing column indexes. As a result, this test now fails because result.columns.dtype becomes "string" while expected.columns.dtype remains object.

There are two options I’m considering:

  • Update the test to explicitly cast expected.columns to "string" when using_infer_string=True, so it reflects the result.
  • Adjust internal logic so the result stays object, but that might go against the spirit of treating string/object as equal.

Would updating the test be acceptable in this case?

Thanks!
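
To make the idea in this comment concrete, here is a rough sketch of what "treating StringDtype and object dtypes as equivalent" could look like from the outside. It is not the change made in the actual PR, which is not shown in this thread; labels_equal and _is_string_like are hypothetical helpers, and casting to object is just one assumed way to neutralise the dtype difference.

```python
import pandas as pd
from pandas.api.types import is_object_dtype


def _is_string_like(index: pd.Index) -> bool:
    # For this sketch, object dtype and any StringDtype count as "string-like".
    return is_object_dtype(index.dtype) or isinstance(index.dtype, pd.StringDtype)


def labels_equal(left: pd.Index, right: pd.Index) -> bool:
    """Hypothetical helper: compare labels, ignoring object-vs-string dtype."""
    if _is_string_like(left) and _is_string_like(right):
        # Casting both sides to object removes the dtype difference.
        return bool(left.astype(object).equals(right.astype(object)))
    return bool(left.equals(right))


obj_cols = pd.Index(["a", "b"], dtype=object)
str_cols = pd.Index(["a", "b"], dtype="string")
other_cols = pd.Index(["a", "z"], dtype="string")

print(labels_equal(obj_cols, str_cols), labels_equal(obj_cols, other_cols))
```

Note that this only decides whether two indexes count as equal. Which dtype the aligned result should carry, the question raised by the failing test, is a separate decision.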

rhshadrach commented on Apr 6, 2025

Member

@MayurKishorKumar - in that test I'm seeing that when using_infer_string=True, the expected is being explicitly cast to non-object.

if using_infer_string:
    # df2.columns.dtype will be "str" instead of object,
    # so the aligned result will be "string", not object
    if HAS_PYARROW:
        dtype = "string[pyarrow]"
    else:
        dtype = "string"
    expected.columns = expected.columns.astype(dtype)

So I don't see how expected.columns.dtype remains object. It might be helpful to put up your PR as a draft.
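
As a standalone illustration of the cast in that test excerpt, the sketch below builds a small frame with object-dtype column labels and casts them to "string". The frame contents are made up for illustration, and the storage backend behind "string" depends on the installed pandas version and on whether pyarrow is available, so only the dtype family is checked.

```python
import pandas as pd

expected = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.Index(["a", "b"], dtype=object),
)
labels_before = list(expected.columns)

# Cast the column labels to a string extension dtype, mirroring the
# expected.columns.astype(dtype) line in the test excerpt.
expected.columns = expected.columns.astype("string")

print(expected.columns.dtype, list(expected.columns) == labels_before)
```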

sanggon6107 commented on May 22, 2025

Contributor

Hi @MayurKishorKumar, are you still working on this? I would like to contribute if you're not.

modified the milestones: 2.3, 3.0 on Jun 2, 2025
Index label added, Needs Discussion label removed on Jun 22, 2025
Labels: Bug, Index, Strings

Participants: @jorisvandenbossche, @mroeschke, @wahsmail, @rhshadrach, @MayurKishorKumar

Repository: pandas-dev/pandas