Description
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'c'])
s2.index = s2.index.astype('string')
s1 < s2 # fails
s1, s2 = s1.align(s2)
s1 < s2 # also fails
s1 = s1.reindex(s2.index)
s1 < s2 # succeeds
Issue Description
When two Series (or DataFrames) with otherwise identical indexes are compared, but one index has dtype object and the other has dtype string, element-wise comparison fails. In the debugger, it looks like the ExtensionArray method StringArray.equals returns False when comparing to a Python list of strings, causing Series._indexed_same to return False.
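A minimal sketch of the mismatch described above, plus one possible workaround: casting both indexes to object before comparing. The helper name `with_object_index` is hypothetical and not part of pandas, and whether the uncast comparison raises depends on the installed pandas version, so the sketch only prints the dtypes and the `Index.equals` result rather than assuming them.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([4, 5, 6], index=["a", "b", "c"])
s2.index = s2.index.astype("string")

# The labels are identical, but the index dtypes differ.
print(s1.index.dtype, s2.index.dtype)
# The report above traces the failure to this kind of equality check.
print(s1.index.equals(s2.index))


def with_object_index(s: pd.Series) -> pd.Series:
    """Hypothetical helper: return a copy whose index is cast to object dtype."""
    out = s.copy()
    out.index = out.index.astype(object)
    return out


# With both indexes cast to object, the element-wise comparison runs.
result = with_object_index(s1) < with_object_index(s2)
print(result)
```

This leaves the original Series untouched and only normalizes the index dtype for the comparison; it works around the behavior rather than fixing it.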
Expected Behavior
Ideally the string and object dtypes would be comparable. This in-between state for pandas dtypes has been quite awkward, with some libraries porting over to numpy-nullable / pyarrow dtype backends while the pandas defaults do not use them yet.
Installed Versions
Replace this line with the output of pd.show_versions()
Activity
sanggon6107 commented on Mar 17, 2025
Hi @wahsmail,
I think this should work, since the Index.equals() doc states that dtype is not compared.
pandas/pandas/core/indexes/base.py
Lines 5453 to 5463 in 5d9cf43
I also confirmed that the comparison doesn't raise when Index.equals() inside Series._indexed_same() returns True.
sanggon6107 commented on Mar 17, 2025
take
MayurKishorKumar commented on Mar 29, 2025
take
MayurKishorKumar commented on Apr 1, 2025
Hi @rhshadrach 👋
I’m working on fixing [https://github.com//issues/61099] and ran into a failure in test_mixed_col_index_dtype.
My fix updates Index.equals so that StringDtype and object dtypes are treated as equivalent when comparing column indexes. As a result, this test now fails because result.columns.dtype becomes "string" while expected.columns.dtype remains object.
There are two options I’m considering:
Update the test to explicitly cast expected.columns to "string" when using_infer_string=True, so it reflects the result.
Adjust internal logic so the result stays object, but that might go against the spirit of treating string/object as equal.
Would updating the test be acceptable in this case?
Thanks!
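The first option above (casting the expected columns) can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual test: the names `result_columns` and `expected_columns` are hypothetical stand-ins, and the strict check is wrapped in `try`/`except` because its outcome may vary across pandas versions.

```python
import pandas as pd
from pandas.testing import assert_index_equal

# Hypothetical stand-ins for result.columns and expected.columns.
result_columns = pd.Index(["a", "b"]).astype("string")
expected_columns = pd.Index(["a", "b"], dtype=object)

# A dtype-sensitive check on the uncast indexes; record the outcome
# instead of assuming it.
try:
    assert_index_equal(result_columns, expected_columns)
    strict_check_passed = True
except AssertionError:
    strict_check_passed = False
print("uncast check passed:", strict_check_passed)

# Casting the expected columns to "string" makes the dtypes line up.
assert_index_equal(result_columns, expected_columns.astype("string"))
cast_check_passed = True
```

The cast keeps the dtype assertion in the test meaningful, at the cost of encoding the new string result dtype in the expected value.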
rhshadrach commented on Apr 6, 2025
@MayurKishorKumar - in that test I'm seeing that when using_infer_string=True, the expected is being explicitly cast to non-object.
pandas/pandas/tests/frame/test_arithmetic.py
Lines 2193 to 2200 in 5736b96
So I don't see how expected.columns.dtype remains object. It might be helpful to put up your PR as a draft.
sanggon6107 commented on May 22, 2025
Hi @MayurKishorKumar, are you still working on this? I would like to contribute if you're not.