I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
```python
from pandas import DatetimeIndex, RangeIndex


def test_pandas_datetime_index_union():
    """
    Demonstrates a suspected bug in pandas 2.2.3, where unions of DatetimeIndexes
    (and therefore pd.concat of DataFrames with DatetimeIndexes) return unexpected
    values. My actual use case (concatenating two DataFrames with these
    DatetimeIndexes, from which I extracted these date ranges) works in
    pandas 1.5.3, but not in 2.2.3. Interestingly, this passes in both versions
    if you change the dtype to datetime64[ns].
    """
    dti1 = DatetimeIndex(
        ['2021-10-05 17:30:00',
         '2021-10-05 18:00:00',
         '2021-10-05 18:30:00',
         '2021-10-05 19:00:00',
         '2021-10-05 19:30:00'],
        dtype='datetime64[us]', name='DATETIME', freq='30min',
    )
    dti2 = DatetimeIndex(
        ['2021-10-05 17:30:00',
         '2021-10-05 18:00:00',
         '2021-10-05 18:30:00',
         '2021-10-05 19:00:00',
         '2021-10-05 19:30:00',
         '2021-10-05 20:00:00'],  # <-- Extra datetime
        dtype='datetime64[us]', name='DATETIME', freq='30min',
    )
    union = set(dti1.union(dti2))
    expected = set(dti1) | set(dti2)
    print(f"{union=}")
    print(f"{expected=}")
    assert len(union) == len(expected), "Should have all the rows from the concatenated dataframes"


def test_range_index_equality():
    """
    This (presumably) faulty equality check appears to be the root cause of the
    DatetimeIndex union bug above. Note that the two stop values are different,
    so the RangeIndexes should not be equal. Interestingly, this fails in both
    pandas 1.5.3 and 2.2.3.
    """
    a = RangeIndex(start=1633455000000000, stop=1635262200000000, step=1800000000000)
    b = RangeIndex(start=1633455000000000, stop=1635264000000000, step=1800000000000)
    assert not a.equals(b)
```
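The first docstring notes that the union behaves correctly with datetime64[ns]. A minimal workaround sketch along those lines, assuming pandas 2.0 or later (where `DatetimeIndex.as_unit` and the `unit=` argument of `date_range` exist), is to cast both indexes to nanoseconds before the set operation and cast back afterwards. This only sidesteps the problem; it does not fix it.

```python
import pandas as pd

# Same timestamps as dti1/dti2 in the first test, built with date_range.
dti1 = pd.date_range("2021-10-05 17:30", periods=5, freq="30min", unit="us", name="DATETIME")
dti2 = pd.date_range("2021-10-05 17:30", periods=6, freq="30min", unit="us", name="DATETIME")

# Workaround: do the union at nanosecond resolution, which the docstring
# above reports as behaving correctly.
union_ns = dti1.as_unit("ns").union(dti2.as_unit("ns"))

# Cast back to the original microsecond resolution afterwards.
union = union_ns.as_unit("us")
print(union)
```

The same cast can presumably be applied to the DataFrame indexes before calling `pd.concat`.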
Issue Description
The tests above (details in the docstrings) demonstrate the issue and what I think is its root cause.
Basically, taking the union of two DatetimeIndexes with different ranges returns what appears to be an incorrect result.
I traced this as far as the RangeIndex equality check in the second test, which appears to be faulty: it returns True for two RangeIndexes with different stop values.
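Two ranges with different stop values can still hold exactly the same values. A small pure-Python illustration, assuming that `RangeIndex.equals` between two RangeIndexes compares the underlying ranges the way Python's own `range` equality does (by the values produced, not by the start/stop/step arguments):

```python
# Python compares range objects by the sequence of values they produce,
# not by the (start, stop, step) arguments used to build them.
short = range(0, 3, 2)   # produces 0, 2
longer = range(0, 4, 2)  # also produces 0, 2 (the stop value 4 is excluded)
print(short == longer)   # True

# The two RangeIndexes from the second test, as plain ranges.
step = 1800000000000
a = range(1633455000000000, 1635262200000000, step)
b = range(1633455000000000, 1635264000000000, step)
print(len(a), len(b))    # 2 2
print(a == b)            # True
```

Both ranges stop after two values because the step is larger than the gap between either stop and the second value.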
Expected Behavior
Output from the first test should be (as in pandas 1.5.3):
Update: I think pandas.core.indexes.datetimelike._as_range_index() is the real problem here: it always converts the freq to nanoseconds, but passes the first and last timestamps' _values through verbatim.
So if the timestamps are in microseconds, we end up with a range whose start and stop are in us but whose step is in ns. In my case the step is so large that both ranges contain the same two values (the start and start + step), because the next step already overshoots either stop value, so the ranges really are equal.
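The unit mismatch can be reproduced with plain integer arithmetic. This sketch converts the test timestamps to microseconds since the epoch (the integer values a datetime64[us] index stores, assuming UTC-naive timestamps), then builds the range once with a nanosecond step and once with a microsecond step. The mismatched ranges come out with exactly the start and stop values of `a` and `b` in the second test; `to_epoch_us` is a hypothetical helper for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_us(ts: datetime) -> int:
    """Integer microseconds since the epoch (the i8 value of a datetime64[us])."""
    return (ts - EPOCH) // timedelta(microseconds=1)


first = to_epoch_us(datetime(2021, 10, 5, 17, 30, tzinfo=timezone.utc))
last1 = to_epoch_us(datetime(2021, 10, 5, 19, 30, tzinfo=timezone.utc))  # end of dti1
last2 = to_epoch_us(datetime(2021, 10, 5, 20, 0, tzinfo=timezone.utc))   # end of dti2

step_ns = 30 * 60 * 10**9  # 30min tick in nanoseconds (mismatched with the values)
step_us = 30 * 60 * 10**6  # 30min tick in microseconds (matches the values)

# Mismatched units: start/stop in us, step in ns.
bad1 = range(first, last1 + step_ns, step_ns)
bad2 = range(first, last2 + step_ns, step_ns)
print(len(bad1), len(bad2), bad1 == bad2)     # 2 2 True

# Matching units: one element per 30-minute timestamp.
good1 = range(first, last1 + step_us, step_us)
good2 = range(first, last2 + step_us, step_us)
print(len(good1), len(good2), good1 == good2)  # 5 6 False
```

With matching units the two ranges differ, as they should, so a union built on them would keep the extra 20:00 timestamp.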
So I think _as_range_index() probably needs to be made aware of the time unit somehow, and not always assume ns precision. The 2-line change below worked for me, but there's probably a more robust way to do this check:
In pandas.core.indexes.datetimelike._as_range_index()
```python
def _as_range_index(self) -> RangeIndex:
    # Convert our i8 representations to RangeIndex
    # Caller is responsible for checking isinstance(self.freq, Tick)
    freq = cast(Tick, self.freq)
    # check if we have ns or us values
    # (any dtype other than datetime64[ns] is treated as us here)
    time_unit = 'ns' if self._values._ndarray.dtype.name == 'datetime64[ns]' else 'us'
    # express the tick in the same unit as the i8 start/stop values
    tick = Timedelta(freq).as_unit(time_unit)._value
    rng = range(self[0]._value, self[-1]._value + tick, tick)
    return RangeIndex(rng)
```
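A slightly more general version of the same idea, sketched in pure Python: the check above treats every dtype other than datetime64[ns] as us, whereas this reads the unit out of the dtype name (s, ms, us or ns, for datetime64 or timedelta64) and expresses the tick in that unit before building the range. The helper names `unit_from_dtype_name`, `tick_in_unit` and `as_range` are hypothetical; inside pandas 2.x the unit is presumably available more directly (DatetimeIndex has a `.unit` attribute), so this is only an illustration of the logic, not a proposed patch.

```python
import re

# Nanoseconds per unit for the datetime64/timedelta64 resolutions pandas 2.x supports.
NS_PER_UNIT = {"s": 10**9, "ms": 10**6, "us": 10**3, "ns": 1}


def unit_from_dtype_name(dtype_name: str) -> str:
    """Extract the unit from names like 'datetime64[us]' or 'timedelta64[ms]'."""
    match = re.fullmatch(r"(?:datetime|timedelta)64\[(s|ms|us|ns)\]", dtype_name)
    if match is None:
        raise ValueError(f"unsupported dtype: {dtype_name!r}")
    return match.group(1)


def tick_in_unit(tick_ns: int, unit: str) -> int:
    """Convert a fixed-size tick from nanoseconds into a whole number of `unit`."""
    factor = NS_PER_UNIT[unit]
    if tick_ns % factor:
        raise ValueError(f"tick of {tick_ns} ns is not a whole number of {unit}")
    return tick_ns // factor


def as_range(first: int, last: int, tick_ns: int, dtype_name: str) -> range:
    """Hypothetical stand-in for _as_range_index: start, stop and step share one unit."""
    tick = tick_in_unit(tick_ns, unit_from_dtype_name(dtype_name))
    return range(first, last + tick, tick)


thirty_min_ns = 30 * 60 * 10**9
rng = as_range(1633455000000000, 1633462200000000, thirty_min_ns, "datetime64[us]")
print(len(rng))  # 5
```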
rhshadrach added the setops (union, intersection, difference, symmetric_difference) and Non-Nano (datetime64/timedelta64 with non-nanosecond resolution) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Mar 22, 2025.
Actual output from the first test in pandas 2.2.3 (incorrect):
Installed Versions