feat(timeseries): add idxmin / idxmax to TimeSeries#3115
Conversation
`TimeSeries.min(axis=0)` and `.max(axis=0)` return a single-row series whose timestamp is just the *first* entry of the original time index, not the timestamp of the actual min / max value. The original issue documents how this is misleading; the maintainer has agreed (unit8co#2696) that adding `idxmin` / `idxmax` is the cleanest fix. This commit: - Adds `TimeSeries.idxmin()` and `TimeSeries.idxmax()` returning a `pandas.Series` indexed by component name with values being the time-index entries (Timestamp or int) at which each component attains its min / max — mirroring `pandas.DataFrame.idxmin/idxmax` semantics. This sidesteps the multivariate dilemma flagged in the issue (each component can have a different argmin/argmax) cleanly and is purely additive (no breaking change). - For stochastic series we reduce samples with the median first so the returned index does not depend on `n_samples`. - Adds a one-line note to `min()` / `max()` docstrings pointing readers to the new helpers. - Adds 4 regression tests covering the univariate, multivariate, RangeIndex, and stochastic cases. The univariate test fails on master (`AttributeError: 'TimeSeries' object has no attribute 'idxmin'`) and passes on this branch. - CHANGELOG entry under Unreleased > Improved. Refs: unit8co#2696 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
jakubchlapek
left a comment
There was a problem hiding this comment.
Hey @jbbqqf, thanks for this PR.
The code looks fine, just had a few documentation changes I left in the comments. I also debated adding the axis parameter to the idxmin/max methods, but can't see a real usecase in which we wouldn't use multi series instead. You can let me know what you think.
| .. note:: | ||
| With ``axis=0`` the returned timestamp is the first entry of the | ||
| original ``time_index`` and **does not** correspond to the | ||
| timestamp of the actual minimum value. Use :func:`idxmin` to get | ||
| the timestamp at which each component attains its minimum. |
There was a problem hiding this comment.
i'd say this is redudnant with the rest of the docstring as the behavior is already mentioned above (`If we reduce over time (axis=0), the series will have length one and will use the first entry of the
original ``time_index```)
| .. note:: | ||
| With ``axis=0`` the returned timestamp is the first entry of the | ||
| original ``time_index`` and **does not** correspond to the | ||
| timestamp of the actual maximum value. Use :func:`idxmax` to get | ||
| the timestamp at which each component attains its maximum. | ||
|
|
There was a problem hiding this comment.
same case as above for min
|
|
||
| Useful as a companion to :func:`min` because ``min(axis=0)`` returns a | ||
| single-row series whose timestamp is the *first* time index entry of | ||
| the original series, not the entry of the actual minimum (see | ||
| `issue #2696 <https://github.com/unit8co/darts/issues/2696>`_). | ||
|
|
There was a problem hiding this comment.
i'd say this is unnecesssary
|
|
||
| Useful as a companion to :func:`max` because ``max(axis=0)`` returns a | ||
| single-row series whose timestamp is the *first* time index entry of | ||
| the original series, not the entry of the actual maximum (see | ||
| `issue #2696 <https://github.com/unit8co/darts/issues/2696>`_). | ||
|
|
| # Reduce samples first so the result is independent of stochasticity; | ||
| # using the median (rather than mean) keeps the returned index value | ||
| # an actual observed value when n_samples == 1. |
There was a problem hiding this comment.
unnecessary, already mentioned in the docstring
|
|
||
| **Improved** | ||
|
|
||
| - Added `TimeSeries.idxmin()` and `TimeSeries.idxmax()`, returning a `pandas.Series` (indexed by component) of the time index value at which each component attains its minimum / maximum. Also clarified the docstrings of `min()` / `max()` to point users to these new helpers when they want the actual argmin/argmax timestamp. Closes [#2696](https://github.com/unit8co/darts/issues/2696). |
There was a problem hiding this comment.
| - Added `TimeSeries.idxmin()` and `TimeSeries.idxmax()`, returning a `pandas.Series` (indexed by component) of the time index value at which each component attains its minimum / maximum. Also clarified the docstrings of `min()` / `max()` to point users to these new helpers when they want the actual argmin/argmax timestamp. Closes [#2696](https://github.com/unit8co/darts/issues/2696). | |
| - Added `TimeSeries.idxmin()` and `TimeSeries.idxmax()`, returning a `pandas.Series` (indexed by component) of the time index value at which each component attains its minimum / maximum. Closes [#3115](https://github.com/unit8co/darts/pull/3115). |
| def test_idxmin_idxmax_univariate_datetime(self): | ||
| # univariate, deterministic, datetime index — covers issue #2696 | ||
| # where TimeSeries.min(axis=0) returns the first timestamp instead of | ||
| # the timestamp of the actual minimum. | ||
| idx = pd.date_range("2020-01-01", periods=5, freq="D") | ||
| values = np.array([3.0, 1.0, 4.0, 0.0, 2.0]) | ||
| ts = TimeSeries(times=idx, values=values, components=["a"]) | ||
|
|
||
| # idxmin / idxmax return a pd.Series indexed by component name. | ||
| idxmin = ts.idxmin() | ||
| idxmax = ts.idxmax() | ||
| assert list(idxmin.index) == ["a"] | ||
| assert list(idxmax.index) == ["a"] | ||
| # The actual minimum is at position 3 (2020-01-04) and the maximum at | ||
| # position 2 (2020-01-03). This is the load-bearing assertion: it | ||
| # fails on master where users had to fall back to pd_dataframe() | ||
| # because TimeSeries provided no idx{min,max}. | ||
| assert idxmin["a"] == pd.Timestamp("2020-01-04") | ||
| assert idxmax["a"] == pd.Timestamp("2020-01-03") | ||
|
|
||
| def test_idxmin_idxmax_multivariate(self): | ||
| # different argmin/argmax per component | ||
| idx = pd.date_range("2020-01-01", periods=3, freq="D") | ||
| values = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]) | ||
| ts = TimeSeries(times=idx, values=values, components=["a", "b"]) | ||
|
|
||
| idxmin = ts.idxmin() | ||
| idxmax = ts.idxmax() | ||
| # argmin returns first occurrence of the minimum, mirroring numpy. | ||
| assert idxmin["a"] == pd.Timestamp("2020-01-02") | ||
| assert idxmin["b"] == pd.Timestamp("2020-01-01") | ||
| assert idxmax["a"] == pd.Timestamp("2020-01-01") | ||
| assert idxmax["b"] == pd.Timestamp("2020-01-03") | ||
|
|
||
| def test_idxmin_idxmax_range_index(self): | ||
| # RangeIndex-based series should return integer indices. | ||
| values = np.array([5.0, 3.0, 8.0, 1.0]) | ||
| ts = TimeSeries( | ||
| times=pd.RangeIndex(start=10, stop=14, step=1), | ||
| values=values, | ||
| components=["x"], | ||
| ) | ||
| assert ts.idxmin()["x"] == 13 | ||
| assert ts.idxmax()["x"] == 12 | ||
|
|
||
| def test_idxmin_idxmax_stochastic(self): | ||
| # For a stochastic series we reduce samples with the median first so | ||
| # the answer does not depend on how many samples were drawn. | ||
| rng = np.random.default_rng(0) | ||
| idx = pd.date_range("2020-01-01", periods=4, freq="D") | ||
| # Component "a" has its median minimum at t=2. | ||
| median_target = np.array([5.0, 3.0, 1.0, 2.0]) | ||
| values = np.stack( | ||
| [median_target + rng.normal(0, 0.01, size=4) for _ in range(50)], | ||
| axis=-1, | ||
| )[:, None, :] | ||
| ts = TimeSeries(times=idx, values=values, components=["a"]) | ||
| assert ts.idxmin()["a"] == pd.Timestamp("2020-01-03") | ||
|
|
There was a problem hiding this comment.
Simplified the tests a bit :)
| def test_idxmin_idxmax_univariate_datetime(self): | |
| # univariate, deterministic, datetime index — covers issue #2696 | |
| # where TimeSeries.min(axis=0) returns the first timestamp instead of | |
| # the timestamp of the actual minimum. | |
| idx = pd.date_range("2020-01-01", periods=5, freq="D") | |
| values = np.array([3.0, 1.0, 4.0, 0.0, 2.0]) | |
| ts = TimeSeries(times=idx, values=values, components=["a"]) | |
| # idxmin / idxmax return a pd.Series indexed by component name. | |
| idxmin = ts.idxmin() | |
| idxmax = ts.idxmax() | |
| assert list(idxmin.index) == ["a"] | |
| assert list(idxmax.index) == ["a"] | |
| # The actual minimum is at position 3 (2020-01-04) and the maximum at | |
| # position 2 (2020-01-03). This is the load-bearing assertion: it | |
| # fails on master where users had to fall back to pd_dataframe() | |
| # because TimeSeries provided no idx{min,max}. | |
| assert idxmin["a"] == pd.Timestamp("2020-01-04") | |
| assert idxmax["a"] == pd.Timestamp("2020-01-03") | |
| def test_idxmin_idxmax_multivariate(self): | |
| # different argmin/argmax per component | |
| idx = pd.date_range("2020-01-01", periods=3, freq="D") | |
| values = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]) | |
| ts = TimeSeries(times=idx, values=values, components=["a", "b"]) | |
| idxmin = ts.idxmin() | |
| idxmax = ts.idxmax() | |
| # argmin returns first occurrence of the minimum, mirroring numpy. | |
| assert idxmin["a"] == pd.Timestamp("2020-01-02") | |
| assert idxmin["b"] == pd.Timestamp("2020-01-01") | |
| assert idxmax["a"] == pd.Timestamp("2020-01-01") | |
| assert idxmax["b"] == pd.Timestamp("2020-01-03") | |
| def test_idxmin_idxmax_range_index(self): | |
| # RangeIndex-based series should return integer indices. | |
| values = np.array([5.0, 3.0, 8.0, 1.0]) | |
| ts = TimeSeries( | |
| times=pd.RangeIndex(start=10, stop=14, step=1), | |
| values=values, | |
| components=["x"], | |
| ) | |
| assert ts.idxmin()["x"] == 13 | |
| assert ts.idxmax()["x"] == 12 | |
| def test_idxmin_idxmax_stochastic(self): | |
| # For a stochastic series we reduce samples with the median first so | |
| # the answer does not depend on how many samples were drawn. | |
| rng = np.random.default_rng(0) | |
| idx = pd.date_range("2020-01-01", periods=4, freq="D") | |
| # Component "a" has its median minimum at t=2. | |
| median_target = np.array([5.0, 3.0, 1.0, 2.0]) | |
| values = np.stack( | |
| [median_target + rng.normal(0, 0.01, size=4) for _ in range(50)], | |
| axis=-1, | |
| )[:, None, :] | |
| ts = TimeSeries(times=idx, values=values, components=["a"]) | |
| assert ts.idxmin()["a"] == pd.Timestamp("2020-01-03") | |
| def test_idxmin_idxmax_univariate_datetime(self): | |
| idx = pd.date_range("2020-01-01", periods=5, freq="D") | |
| values = np.array([3.0, 1.0, 4.0, 0.0, 2.0]) | |
| ts = TimeSeries(times=idx, values=values, components=["a"]) | |
| idxmin = ts.idxmin() | |
| idxmax = ts.idxmax() | |
| assert list(idxmin.index) == ["a"] | |
| assert list(idxmax.index) == ["a"] | |
| # minimum at position 3 (2020-01-04), maximum at position 2 (2020-01-03) | |
| assert idxmin["a"] == pd.Timestamp("2020-01-04") | |
| assert idxmax["a"] == pd.Timestamp("2020-01-03") | |
| def test_idxmin_idxmax_multivariate(self): | |
| idx = pd.date_range("2020-01-01", periods=3, freq="D") | |
| values = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]) | |
| ts = TimeSeries(times=idx, values=values, components=["a", "b"]) | |
| idxmin = ts.idxmin() | |
| idxmax = ts.idxmax() | |
| # first occurrence of the minimum, mirroring numpy | |
| assert idxmin["a"] == pd.Timestamp("2020-01-02") | |
| assert idxmin["b"] == pd.Timestamp("2020-01-01") | |
| assert idxmax["a"] == pd.Timestamp("2020-01-01") | |
| assert idxmax["b"] == pd.Timestamp("2020-01-03") | |
| def test_idxmin_idxmax_range_index(self): | |
| values = np.array([5.0, 3.0, 8.0, 1.0]) | |
| ts = TimeSeries( | |
| times=pd.RangeIndex(start=10, stop=14, step=1), | |
| values=values, | |
| components=["x"], | |
| ) | |
| assert ts.idxmin()["x"] == 13 | |
| assert ts.idxmax()["x"] == 12 | |
| def test_idxmin_idxmax_stochastic(self): | |
| idx = pd.date_range("2020-01-01", periods=3, freq="D") | |
| # sample 0 min at t=1, sample 1 min at t=2 — medians [3.0, 2.0, 1.5], min at t=2 | |
| values = np.array([[[2.0, 4.0]], [[1.0, 3.0]], [[3.0, 0.0]]]) # (3, 1, 2) | |
| ts = TimeSeries(times=idx, values=values, components=["a"]) | |
| assert ts.idxmin()["a"] == pd.Timestamp("2020-01-03") | |
Fixes #2696.
Summary
TimeSeries.min(axis=0)and.max(axis=0)return a single-row series whose timestamp is just the first entry of the original time index, not the timestamp of the actual minimum / maximum. As @eschibli pointed out in #2696, this is misleading — users may not read themin/maxdocstring before trusting the index. @madtoinou agreed in the same thread that the corresponding time-step is well-defined per component, even in the multivariate case.This PR adds
TimeSeries.idxmin()andTimeSeries.idxmax(), which return apandas.Seriesindexed by component name with values being the time-index entry (Timestamp orintforRangeIndex-based series) at which each component attains its minimum / maximum — mirroringpandas.DataFrame.idxmin/idxmax. The change is purely additive (no break in existing behavior).Other Information
Why a
pd.Seriesand not aTimeSeries?Each component can have a different argmin / argmax (this is exactly the multivariate dilemma raised in the issue thread). A single-row
TimeSeriescannot represent that — it would have to pick one component's argmax index for the whole row. Returning apd.Series(which is what users were already falling back to viaseries.pd_dataframe().idxmax()) sidesteps the problem cleanly and matches pandas semantics.Stochastic series. For probabilistic series we reduce samples with the median first, so the returned index does not depend on
n_samples.min/maxdocstrings. Added a one-line note pointing readers to the new helpers when they want the actual argmin/argmax timestamp; the existing behaviour (and signature) is unchanged.CHANGELOG. Updated under
Unreleased>For users of the library>Improved.Reproduce BEFORE/AFTER yourself (copy-paste)
The two commands are identical except for the checked-out ref.
What I ran locally
uv run pytest darts/tests/test_timeseries.py::TestSimpleStatistics -v→ 14 / 14 passed (10 pre-existing + 4 new).uv run pytest darts/tests/test_timeseries.py -k 'min or max'→ all green.uv run ruff check darts/timeseries.py darts/tests/test_timeseries.py→ all checks passed.test_idxmin_idxmax_univariate_datetimeagainstorigin/master(with the test cherry-picked but the implementation reverted) — fails withAttributeError: 'TimeSeries' object has no attribute 'idxmin'. The new tests therefore both fail onmasterand pass on this branch.Edge cases tested
argmin=3,argmax=2idxmin → 2020-01-04,idxmax → 2020-01-03test_idxmin_idxmax_univariate_datetimepd.Seriestest_idxmin_idxmax_multivariateRangeIndextime indexint, notTimestamptest_idxmin_idxmax_range_index2020-01-03regardless of sample noisetest_idxmin_idxmax_stochasticRisk / blast radius
Purely additive — two new methods on
TimeSeries, no change tomin/maxsemantics, no signature changes elsewhere. The twomin/maxdocstrings gain a.. note::block but their behavior and return shape are identical to before.PR drafted with assistance from Claude Code (Anthropic). Implementation, regression tests, and the BEFORE/AFTER reproducer above were reviewed manually against
unit8co/darts@masterand the discussion in #2696. The reproducer block was used during development; reviewers can paste it verbatim to verify.