
@brunnedu
Collaborator

Checklist before merging this PR:

  • Mentioned all issues that this PR fixes or addresses.
  • Summarized the updates of this PR under Summary.
  • Added an entry under Unreleased in the Changelog.

Fixes #2986.

Summary

This PR fixes a bug in ShiftedTorchTrainingDataset (and its subclasses SequentialTorchTrainingDataset and HorizonBasedTorchTrainingDataset) where the max_samples_per_ts parameter did not act as an upper bound on the number of samples extracted per time series: a value larger than the number of extractable samples was used as-is.

Example from issue:

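from darts.utils.data import ShiftedTorchTrainingDataset
from darts.utils.timeseries_generation import linear_timeseries

series = linear_timeseries(length=1000)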
dataset = ShiftedTorchTrainingDataset(
    series,
    input_chunk_length=11,
    output_chunk_length=13,
    max_samples_per_ts=5000,
)
# Before: len(dataset) == 5000 (incorrect)
# After: len(dataset) == 987 (correct: 1000 - (13+1) + 1)
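The expected value of 987 comes from counting how many training windows fit into the series. Below is a minimal sketch of the capping described in this PR. It makes three assumptions: each sample spans max(input_chunk_length, shift + output_chunk_length) time steps with shift=1 (inferred from the 13 + 1 in the comment above), a stride only keeps every stride-th start position, and the cap is the largest per-series count over all series. Both helper names are hypothetical and are not part of the darts API.

```python
from typing import Optional, Sequence


def max_extractable_samples(
    length: int,
    input_chunk_length: int,
    output_chunk_length: int,
    shift: int = 1,
    stride: int = 1,
) -> int:
    """Hypothetical helper: number of training windows a single series yields."""
    # Each window must cover the input chunk and the shifted output chunk (assumption).
    window = max(input_chunk_length, shift + output_chunk_length)
    n_positions = length - window + 1
    if n_positions <= 0:
        return 0
    # With stride > 1, only every `stride`-th start position is kept (assumption).
    return (n_positions - 1) // stride + 1


def capped_max_samples_per_ts(
    lengths: Sequence[int],
    max_samples_per_ts: Optional[int],
    input_chunk_length: int,
    output_chunk_length: int,
    shift: int = 1,
    stride: int = 1,
) -> int:
    """Hypothetical helper: cap the user-supplied value at the largest per-series count."""
    actual_max = max(
        max_extractable_samples(n, input_chunk_length, output_chunk_length, shift, stride)
        for n in lengths
    )
    if max_samples_per_ts is None:
        return actual_max
    # The fix: never exceed what can actually be extracted.
    return min(max_samples_per_ts, actual_max)


print(capped_max_samples_per_ts([1000], 5000, input_chunk_length=11, output_chunk_length=13))
```

Taking the minimum with the largest per-series count keeps a user value that is already small enough unchanged. It only clips values that could never be satisfied.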

Changes made:

  1. Fixed the calculation logic in ShiftedTorchTrainingDataset.__init__() to cap max_samples_per_ts at the maximum number of extractable samples over all series.
  2. Added unit test (test_max_samples_per_ts_upper_bound) that verifies:
    • Behavior with max_samples_per_ts=None
    • Behavior when max_samples_per_ts > actual_max (the bug case)
    • Behavior when max_samples_per_ts < actual_max
    • Behavior with stride > 1
    • Behavior with multiple series of different lengths
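The five cases listed above can be mirrored in a self-contained check that runs against a simple model of the expected dataset length. This is only a sketch and does not reproduce the PR's actual test, which exercises the darts datasets directly. It assumes that the dataset length equals the capped per-series count times the number of series, that each window spans max(input_chunk_length, shift + output_chunk_length) steps with shift=1, and that a stride keeps every stride-th start position. `expected_dataset_len` is a hypothetical name.

```python
from typing import Optional, Sequence


def expected_dataset_len(
    lengths: Sequence[int],
    max_samples_per_ts: Optional[int],
    input_chunk_length: int = 11,
    output_chunk_length: int = 13,
    shift: int = 1,
    stride: int = 1,
) -> int:
    """Hypothetical model: every series contributes the same (capped) number of samples."""
    window = max(input_chunk_length, shift + output_chunk_length)
    # Number of start positions per series, thinned by the stride; never negative.
    per_series = [max(0, (n - window) // stride + 1) for n in lengths]
    actual_max = max(per_series)
    cap = actual_max if max_samples_per_ts is None else min(max_samples_per_ts, actual_max)
    return cap * len(lengths)


def test_max_samples_per_ts_upper_bound_sketch() -> None:
    # max_samples_per_ts=None: use every extractable sample (1000 - 14 + 1 = 987).
    assert expected_dataset_len([1000], None) == 987
    # The bug case: a value above the actual maximum is capped.
    assert expected_dataset_len([1000], 5000) == 987
    # A value below the actual maximum is kept as-is.
    assert expected_dataset_len([1000], 100) == 100
    # stride=2 keeps every second of the 987 start positions -> 494 samples.
    assert expected_dataset_len([1000], 5000, stride=2) == 494
    # Different lengths: the cap comes from the longest series and applies to both.
    assert expected_dataset_len([1000, 200], 5000) == 2 * 987


test_max_samples_per_ts_upper_bound_sketch()
print("all cases pass")
```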

Thanks to @daidahao for identifying and reporting this issue!

@brunnedu brunnedu requested a review from dennisbader as a code owner January 10, 2026 10:26
@brunnedu brunnedu changed the title from "fix max_samples_per_ts not acting as upper bound; add test; update ch…" to "Fix/max_samples_per_ts" Jan 10, 2026
@codecov

codecov bot commented Jan 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.56%. Comparing base (72edd10) to head (4306d66).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2987      +/-   ##
==========================================
- Coverage   95.63%   95.56%   -0.07%     
==========================================
  Files         153      153              
  Lines       16433    16435       +2     
==========================================
- Hits        15715    15706       -9     
- Misses        718      729      +11     


@daidahao
Contributor

@brunnedu
Thank you for the quick fix!

@daidahao
Contributor

@brunnedu Hi, do you know when this could be merged? I also have a TimesFM PR that needs a review.

@brunnedu
Collaborator Author

Hi @daidahao, as @dennisbader, our code owner, is currently away on leave, it will likely be another 2 weeks or so before we can merge these PRs. Regarding the TimesFM PR, I’d like @dennisbader to give it a final look once he’s back, as his familiarity with foundation models will be really valuable there. Thanks for your contributions!

Collaborator

@jakubchlapek jakubchlapek left a comment


small comment on the changelog, but LGTM :) thanks

Co-authored-by: Jakub Chłapek <147340544+jakubchlapek@users.noreply.github.com>
Collaborator

@dennisbader dennisbader left a comment


Beautiful PR, thanks a lot @brunnedu 🚀

Just as a note to @daidahao: for multiple series of different lengths, this will still upsample the shorter series so that each series has the same number of samples (determined by the maximum number of samples of the longest series).
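The upsampling in this note can be pictured with a small index-mapping sketch. It assumes that a shorter series wraps around its own samples (modulo) until it reaches the per-series count of the longest series. That is one plausible scheme chosen for illustration, and darts' actual indexing may differ. `sample_index_map` is a hypothetical name. The sample counts 987 and 187 correspond to series of lengths 1000 and 200 with a 14-step window.

```python
from collections import Counter
from typing import List, Tuple


def sample_index_map(samples_per_series: List[int], max_samples_per_ts: int) -> List[Tuple[int, int]]:
    """Hypothetical mapping from a flat dataset index to (series index, sample index).

    Every series is exposed `max_samples_per_ts` times; a series with fewer
    extractable samples wraps around (modulo), so some samples are repeated.
    """
    mapping = []
    for idx in range(len(samples_per_series) * max_samples_per_ts):
        series_idx = idx // max_samples_per_ts
        within = idx % max_samples_per_ts
        mapping.append((series_idx, within % samples_per_series[series_idx]))
    return mapping


# Longer series yields 987 windows, the shorter one only 187.
mapping = sample_index_map([987, 187], max_samples_per_ts=987)
short_counts = Counter(sample for series, sample in mapping if series == 1)
print(len(mapping))  # 1974
print(max(short_counts.values()), min(short_counts.values()))  # 6 5
```

Under this scheme, each of the shorter series' 187 samples appears 5 or 6 times, which means it receives roughly five times the weight of a sample from the longer series during training.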

We can complete this PR once #2995 has been merged :)

@dennisbader dennisbader merged commit 807d22c into unit8co:master Jan 23, 2026
9 checks passed

Development

Successfully merging this pull request may close these issues.

[BUG] max_samples_per_ts exceeds the number of samples

5 participants