Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/window transformer #1269

Merged
merged 55 commits into from
Nov 26, 2022
Merged

Feat/window transformer #1269

merged 55 commits into from
Nov 26, 2022

Conversation

eliane-maalouf
Copy link
Contributor

@eliane-maalouf eliane-maalouf commented Oct 9, 2022

Fixes #1079. Draft for code review.

Summary

Implement window features generation as transformer that can be called as a standalone transformation, from pipeline or from forecasting model.

Other Information

Implementation started by @adamkells in #1203

@codecov-commenter
Copy link

codecov-commenter commented Oct 10, 2022

Codecov Report

Base: 93.94% // Head: 94.02% // Increases project coverage by +0.08% 🎉

Coverage data is based on head (faac844) compared to base (9b409a8).
Patch coverage: 97.19% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1269      +/-   ##
==========================================
+ Coverage   93.94%   94.02%   +0.08%     
==========================================
  Files          80       82       +2     
  Lines        8708     8893     +185     
==========================================
+ Hits         8181     8362     +181     
- Misses        527      531       +4     
Impacted Files Coverage Δ
darts/models/forecasting/arima.py 95.23% <ø> (ø)
darts/models/forecasting/auto_arima.py 100.00% <ø> (ø)
darts/models/forecasting/croston.py 92.50% <ø> (ø)
darts/models/forecasting/ensemble_model.py 91.66% <ø> (ø)
darts/models/forecasting/kalman_forecaster.py 100.00% <ø> (ø)
darts/models/forecasting/prophet_model.py 94.24% <ø> (ø)
darts/models/forecasting/regression_model.py 97.53% <ø> (ø)
darts/models/forecasting/sf_auto_arima.py 91.17% <ø> (ø)
darts/models/forecasting/sf_ets.py 100.00% <ø> (ø)
...arts/models/forecasting/torch_forecasting_model.py 87.68% <ø> (-0.08%) ⬇️
... and 15 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good start @eliane-maalouf. My main comments are mainly

  • I'm not sure I understand the behavior with multivariate series. I'd have expected window transformed applied on a multivariate series to return a new multivariate series with windowing applied component-wise.
  • I see you iterate over series (that's OK) and components when extracting the values. Would be nicer for performance reasons to extract values for all components at once.
  • How about moving the core of the windowing logic to a function TimeSeries.window_transform() ? It would allow users to simply get windowed series without instantiating a transformer, similar to TimeSeries.map(). Then this transformer could simply call TimeSeries.window_transform(). (see Mapper transformer). WDYT?

darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
Two optional keys can be provided for more flexibility: 'series_id' and 'comp_id'.
The 'series_id' key specifies the index of the series in the input sequence of series to which the
transformation should be applied.
The 'comp_id' key specifies the index of the component of the series to which the transformation
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we try to support preferentially the case where components are provided as string names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention that series_id and component could also be lists?

darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's some good foundations for the windowing @eliane-maalouf, thanks for that! I think the core of it, relying on pandas and separating the stochastic series over several columns is good.

There are still quite a few things to change, and behaviour that might need correction (I think I spotted a couple of bugs).

As an overall comment, I would really recommend that you try to simplify your code to the maximum possible. For instance you do many checks, many of which can be either simply skipped (delegated to another part of the code responsible for raising the exception) or drastically simplified. I've put quite a few examples of how that could look like in the comments. Each line of code is a small debt for our future selves who'll have to maintain it, so it's good to minimise it when we can. Small is often beautiful :)

Also in terms of structure, I think you could separate a bit more two steps in the TimeSeries function:

  • First do all the checks required (should hold in few lines of code only hopefully)
  • Then assume everything is checked and correct, and apply the logic.
    This way you can avoid inter-mingling some computation code with correctness- or type-checking code.

darts/timeseries.py Outdated Show resolved Hide resolved
darts/timeseries.py Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
@eliane-maalouf
Copy link
Contributor Author

@hrzn what do you think of this new version?

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great @eliane-maalouf ! I think it's nearly ready to be merged :) I've added a bunch of tiny comments, nothing important. I have also pushed 1 commit to make the transformer be listed in the documentation page.

Could you remove the draft notebook before merging? We can later transform it into a nice user-facing example notebook :) I've created a new task for that

darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/window_transformer.py Outdated Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
transformed_ts = self.series_multi_det.window_transform(
transforms=window_transformations, keep_non_transformed=True
)
self.assertEqual(len(transformed_ts.components), 6)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nitpicking) it would be a tiny bit better to test the actual component names (incl the original ones)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand, I proposed something based on what I understood. Using sets to work with the column names was messing up the order in fact, good thing you asked me to check the component names.

@eliane-maalouf
Copy link
Contributor Author

This looks really great @eliane-maalouf ! I think it's nearly ready to be merged :) I've added a bunch of tiny comments, nothing important. I have also pushed 1 commit to make the transformer be listed in the documentation page.

Could you remove the draft notebook before merging? We can later transform it into a nice user-facing example notebook :) I've created a new task for that

Thanks @hrzn for the feedback. I removed the draft notebook and included your other comments.

@hrzn hrzn marked this pull request as ready for review November 26, 2022 07:42
@eliane-maalouf eliane-maalouf merged commit 436b7cf into master Nov 26, 2022
@jope35
Copy link

jope35 commented Nov 29, 2022

nice to see that this feature is in!
very good work all

@madtoinou madtoinou deleted the feat/window_transformer branch June 5, 2023 12:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

window transformer
5 participants