Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/comp lags feat order #2272

Merged
merged 27 commits into from Apr 12, 2024
Merged

Fix/comp lags feat order #2272

merged 27 commits into from Apr 12, 2024

Conversation

madtoinou
Copy link
Collaborator

Summary

Order of the extracted features are not consistent when lags were defined component-wise compared to the shared lags approach.

  • The lagged features are now reordered after extraction (sorted by lags, then by components) when defined component-wise to have a consistent order
  • Lagged features names created was updated accordingly

Other Information

  • Should fix the problem with non-passing tests on some platform, due to the different ordering of the features
  • Parametrized the test for create_lagged_component_names()
  • Added missing tests to ensure lags passed as list and dict give the same result when the lags are identical across lags
  • Added a test to make sure the extracted features are indeed in the desired order when the lags are passed as dict
  • Added test to check the order of lagged feature names

Copy link

@VascoSch92 VascoSch92 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey 👋
sorry I saw just later that you were just moving the create_multivariate_linear_timeseries method in the other script.

I left some comment about this method.

Feel free to ignore if they don't have to do with the PR :-)

@codecov-commenter
Copy link

codecov-commenter commented Mar 12, 2024

Codecov Report

Attention: Patch coverage is 95.52239% with 3 lines in your changes are missing coverage. Please review.

Project coverage is 94.01%. Comparing base (e597998) to head (be8c706).

Files Patch % Lines
darts/utils/data/tabularization.py 95.31% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2272      +/-   ##
==========================================
- Coverage   94.02%   94.01%   -0.02%     
==========================================
  Files         138      138              
  Lines       14152    14177      +25     
==========================================
+ Hits        13307    13328      +21     
- Misses        845      849       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@dennisbader dennisbader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice @madtoinou, thanks for fixing this, and adding the missing tests 👍

I had some suggestions, mainly:

  • Add changelog entry
  • moving the auto regressive logic into tabularization, and handle the entire X creation in there
  • slight refactoring of the tests to avoid duplicate code

darts/utils/data/tabularization.py Outdated Show resolved Hide resolved
@@ -775,10 +784,31 @@ def create_lagged_component_names(

components = get_single_series(variate).components.tolist()
if isinstance(variate_lags, dict):
if "default_lags" in variate_lags:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have access here to all input TimeSeries, can't we just construct the full dict for the user?

Not sure if it's required, probably handled by the RegressionModel, I guess?

Copy link
Collaborator Author

@madtoinou madtoinou Apr 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the RegressionModel already handles it (_generate_lags()), I wanted to avoid code duplication (and circular imports) while preventing unexpected behavior if a user calls this method with parameters not directly extracted from a model.

darts/utils/data/tabularization.py Outdated Show resolved Hide resolved
@@ -940,18 +974,38 @@ def _create_lagged_data_by_moving_window(
# `t + lag_i` within this window is, therefore, `-1 + lag_i + min_lag_i`:
if isinstance(lags_i, list):
lags_to_extract = np.array(lags_i, dtype=int) + min_lag_i - 1
# Feats are already grouped by lags and ordered
reordered_feats = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of generating this array for every series, you can also make use of slicing.

For example in this case, the final lags_feats_order can be slice(None)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But in order to do so, would I need to make sure that none of the lags are ordered by components? Not sure to understand how I can use slice() here.

darts/models/forecasting/regression_model.py Outdated Show resolved Hide resolved
darts/models/forecasting/regression_model.py Outdated Show resolved Hide resolved
darts/models/forecasting/regression_model.py Outdated Show resolved Hide resolved
darts/models/forecasting/regression_model.py Outdated Show resolved Hide resolved
# Check that outputs match:
assert np.allclose(expected_X, X[:, :, 0])
assert np.allclose(expected_y, y[:, :, 0])
assert feats_times.equals(times[0])

# passing lags a dictionary
lags_as_dict = self.convert_lags_to_dict(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like in most of the tests, this code is duplicated from the non-dict lags case. We could add the logic as local function inside each test (including lagged feature extraction and checks), and then just that call it once with non-dict lags and once with dict lags.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, let me know what you think

Copy link
Collaborator

@dennisbader dennisbader left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great now, thanks a lot for fixing this @madtoinou 🚀

@dennisbader dennisbader merged commit 78d39ad into master Apr 12, 2024
9 checks passed
@dennisbader dennisbader deleted the fix/comp_lags_feat_order branch April 12, 2024 08:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants