Feat/lagged features names #1679

madtoinou · 2023-03-29T10:14:57Z

Summary

Added helper functions in the tabularization module and the regression model to generate labels for the lagged features (and for the static covariates, if applicable). The labels are stored in the RegressionModel.lagged_features_name_ attribute (List[List[str]]). This enables the use of the feature_importances_ attribute available for some sklearn models, which was not possible before because the create_lagged_data method returns arrays and loses the column names.

If the model was fit on a single TimeSeries, the attribute contains only one List[str]; if it was trained on a Sequence of TimeSeries, the attribute contains several List[str] (one per series).
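Because the tabularized features are plain arrays, the names have to be kept as a separate list in the same column order as the feature matrix. The stdlib-only sketch below shows how such a list (for a model fit on a single series, the one inner List[str]) could be paired with per-feature importances such as sklearn's feature_importances_. The helper rank_features and the example names and values are hypothetical; the length check mirrors the idea of verifying that the number of names matches the number of importances.

```python
from typing import List, Sequence, Tuple


def rank_features(
    names: Sequence[str], importances: Sequence[float]
) -> List[Tuple[str, float]]:
    """Pair each lagged feature name with its importance, highest first.

    Assumes `names` follows the column order of the feature matrix the
    model was fit on, so index i in both sequences refers to the same column.
    """
    if len(names) != len(importances):
        raise ValueError(f"{len(names)} names for {len(importances)} importances")
    pairs = list(zip(names, importances))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hypothetical names for one series and made-up importance values.
names = ["temp_lag-2", "temp_lag-1", "load_lag-1"]
importances = [0.1, 0.6, 0.3]
print(rank_features(names, importances))
# [('temp_lag-1', 0.6), ('load_lag-1', 0.3), ('temp_lag-2', 0.1)]
```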

The lists containing the lagged feature names are always nested; the API could eventually be simplified by:

if the component names are identical across all the TimeSeries used during training, retaining only the first list;
wrapping access to this attribute in a function that asks the user to provide a series (mapping each training TimeSeries to its lagged feature names).
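The first simplification could look like the following minimal sketch. simplify_names is a hypothetical helper, not part of this PR: it collapses the nested structure only when every series produced exactly the same names.

```python
from typing import List, Union


def simplify_names(
    nested: List[List[str]],
) -> Union[List[str], List[List[str]]]:
    """Return the first inner list if all inner lists are identical.

    Otherwise (or if there are no series at all) the nested structure
    is returned unchanged.
    """
    if nested and all(names == nested[0] for names in nested[1:]):
        return nested[0]
    return nested


print(simplify_names([["a_lag-1"], ["a_lag-1"]]))  # ['a_lag-1']
print(simplify_names([["a_lag-1"], ["b_lag-1"]]))  # [['a_lag-1'], ['b_lag-1']]
```

The trade-off is that the return type then depends on the training data, which is one reason the second option (a lookup function taking a series) may be cleaner.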

Additional information

Added the corresponding tests.

madtoinou · 2023-04-05T14:12:10Z

Updated the PR:

if the argument is a single TimeSeries, its component names are used;
if the argument is a Sequence[TimeSeries] and all the series have exactly identical component names, these names are used;
if the argument is a Sequence[TimeSeries] and any of the series contains a component with a different name, generic names are generated ('target', 'past_cov', 'future_cov').

Each "variate" (target, past_covariates and future_covariates) is processed independently, so it is possible to have a mixture of generic names and original names.
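The per-variate rule can be sketched as follows. resolve_component_names is hypothetical; it takes, for one variate, the list of component names of each series (a single series is just a list of length one). For the generic case it uses the comp{i}_{variate_type} prefix from the naming convention adopted later in this PR, which is an assumption for this sketch. It also assumes all series have the same number of components.

```python
from typing import List


def resolve_component_names(
    components_per_series: List[List[str]], variate_type: str
) -> List[str]:
    """Pick the base component names for one variate.

    - identical names across all series (including a single series):
      keep the original names;
    - any mismatch: generic names built from the variate type.
    """
    first = components_per_series[0]
    if all(names == first for names in components_per_series[1:]):
        return list(first)
    # Assumes every series has the same number of components as the first one.
    return [f"comp{i}_{variate_type}" for i in range(len(first))]


# Each variate is resolved on its own, so original and generic names can mix.
variates = {
    "target": [["sales"], ["sales"]],         # same names -> kept
    "past_cov": [["temp"], ["temperature"]],  # names differ -> generic
}
resolved = {v: resolve_component_names(c, v) for v, c in variates.items()}
print(resolved)
# {'target': ['sales'], 'past_cov': ['comp0_past_cov']}
```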

codecov-commenter · 2023-04-05T14:54:59Z

Codecov Report

Patch coverage: 97.59% and project coverage change: -0.06 ⚠️

Comparison is base (ebb9eb6) 94.11% compared to head (8d2a03e) 94.05%.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1679      +/-   ##
==========================================
- Coverage   94.11%   94.05%   -0.06%     
==========================================
  Files         125      125              
  Lines       11447    11491      +44     
==========================================
+ Hits        10773    10808      +35     
- Misses        674      683       +9

Impacted Files	Coverage Δ
darts/models/forecasting/baselines.py	`100.00% <ø> (ø)`
darts/models/forecasting/forecasting_model.py	`94.77% <87.50%> (+0.23%)`	⬆️
darts/utils/data/tabularization.py	`98.99% <95.45%> (-1.01%)`	⬇️
darts/models/forecasting/ensemble_model.py	`95.45% <100.00%> (+4.88%)`	⬆️
...ts/models/forecasting/regression_ensemble_model.py	`100.00% <100.00%> (ø)`
darts/models/forecasting/regression_model.py	`97.27% <100.00%> (+0.12%)`	⬆️
darts/models/forecasting/theta.py	`90.27% <100.00%> (+0.70%)`	⬆️
...arts/models/forecasting/torch_forecasting_model.py	`90.01% <100.00%> (-0.21%)`	⬇️

... and 8 files with indirect coverage changes

madtoinou · 2023-04-06T09:50:37Z

Updated to use the same naming conventions as the explainability module: {comp_name}_lag{lag_idx} for unique component names, comp{i}_{variate_type}_lag{lag_idx} for generic names.
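A minimal sketch of the two patterns, assuming lag_idx is the lag value as passed by the user (for example -1) and that features are ordered lag by lag, with all components of one lag before the next lag (both are assumptions, not confirmed by the PR text). lagged_names is a hypothetical helper.

```python
from typing import List, Sequence


def lagged_names(
    component_names: Sequence[str],
    lags: Sequence[int],
    variate_type: str,
    generic: bool,
) -> List[str]:
    """Build lagged feature names.

    - original names: "{comp_name}_lag{lag_idx}"
    - generic names:  "comp{i}_{variate_type}_lag{lag_idx}"
    """
    names = []
    for lag in lags:
        for i, comp in enumerate(component_names):
            base = f"comp{i}_{variate_type}" if generic else comp
            names.append(f"{base}_lag{lag}")
    return names


print(lagged_names(["temp", "load"], [-2, -1], "past_cov", generic=False))
# ['temp_lag-2', 'load_lag-2', 'temp_lag-1', 'load_lag-1']
print(lagged_names(["temp", "load"], [-1], "past_cov", generic=True))
# ['comp0_past_cov_lag-1', 'comp1_past_cov_lag-1']
```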

dennisbader

Looks good, thanks a lot @madtoinou 🚀

Ready for release! 🥳

dennisbader · 2023-04-06T10:01:41Z

darts/utils/data/tabularization.py

@@ -527,6 +534,120 @@ def create_lagged_prediction_data(
    return X, times


+def create_lagged_components_names(


could we handle the static covariates in here as well?

dennisbader · 2023-04-06T12:16:26Z

darts/models/forecasting/regression_model.py

@@ -358,6 +361,32 @@ def _create_lagged_data(

        return training_samples, training_labels

+    def _create_lagged_components_name(


I think we can remove this method and put everything into the helper function create_lagged_component_names

dennisbader · 2023-04-06T12:16:47Z

darts/models/forecasting/regression_model.py

+        )
+
+        # adding the static covariates on the right of each features_cols_name
+        features_cols_name = self._add_static_covariates_name(


we can move this into the helper function create_lagged_component_names

dennisbader · 2023-04-06T12:17:22Z

darts/models/forecasting/regression_model.py

@@ -445,6 +474,41 @@ def _add_static_covariates(
            features = features[0]
        return features

+    def _add_static_covariates_name(


should be part of create_lagged_component_names in my opinion

dennisbader · 2023-04-06T12:19:01Z

darts/models/forecasting/regression_model.py

+    ) -> Union[np.array, Sequence[np.array]]:
+        """
+        Add static covariates names to the features name for RegressionModels.
+        Accounts for series with potentially different static covariates to accomodate for the maximum


Isn't the number of static covariates guaranteed to be identical? The models should throw an error when using series with different numbers of static covariates, no?
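Following the reviewer's point, appending static covariate names would not need to pad to a maximum: it can simply reject series that disagree. The sketch below is hypothetical (add_static_covariate_names is not the PR's function) and assumes the raw static covariate column names are used as labels. It appends them once, on the right of each series' lagged feature names, since static covariates are not lagged.

```python
from typing import List, Sequence


def add_static_covariate_names(
    nested_feature_names: Sequence[Sequence[str]],
    static_names_per_series: Sequence[Sequence[str]],
) -> List[List[str]]:
    """Append static covariate names to the right of each series' feature names."""
    if len(nested_feature_names) != len(static_names_per_series):
        raise ValueError("expected one list of static covariate names per series")
    counts = {len(names) for names in static_names_per_series}
    if len(counts) > 1:
        raise ValueError(
            f"series have different numbers of static covariates: {sorted(counts)}"
        )
    # Static covariates are not lagged, so each name is appended exactly once.
    return [
        list(features) + list(static_names)
        for features, static_names in zip(nested_feature_names, static_names_per_series)
    ]


print(add_static_covariate_names([["sales_lag-1"]], [["store_id"]]))
# [['sales_lag-1', 'store_id']]
```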

dennisbader · 2023-04-06T12:23:42Z

darts/utils/data/tabularization.py

@@ -527,6 +534,120 @@ def create_lagged_prediction_data(
    return X, times


+def create_lagged_components_names(


Suggested change

- def create_lagged_components_names(
+ def create_lagged_component_names(

dennisbader · 2023-04-06T12:25:34Z

darts/models/forecasting/regression_model.py

+            past_covariates=past_covariates,
+            future_covariates=future_covariates,
+        )
+        self.model.lagged_features_name_ = lagged_features_names


Let's use the naming convention lagged_feature_names_, similar to feature_importances_ in sklearn.

Also, shouldn't we store this in the Darts model rather than in the wrapped (sklearn) model? This would also require defining it in the model constructor.

Suggested change

-        self.model.lagged_features_name_ = lagged_features_names
+        self.lagged_feature_names_ = lagged_feature_names
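A minimal sketch of the suggested design: the attribute is declared in the constructor, filled in during fitting, and kept on the Darts-side wrapper rather than on the wrapped estimator, with a trailing underscore as for sklearn's fitted attributes. SketchRegressionModel and DummyEstimator are hypothetical stand-ins, not Darts or sklearn classes.

```python
from typing import List, Optional, Sequence


class DummyEstimator:
    """Stand-in for a wrapped sklearn-style regressor."""

    def fit(self, X, y) -> "DummyEstimator":
        self.n_features_ = len(X[0])
        return self


class SketchRegressionModel:
    """Hypothetical wrapper showing where the lagged feature names live."""

    def __init__(self, model) -> None:
        self.model = model  # the wrapped estimator
        # Declared up front so the attribute always exists, even before fit.
        self.lagged_feature_names_: Optional[List[str]] = None

    def fit(self, X, y, feature_names: Sequence[str]) -> "SketchRegressionModel":
        self.model.fit(X, y)
        # Stored on the wrapper, not on self.model.
        self.lagged_feature_names_ = list(feature_names)
        return self


m = SketchRegressionModel(DummyEstimator()).fit(
    [[1.0, 2.0]], [3.0], ["a_lag-1", "b_lag-1"]
)
print(m.lagged_feature_names_)  # ['a_lag-1', 'b_lag-1']
print(hasattr(m.model, "lagged_feature_names_"))  # False
```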

dennisbader · 2023-04-06T12:29:55Z

darts/utils/data/tabularization.py

+    target_series = (
+        [target_series] if not isinstance(target_series, Sequence) else target_series
+    )
+    past_covariates = (
+        [past_covariates]
+        if not isinstance(past_covariates, Sequence)
+        else past_covariates
+    )
+    future_covariates = (
+        [future_covariates]
+        if not isinstance(future_covariates, Sequence)
+        else future_covariates
+    )


Suggested change

-    target_series = (
-        [target_series] if not isinstance(target_series, Sequence) else target_series
-    )
-    past_covariates = (
-        [past_covariates]
-        if not isinstance(past_covariates, Sequence)
-        else past_covariates
-    )
-    future_covariates = (
-        [future_covariates]
-        if not isinstance(future_covariates, Sequence)
-        else future_covariates
-    )
+    target_series = series2seq(target_series)
+    past_covariates = series2seq(past_covariates)
+    future_covariates = series2seq(future_covariates)
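One difference worth noting: the removed conditional expressions turn None into [None], since None is not a Sequence. The sketch below is a hypothetical stand-in for the normalization series2seq performs; passing None through unchanged is an assumption about the Darts helper, not a verified behaviour.

```python
from collections.abc import Sequence
from typing import Any, Optional


def to_seq(obj: Any) -> Optional[Sequence]:
    """Hypothetical stand-in for darts' series2seq.

    - None is passed through unchanged (assumed behaviour);
    - an existing sequence is returned as-is;
    - any single object (e.g. one TimeSeries) is wrapped in a list.
    """
    if obj is None:
        return None
    if isinstance(obj, Sequence):
        return obj
    return [obj]


print(to_seq(None))    # None
print(to_seq(5))       # [5]
print(to_seq([1, 2]))  # [1, 2]
```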

dennisbader · 2023-04-06T12:33:44Z

darts/utils/data/tabularization.py

+        [lags, lags_past_covariates, lags_future_covariates],
+        ["target", "past_cov", "future_cov"],
+    ):
+        unique_components_names = set(


We can skip this iteration directly if the variate is None (this also simplifies the code later on).

Suggested change

-        unique_components_names = set(
+        if variate is None:
+            continue
+        unique_components_names = set(
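The early continue can be sketched as a loop over the three variates in which an unused variate contributes no features. build_all_names is hypothetical; it uses the {comp_name}_lag{lag} pattern, and its lag-by-lag ordering is an assumption.

```python
from typing import Dict, List, Optional, Sequence


def build_all_names(
    components: Dict[str, Optional[Sequence[str]]],
    lags: Dict[str, Optional[Sequence[int]]],
) -> List[str]:
    """Collect lagged feature names for target, past and future covariates."""
    names: List[str] = []
    for variate in ("target", "past_cov", "future_cov"):
        comps = components.get(variate)
        variate_lags = lags.get(variate)
        # Skip unused variates here instead of special-casing None further down.
        if comps is None or variate_lags is None:
            continue
        for lag in variate_lags:
            names.extend(f"{comp}_lag{lag}" for comp in comps)
    return names


print(build_all_names(
    {"target": ["sales"], "past_cov": None, "future_cov": ["holiday"]},
    {"target": [-2, -1], "past_cov": None, "future_cov": [0]},
))
# ['sales_lag-2', 'sales_lag-1', 'holiday_lag0']
```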

* feat: create and store the lagged features names in the regression models
* feat: adding corresponding tests in tabularization
* fix: support any kind of Sequence to generate the lagged features name
* feat: verify that the number of lagged feature names matches the feature_importances in the relevant regression models
* fix: if any of the variate is a sequence of ts with different components names, create generic name for the corresponding variate, updated the tests
* fix: using the same naming convention for the lagged components as the explainability module
* refactor and fix some type hint warnings
* simplified lagged feature name generation and moved out of regression model
* fix regr model tests
* fix create lagged data tests
* fix small bug in unit test
* fix bug in unittest from last PR

---------

Co-authored-by: dennisbader <dennis.bader@gmx.ch>

madtoinou added 4 commits March 28, 2023 20:56

feat: create and store the lagged features names in the regression models

e7efa59

feat: adding corresponding tests in tabularization

cc77936

fix: support any kind of Sequence to generate the lagged features name

dbc942b

feat: verify that the number of lagged feature names matches the feature_importances in the relevant regression models

10060b7

madtoinou requested a review from dennisbader as a code owner March 29, 2023 10:14

madtoinou and others added 2 commits April 5, 2023 16:02

fix: if any of the variate is a sequence of ts with different components names, create generic name for the corresponding variate, updated the tests

69a3819

Merge branch 'master' into feat/lagged_features_names

b798c50

fix: using the same naming convention for the lagged components as the explainability module

f48ea8a

dennisbader and others added 10 commits April 7, 2023 12:03

refactor and fix some type hint warnings

9fa93cb

Merge branch 'master' into feat/lagged_features_names

89dbb47

Merge branch 'master' into feat/lagged_features_names

47e4214

Merge branch 'master' into feat/lagged_features_names

b496881

Merge branch 'master' into feat/lagged_features_names

8d2a03e

Merge branch 'feat/lagged_features_names' of https://github.com/unit8co/darts into feat/lagged_features_names

1b624f5

Merge branch 'master' into feat/lagged_features_names

76ee1c8

simplified lagged feature name generation and moved out of regression model

38c10a0

fix regr model tests

1325983

fix create lagged data tests

dd3798b

dennisbader approved these changes Apr 11, 2023

dennisbader added 2 commits April 11, 2023 22:48

fix small bug in unit test

557dbfe

fix bug in unittest from last PR

05428e5

dennisbader merged commit 65861d1 into master Apr 11, 2023

dennisbader deleted the feat/lagged_features_names branch April 11, 2023 22:15

madtoinou mentioned this pull request Apr 12, 2023

Extracting corresponding coefficients for regression_model, where lags are used on covariates. #458

Closed

madtoinou mentioned this pull request Jun 14, 2023

Pass column names to LGBMModel to enable feature importance analysis #1309

Closed
