Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/static covs #966

Merged
merged 29 commits into from Jun 5, 2022
Merged

Feat/static covs #966

merged 29 commits into from Jun 5, 2022

Conversation

dennisbader
Copy link
Collaborator

@dennisbader dennisbader commented May 20, 2022

Fixes #597.

Edit:

Summary

  • adds static covariate support for TimeSeries: static covariates are stored in the TimeSeries object itself as a pd.DataFrame
  • adds static covariates to torch datasets (additionally, unified the variable names between all datasets: training and inference datasets)
  • adds static covariate support for TFTModel
  • adds method TimeSeries.with_static_covariates() to add static covariates to any TimeSeries
  • adds method TimeSeries.from_group_dataframe(): group dataframes -> Basically multiple unit specific time series DataFrames that are vertically stacked. These can now be read and converted to multiple TimeSeries.
  • additional methods/properties: TimeSeries.static_covariates_values(), TimeSeries.static_covariates

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've only reviewed TimeSeries so far - will cover the rest later.

darts/timeseries.py Outdated Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
darts/timeseries.py Show resolved Hide resolved
darts/timeseries.py Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks really great 💯💯. A few comments:

  • Let's keep TimeSeries immutable and check static covariates format (e.g. DataFrame of right shape and maybe right dtype upon construction of TimeSeries).
  • On dimensionality of static covariates: I think what you have now is good and we can leave it like that. Maybe add some good documentation on any method to access those, so that it's clear what the dimensions are.
  • You need add support for concatenation/stacking (hopefully quite easy) and maybe grep all parts of the code that calls values() or all_values() to make sure static covariates are propagated where needed.
  • I don't think we need to scale the static covariates - at least not for now. But let's mention this in the scaler docstring.
  • I think you can already pass on the static covariates tensors to all models and leave it up to the models (currently all of them but TFT) to ignore them.
  • Categorical support: as discussed, let's be flexible on the types allowed in static covariates and not limit ourselves to numeric types. We can then throw a runtime error if a string is used in e.g. a torch model, and leave it up to the models (or datasets) to do these checks and use the covariates as intended (as discussed, one of the first thing to add later on will be support for categorical features in LGBM).

darts/timeseries.py Outdated Show resolved Hide resolved
darts/utils/data/shifted_dataset.py Outdated Show resolved Hide resolved
darts/utils/data/horizon_based_dataset.py Show resolved Hide resolved
darts/utils/data/inference_dataset.py Outdated Show resolved Hide resolved
darts/tests/test_datasets.py Show resolved Hide resolved
darts/models/forecasting/tft_submodels.py Show resolved Hide resolved
darts/models/forecasting/pl_forecasting_module.py Outdated Show resolved Hide resolved
darts/models/forecasting/tft_model.py Outdated Show resolved Hide resolved
darts/tests/models/forecasting/test_TFT.py Show resolved Hide resolved
@dennisbader dennisbader marked this pull request as ready for review May 31, 2022 19:04
@dennisbader dennisbader requested a review from hrzn May 31, 2022 19:05
Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really great work @dennisbader - beautiful :)
And it's an excellent basis to start using static covariates in more places :)
I've had a few more very minor comments, but then from my perspective we can merge as soon as you feel confident you've covered all the cases (e.g. with the _xa accesses).

darts/dataprocessing/transformers/boxcox.py Outdated Show resolved Hide resolved
darts/dataprocessing/transformers/scaler.py Outdated Show resolved Hide resolved
darts/timeseries.py Show resolved Hide resolved
darts/timeseries.py Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
DataFrame. If a Series, the index represents the static variables. The covariates are globally 'applied'
to all components of the TimeSeries. If a DataFrame, the columns represent the static variables and the
rows represent the components of the uni/multivariate TimeSeries. If a single-row DataFrame, the covariates
are globally 'applied' to all components of the TimeSeries. If a multi-row DataFrame, the number of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you write "If a single-row DataFrame, the covariates are globally applied" --> That's a good idea. But is it implemented? In the constructor, I only see a check that the length of the DataFrame matches the number of components. Maybe we should "tile" the DataFrame if it has length 1?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the constructor it checks the following:

n_components > 1 and n_components != self.n_components,

so either the df has 1 or n_components components.

The component names are then set in the following lines:

static_covariates.index = (

When the number of static cov components is equal to n_components, it simply assign the component names to the static covs.
If the number is not equal n_components, it must be a single global component and it gets the following component name:

DEFAULT_GLOBAL_STATIC_COV_NAME = "global_components"

I don't think we need to tile the DataFrame as it doesn't add any new information. WDYT?

If we have a multivariate series with the global staic covariates and we select on component from the series, the static covariate component name will change from DEFAULT_GLOBAL_STATIC_COV_NAME to the name of the univariate component:

>>> sc = pd.Series([0., 1.], index=["sc1", "sc2"])
>>> ts = TimeSeries.from_values(values=np.random.rand(5, 2), columns=["a", "b"]).with_static_covariates(sc)
>>> print(ts.static_covariates)
>>> print("")
>>> print(ts["a"].static_covariates)

static_covariates  sc1  sc2
global_components  0.0  1.0

static_covariates  sc1  sc2
component                  
a                  0.0  1.0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I see. I had in mind to tile it so that whenever we access the covariates values (e.g. from training datasets), we always get a tensor with dimension matching the number of components.

darts/timeseries.py Show resolved Hide resolved
darts/timeseries.py Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented Jun 4, 2022

Codecov Report

Merging #966 (d001e17) into master (cad8055) will increase coverage by 0.52%.
The diff coverage is 96.26%.

@@            Coverage Diff             @@
##           master     #966      +/-   ##
==========================================
+ Coverage   92.39%   92.92%   +0.52%     
==========================================
  Files          76       76              
  Lines        7552     7632      +80     
==========================================
+ Hits         6978     7092     +114     
+ Misses        574      540      -34     
Impacted Files Coverage Δ
darts/dataprocessing/transformers/boxcox.py 100.00% <ø> (ø)
darts/dataprocessing/transformers/scaler.py 97.56% <ø> (ø)
darts/models/filtering/kalman_filter.py 98.76% <ø> (ø)
darts/models/forecasting/baselines.py 88.00% <ø> (ø)
darts/models/forecasting/forecasting_model.py 96.77% <ø> (ø)
darts/models/forecasting/varima.py 97.95% <ø> (ø)
darts/utils/timeseries_generation.py 95.86% <ø> (ø)
darts/utils/data/sequential_dataset.py 87.27% <70.00%> (ø)
darts/utils/data/shifted_dataset.py 84.78% <75.00%> (+0.16%) ⬆️
darts/utils/data/inference_dataset.py 94.59% <95.83%> (+0.04%) ⬆️
... and 21 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cad8055...d001e17. Read the comment docs.

Copy link
Contributor

@hrzn hrzn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀

ts_new = abs(ts)
assert ts_new.static_covariates.equals(ts.static_covariates)
# arithmetics with series (left) and non-series (right)
self.helper_test_cov_transfer(ts, ts / 3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

@dennisbader dennisbader merged commit 6c90980 into master Jun 5, 2022
@dennisbader dennisbader deleted the feat/static_covs branch June 5, 2022 13:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for static covariates
3 participants