Training with common covariate for multiple timeseries #444

Closed
Sigvesor opened this issue Aug 20, 2021 · 12 comments


Sigvesor commented Aug 20, 2021

Hello,

I want to train a global forecasting model for energy production from several generators.
The generators are all in roughly the same geographical area (loc1), so I will assume that they share climate feature (common_climate_covariate_loc1).
So I want to fit 25 generator TimeSeries to the model, and include a common covariate TimeSeries.

Will this be the right way to approach this problem?

model_cov.fit(series=[gen1_loc1, gen2_loc1, ... , genX_loc1],
              past_covariates=[common_climate_covariate_loc1],
              verbose=True)

where: gen1_loc1 is generator1 from location 1 and so on...

and later on, if I want to train the same model with 25 other generators that are from a different area (loc2) and have different climate data, do I just repeat the process and fit the new data to the same model?

model_cov.fit(series=[gen1_loc2, gen2_loc2, ... , genX_loc2],
              past_covariates=[common_climate_covariate_loc2],
              verbose=True)

where: gen1_loc2 is generator1 from location 2 and so on...

Will this be the correct procedure?
I hope I made the problem clear, thanks in advance for response.

@GregoryCrysler

@Sigvesor - Having a very similar problem.

First things first: (1) which type of global model are you using (Transformer?); (2) are you getting errors with the past_covariates argument in fit()? I keep getting an unexpected-argument error (I wish I had the exact wording at hand, but no dice)...

Contributor

hrzn commented Aug 20, 2021

Hello,

In principle, as far as I can tell, what you're doing seems correct, but note three things:

  • Doing what you propose will raise an error, because past_covariates has to be a sequence of the same length as series. You should rather do something like this:
target_series = [gen1_loc1, gen2_loc1, ... , genX_loc1]

# repeat covariates X times, once per target
covariates = [common_climate_covariate_loc1 for _ in range(X)]

model_cov.fit(series=target_series, past_covariates=covariates, verbose=True)
  • For extending to different locations, you have two choices: (i) train one model per location (like you are suggesting), or (ii) train one model for both locations. Which case works best will depend on your data (and to what extent some knowledge on one location can help predict the other location), so you would have to try it out. For case (ii), the code would look as follows:
# assuming you have "X1" series for loc1 and "X2" series for loc2
target_series_loc1 = [gen1_loc1, gen2_loc1, ... , genX1_loc1]
target_series_loc2 = [gen1_loc2, gen2_loc2, ... , genX2_loc2]

covariates_loc1 = [common_climate_covariate_loc1 for _ in range(X1)]
covariates_loc2 = [common_climate_covariate_loc2 for _ in range(X2)]

model_cov.fit(series=target_series_loc1 + target_series_loc2,
              past_covariates=covariates_loc1 + covariates_loc2,
              verbose=True)

Note that in the above snippet X1 does not have to be equal to X2 (i.e. you can have a different number of series per location and still train a unique joint model).
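
The pattern in both snippets above (one covariate entry per target series, concatenated across locations) can be wrapped in a small helper. This is only a sketch: `pair_targets_with_covariates` is a hypothetical name, not part of Darts, and the strings below are stand-ins for TimeSeries objects.

```python
def pair_targets_with_covariates(groups):
    """Flatten (targets, shared_covariate) groups into two aligned lists.

    Each group's shared covariate is repeated once per target, so the
    two returned lists always have the same length.
    """
    series, covariates = [], []
    for targets, shared_covariate in groups:
        series.extend(targets)
        covariates.extend([shared_covariate] * len(targets))
    return series, covariates


# Stand-ins for TimeSeries objects; group sizes may differ (X1 != X2).
loc1 = (["gen1_loc1", "gen2_loc1", "gen3_loc1"], "climate_loc1")
loc2 = (["gen1_loc2", "gen2_loc2"], "climate_loc2")

series, covariates = pair_targets_with_covariates([loc1, loc2])
assert len(series) == len(covariates) == 5
```

With real data, the two lists would then be passed as model_cov.fit(series=series, past_covariates=covariates, verbose=True).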

@GregoryCrysler not all models accept past_covariates; for instance, RNNModel accepts only future_covariates. I also recommend checking out the above article.

Hope this helps.

Author

Sigvesor commented Aug 21, 2021

Thanks so much for the reply, @hrzn.
I'd like to ask a few follow-up questions, if you don't mind.

  1. As I mentioned, I want a single model for this type of generator, but the generators can have different geo-locations, and hence different climate covariates. They are all rated at different P_max, but they usually share the same production patterns. My thought was therefore to scale every generator's output to between 0.0 and 1.0, and train the model on these scaled timeseries. Later, when forecasting for a specific generator, one can use the known P_max and simply multiply the forecast by it. Does that make sense, or am I missing something?

Simplified, but something like this:

`
P_max = 10 # example MW/h value
generator_ts = "the timeseries I want a forecast for"
scaler = Scaler() # since the model is trained on values between 0 and 1, I need to scale the generator_ts
generator_ts_scaled = scaler.fit_transform(generator_ts)
generator_forecast = model.predict(
    n=48,
    series=generator_ts_scaled,
    past_covariates=past_cov_ts,
    future_covariates=future_cov_ts,
)

generator_forecast_final = generator_forecast * P_max
`

  2. As you may have guessed by now, they are hydro generators. I am not sure I fully understand the past vs future covariate situation. In the Forecasting a River Flow example in your article, you are using the melting rate as past_covariate and rainfall as future_covariate. How does it work to use a different covariate (rainfall) to predict the future, when the historical data was fitted with a different covariate (melting rate)? There is something elementary here I don't quite grasp.

The parallel I think I can draw to hydro generators is to use water-level data as a past covariate, and rainfall as a future covariate, to predict the hydro production. Is that reasonable?
How is this approach different from forecasting the water level from rainfall and using that forecast as future_covariate?

Thank you so much in advance for taking the time. Really appreciate it.

Contributor

hrzn commented Aug 21, 2021

  1. It is indeed a good idea to scale all series, but you shouldn't mix two different scaling schemes:
  • Either you scale all your series between 0 and 1 (and then inverse-scale the forecasts using the same scaler, in order to get the final forecast),
  • Or you scale all your series by dividing by P_max (and then multiply by P_max to get the final forecasts).
    What you are proposing somehow mixes the two things (you transform by scaling in [0, 1] and then inverse-transform by multiplying by P_max, which is inconsistent).
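
To make the inconsistency concrete, here is a small worked sketch in plain Python (lists of floats stand in for TimeSeries, and the values and P_MAX are made up for illustration). Both consistent options recover the original values; the mixed variant does not.

```python
def minmax_scale(values):
    """Scale values to [0, 1]; assumes the series is not constant."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)


def minmax_inverse(scaled, params):
    """Undo minmax_scale using the parameters it returned."""
    lo, hi = params
    return [s * (hi - lo) + lo for s in scaled]


P_MAX = 10.0                    # hypothetical rated capacity
production = [2.0, 5.0, 8.0]    # hypothetical production values

# Option A: min-max scale, then invert with the same parameters.
scaled_a, params = minmax_scale(production)
restored_a = minmax_inverse(scaled_a, params)

# Option B: divide by P_max, then multiply by P_max.
scaled_b = [v / P_MAX for v in production]
restored_b = [s * P_MAX for s in scaled_b]

# Mixed (inconsistent): min-max scale, then multiply by P_max.
mixed = [s * P_MAX for s in scaled_a]

print("A:", restored_a)
print("B:", restored_b)
print("mixed:", mixed)
```

The mixed result differs from the original whenever the series minimum is not 0 or its maximum is not P_max, which is the usual case.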

"How does it work to use a different covariate (rainfall) to predict the future, when the historical data was fitted with a different covariate (melting rate)? There is something elementary here I don't quite grasp."

This does not happen. If the model is fit using melting only, then melting only has to be provided at prediction time. If it's fit with both melting and rainfall (like on the last example in the article), then both melting and rainfall have to be provided at prediction time.

"The parallel I am thinking that I can draw to hydro generators, is then to use data on water level as a past covariate, and then use rainfall as a future covariate, to predict the hydro production. Is that reasonable?"

That's very much a domain-specific question ;) I guess it's reasonable. It all depends on what data is known in the future. If it's known in the future, you can use it as future_covariates; otherwise you can't.
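
The rule about providing the same covariates at fit time and at prediction time can be written down as a toy check. This is a sketch only: `check_predict_covariates` is a hypothetical helper, not a Darts function, and covariates are represented by their names rather than by TimeSeries.

```python
def check_predict_covariates(fitted_with, provided):
    """Raise ValueError unless predict-time covariates match fit-time ones."""
    fitted, given = set(fitted_with), set(provided)
    missing = fitted - given
    unexpected = given - fitted
    if missing or unexpected:
        raise ValueError(
            f"missing={sorted(missing)}, unexpected={sorted(unexpected)}"
        )
    return True


# Fit with melting only: melting only is provided at prediction time.
check_predict_covariates(["melting"], ["melting"])

# Fit with melting and rainfall: providing rainfall alone is rejected.
try:
    check_predict_covariates(["melting", "rainfall"], ["rainfall"])
except ValueError as err:
    print("rejected:", err)
```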

@GregoryCrysler

@hrzn - to echo @Sigvesor, really grateful for your time on this issue.
I apologize if I'm missing something fundamental (please say so if I am), but I'm still struggling (Darts 0.7.0).

(1) The article (https://medium.com/unit8-machine-learning-publication/time-series-forecasting-using-past-and-future-external-data-with-darts-1f0539585993) mentions that "Past covariates models: BlockRNNModel, NBEATSModel, TCNModel, TransformerModel"
(2) I've defined a transformer model
"
In [32]: type(my_model)
Out [32]: darts.models.transformer_model.TransformerModel
"
(3) I've scaled 2 series and split them into train/validation sets, (all 4 sets defined)
"
In [35]: type(training_set_TOTAL_IMP_scl)
Out [35]: darts.timeseries.TimeSeries

In [36]: type(training_set_AUR_scl)
Out [36]: darts.timeseries.TimeSeries

In [37]: type(valid_set_TOTAL_IMP_scl)
Out [37]: darts.timeseries.TimeSeries

In [38]: type(valid_set_AUR_scl)
Out [38]: darts.timeseries.TimeSeries
"
(4) I've defined a covariate sequence for training:

In [44]: type(training_set_Avails_Cov)
Out [44]: list

(5) I'm still getting an error on the fit:

In [45]: my_model.fit(series=[training_set_TOTAL_IMP_scl, training_set_AUR_scl], val_series=[valid_set_TOTAL_IMP_scl, valid_set_AUR_scl], past_covariates= training_set_Avails_Cov, verbose=True)
Out:

TypeError Traceback (most recent call last)
in
----> 1 my_model.fit(series=[training_set_TOTAL_IMP_scl, training_set_AUR_scl], val_series=[valid_set_TOTAL_IMP_scl, valid_set_AUR_scl], past_covariates= training_set_Avails_Cov, verbose=True)

D:\Anaconda3\envs\conda_37-name\lib\site-packages\darts\utils\torch.py in decorator(self, *args, **kwargs)
63 with fork_rng():
64 manual_seed(self._random_instance.randint(0, high=MAX_TORCH_SEED_VALUE))
---> 65 decorated(self, *args, **kwargs)
66 return decorator

TypeError: fit() got an unexpected keyword argument 'past_covariates'

Contributor

hrzn commented Aug 21, 2021

@GregoryCrysler please use Darts 0.10.1, the signature has changed in 0.10.0.
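
A quick way to confirm which version is actually installed in the active environment is sketched below. It uses only the standard library (Python 3.8+). The helper names are hypothetical, and it assumes the package was installed under the distribution name "darts"; if it was installed under a different name, the lookup returns None.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_VERSION = (0, 10, 0)  # per the comment above, the signature changed in 0.10.0


def installed_version(package):
    """Return the installed version string, or None if not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(text):
    """Leading numeric components, zero-padded: '0.10.1' -> (0, 10, 1)."""
    parts = [int(p) for p in re.findall(r"\d+", text)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def has_new_covariate_api(version_text):
    """True if the version is at least MIN_VERSION."""
    return version_text is not None and version_tuple(version_text) >= MIN_VERSION


current = installed_version("darts")
print("darts version:", current, "| new signature:", has_new_covariate_api(current))
```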

Author

Sigvesor commented Aug 22, 2021

@hrzn

I am also having trouble with:
TypeError: fit() got an unexpected keyword argument 'past_covariates'

and also the predict() function, the same error.

I am using Darts 0.10.1

Contributor

hrzn commented Aug 22, 2021

@Sigvesor what model are you using? Can you paste your code and the error you're getting?

@Sigvesor
Author

@hrzn I am using the RNNModel,

ts_train is a list of TimeSeries

rnn_model.fit(ts_train, past_covariates=[wl_train for _ in range(len(ts_list))], verbose=True)
results in:

~\AppData\Roaming\Python\Python37\site-packages\darts\utils\torch.py in decorator(self, *args, **kwargs)

TypeError: fit() got an unexpected keyword argument 'past_covariates'

I can run the fit function if I don't write past_covariates, like this:
rnn_model.fit(ts_train, [wl_train for _ in range(len(ts_list))], verbose=True)

It also runs if you write:
rnn_model.fit(ts_train, covariates=[wl_train for _ in range(len(ts_list))], verbose=True)

but if you write past_covariates or future_covariates, it doesn't recognize those.

The same goes for the function model.historical_forecasts():
TypeError: historical_forecasts() got an unexpected keyword argument 'past_covariates'

~\AppData\Roaming\Python\Python37\site-packages\darts\utils\utils.py in sanitized_method(self, *args, **kwargs)
136 for sanity_check_method in sanity_check_methods:
137 # Convert all arguments into keyword arguments
--> 138 all_as_kwargs = getcallargs(method_to_sanitize, self, *args, **kwargs)
139
140 # Then separate args from kwargs according to the function's signature

~\Anaconda3\envs\darts-venv\lib\inspect.py in getcallargs(*func_and_positional, **named)
1356 if not varkw:
1357 raise TypeError("%s() got an unexpected keyword argument %r" %
-> 1358 (f_name, kw))
1359 arg2value[varkw][kw] = value
1360 continue

TypeError: historical_forecasts() got an unexpected keyword argument 'past_covariates'

Bug?

Contributor

hrzn commented Aug 22, 2021

As explained in the article, the RNNModel does not support past_covariates, only future_covariates.
You can also see this in the README in the table comparing the models.
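
As a complement to the README table, the keyword arguments a method names can be inspected at runtime with the standard library. The sketch below uses two toy classes as stand-ins (they are not Darts models), and `names_parameter` is a hypothetical helper.

```python
import inspect


def names_parameter(func, name):
    """True if `name` appears explicitly in the callable's signature."""
    return name in inspect.signature(func).parameters


class PastCovariatesToy:
    """Toy stand-in for a model whose fit() names past_covariates."""

    def fit(self, series, past_covariates=None, verbose=False):
        return self


class FutureOnlyToy:
    """Toy stand-in for a model whose fit() names only future_covariates."""

    def fit(self, series, future_covariates=None, verbose=False):
        return self


for model in (PastCovariatesToy(), FutureOnlyToy()):
    print(type(model).__name__, names_parameter(model.fit, "past_covariates"))
```

One caveat: if a library wraps fit() in a decorator that takes *args and **kwargs without preserving the original signature, this check reports False even for arguments the underlying method supports, so the README table remains the more reliable source.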


GregoryCrysler commented Aug 23, 2021

Update: @hrzn - I'd be grateful for any advice on the issue below. I've given some thought to the issue and I still can't figure out why there should be a problem.

@hrzn - thanks for the advice to update to 0.10.1. I know this has been a long thread so, again, grateful for your help. Last question and I hope I've made this easy to understand:

Given a Transformer model ("my_model"):

1. Given 3 TimeSeries, each with train and validation components (all aligned along the same time axis):
training_set_TOTAL_IMP_scl, valid_set_TOTAL_IMP_scl
training_set_AUR_scl, valid_set_AUR_scl
training_set_Avails_scl, valid_set_Avails_scl

2. The statement below runs just fine
my_model.fit(series=training_set_TOTAL_IMP_scl, val_series=valid_set_TOTAL_IMP_scl, verbose=True)

3. Again, the below runs just fine
my_model.fit(series=training_set_TOTAL_IMP_scl, verbose=True, past_covariates= training_set_Avails_scl)

4. Once again, just fine
my_model.fit(series=[training_set_TOTAL_IMP_scl, training_set_AUR_scl], verbose=True, past_covariates= [training_set_Avails_scl, training_set_Avails_scl])

5. But, extending (4) to include a validation series like below errors out with message "ValueError: The dimensions of the series in the training set and the validation set do not match."
my_model.fit(series=[training_set_TOTAL_IMP_scl, training_set_AUR_scl], verbose=True, past_covariates= [training_set_Avails_scl, training_set_Avails_scl], val_series=[valid_set_TOTAL_IMP_scl, valid_set_AUR_scl])

@gsamaras
Contributor

@GregoryCrysler you need to pass covariates for the validation set as well, as per documentation. I had the same problem: #1140.
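
A minimal sketch of what that looks like for the failing call above: the validation targets get their own covariate list, of the same length. `build_fit_kwargs` is a hypothetical helper, the strings stand in for TimeSeries objects, and `val_past_covariates` is assumed to be the parameter name used by fit() in this Darts version, so check it against the documentation for your release.

```python
def build_fit_kwargs(train_targets, train_covariates, val_targets, val_covariates):
    """Assemble fit() keyword arguments, checking that lengths line up."""
    if len(train_targets) != len(train_covariates):
        raise ValueError("need one training covariate per training series")
    if len(val_targets) != len(val_covariates):
        raise ValueError("need one validation covariate per validation series")
    return {
        "series": list(train_targets),
        "past_covariates": list(train_covariates),
        "val_series": list(val_targets),
        # Assumed parameter name; verify against your Darts version.
        "val_past_covariates": list(val_covariates),
    }


# Stand-ins for the TimeSeries objects named in the question.
kwargs = build_fit_kwargs(
    train_targets=["training_set_TOTAL_IMP_scl", "training_set_AUR_scl"],
    train_covariates=["training_set_Avails_scl"] * 2,
    val_targets=["valid_set_TOTAL_IMP_scl", "valid_set_AUR_scl"],
    val_covariates=["valid_set_Avails_scl"] * 2,
)
print(sorted(kwargs))
```

With real TimeSeries objects in place of the strings, the call would then be my_model.fit(**kwargs, verbose=True).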
