
For custom dataset #27

Closed
HeroadZ opened this issue Feb 19, 2021 · 5 comments

Comments

@HeroadZ

HeroadZ commented Feb 19, 2021

Congratulations on the best paper award!
I'm new to transformers for time-series data and just found this cool paper!
It would be great if you could provide an example showing how to use a custom dataset.
Thanks in advance!

@zhouhaoyi
Owner

Thanks for your interest. We will release a Colab notebook to help people use our model on custom data. Please keep watching this repo.

@wywzxxz

wywzxxz commented Feb 20, 2021

Actually, there are only 4 tensors whose meaning you need to figure out, and according to data/data_loader.py I believe their shapes are as follows (a small shape-check sketch follows the list):

model.forward(x_enc, x_mark_enc, x_dec, x_mark_dec)

  1. x_enc: batch_size × input_seq_len × channel
  2. x_dec: batch_size × output_seq_len × channel
  3. x_mark_enc: batch_size × input_seq_len × 4; it represents the timestamp of each input_seq element, formatted as a 4-tuple: (month, day, weekday, hour)
  4. x_mark_dec: batch_size × output_seq_len × 4
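
A minimal shape-check sketch under those assumptions (batch size, lengths, and channel count are placeholders, and the forward call is left as a comment because it needs an Informer instance built from models/model.py):

    import torch

    batch_size, channel = 32, 7
    input_seq_len = 96                 # seq_len
    output_seq_len = 48 + 24           # label_len + pred_len

    # encoder input: the observed history (placeholder values)
    x_enc = torch.randn(batch_size, input_seq_len, channel)
    # decoder input: label_len known steps followed by pred_len zeros (see below)
    x_dec = torch.randn(batch_size, output_seq_len, channel)
    # time-stamp features per position, e.g. (month, day, weekday, hour);
    # random values here stand in for the real encoded time features
    x_mark_enc = torch.randn(batch_size, input_seq_len, 4)
    x_mark_dec = torch.randn(batch_size, output_seq_len, 4)

    # with a constructed model, the call would be roughly:
    # out = model(x_enc, x_mark_enc, x_dec, x_mark_dec)  # expected out: batch_size × pred_len × c_out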

The model arguments are more puzzling, though I got some hints from main_informer.py.

Informer(
    enc_in,     ## encoder input size
    dec_in,     ## decoder input size
    c_out,      ## output size
    seq_len,    ## input series length
    label_len,  ## help series length
    pred_len    ## predict series length
)

I'm not sure what label_len means. As far as I can tell, output_seq_len = label_len + pred_len, and in the training process x_dec[:, -pred_len:, :] is zeroed before calling model.forward():

    # zeros as placeholders for the pred_len steps to be predicted
    dec_inp = torch.zeros_like(batch_y[:, -self.args.pred_len:, :]).double()
    # prepend the known label_len part of batch_y as the decoder "start token" series
    dec_inp = torch.cat([batch_y[:, :self.args.label_len, :], dec_inp], dim=1).double().to(self.device)

However, I think this means dec_in=c_out, so I'm confused.

Anyway, since label_len=0 also works, it doesn't matter too much.

@cookieminions
Collaborator

Yes, label_len is the length of the start token series before the predicted series, so output_seq_len = label_len + pred_len.
The input sequence of the Informer decoder consists of a history series (label_len) before the prediction and a series (pred_len) filled with zeros. You can refer to Figure 1.
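
A self-contained sketch of that decoder-input construction (the lengths and batch size below are just example values):

    import torch

    label_len, pred_len, channels = 48, 24, 7                  # example lengths
    batch_y = torch.randn(4, label_len + pred_len, channels)   # ground-truth window from the loader

    # zeros stand in for the pred_len future steps the model has to fill in
    dec_zeros = torch.zeros_like(batch_y[:, -pred_len:, :])
    # the known label_len history acts as the decoder start token series
    dec_inp = torch.cat([batch_y[:, :label_len, :], dec_zeros], dim=1)

    print(dec_inp.shape)                          # torch.Size([4, 72, 7])
    print(dec_inp[:, -pred_len:, :].abs().sum())  # tensor(0.) -> the future part is all zeros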

@HeroadZ
Author

HeroadZ commented Feb 20, 2021

Thank you for the explanation. I'm not sure whether I understand it correctly, so please check it.
Let's take the default settings as an example. The enc_input is 4 days (1, 2, 3, 4) of 7-channel features, the dec_input is 2 days (3, 4) of 7-channel features followed by 1 day (5) of 7-channel zeros, and the output is 1 day (5) of 7-channel predictions. Is that right?
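
In other words (just a toy index sketch, not code from the repo):

    import numpy as np

    days = np.arange(1, 6)                  # toy daily index: [1 2 3 4 5]
    seq_len, label_len, pred_len = 4, 2, 1  # 4-day input, 2-day start token, 1-day prediction

    enc_window = days[:seq_len]                     # days [1 2 3 4] -> encoder input
    dec_known  = days[seq_len - label_len:seq_len]  # days [3 4]     -> decoder start tokens
    dec_future = days[seq_len:seq_len + pred_len]   # day  [5]       -> zero-filled, to be predicted

    print(enc_window, dec_known, dec_future)        # [1 2 3 4] [3 4] [5]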

I have several questions. Sorry if they are stupid.

  1. The target is OT, but the output is a 7-channel prediction when features is set to "M". Then, in the loss calculation step, you compare the predictions for all channels with the ground truth, not only OT. Is that correct?
  2. For the input x_mark, if I want to use my custom data, must it use the same format, i.e. (month, day, weekday, hour)?
  3. Do you have any suggestions on the selection of the input length and decoder length?

@cookieminions
Collaborator

cookieminions commented Feb 20, 2021

Understanding: Accurate!
Question 1: Yes, our multivariate experiments use multivariate input to predict multivariate output; we will offer the option of multivariate input predicting univariate output later.
Question 2: No, you just need to provide date information like '2020-01-01 00:00:00' in your data and name the column 'date'. The time-encoding code is in lines 65-73 of data/data_loader.py and in utils/timefeatures.py. I suggest you use the default embed argument 'timeF', so the timeenc flag in the dataset will be 1. (A minimal CSV sketch follows below.)
Question 3: We recommend considering a meaningful time period related to the specific data or task when choosing the input length, for example one week as the encoder input length and one day as the decoder start token length.
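
A minimal sketch of such a custom CSV (the file name and feature column names here are made up for illustration; 'OT' is only needed if you keep the default target column name):

    import pandas as pd

    # a custom dataset only needs a 'date' column plus one column per series;
    # 'OT' is the default target column name used by the provided ETT data
    df = pd.DataFrame({
        "date": pd.date_range("2020-01-01 00:00:00", periods=6, freq="H"),
        "feature_1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        "feature_2": [1.0, 0.9, 0.8, 0.7, 0.6, 0.5],
        "OT":        [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    })
    df.to_csv("my_custom_data.csv", index=False)
    # then point --root_path / --data_path in main_informer.py at this file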
