
For custom dataset #27

Closed
HeroadZ opened this issue Feb 19, 2021 · 5 comments

Comments

@HeroadZ

HeroadZ commented Feb 19, 2021

Congratulations on the best paper award!
I'm new to transformers for time-series data and just found this cool paper!
It would be great if you could provide an example showing how to use a custom dataset.
Thanks in advance!

@zhouhaoyi
Owner

Thanks for your interest. We will release a Colab notebook to help people use our model on custom data. Please keep watching this repo.

@wywzxxz

wywzxxz commented Feb 20, 2021

Actually, there are only 4 tensors whose meaning you need to figure out, and according to data/data_loader.py I believe their shapes are as follows (a small shape-check sketch follows the list):

model.forward(x_enc, x_mark_enc, x_dec, x_mark_dec)

  1. x_enc: batch_size × input_seq_len × channel
  2. x_dec: batch_size × output_seq_len × channel
  3. x_mark_enc: batch_size × input_seq_len × 4; it represents the timestamp of each input_seq element, formatted as a 4-tuple: (month, day, weekday, hour)
  4. x_mark_dec: batch_size × output_seq_len × 4
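
A minimal shape-check sketch under those assumptions (batch size, lengths, and channel count are placeholders, and the forward call is left as a comment because it needs an Informer instance built from models/model.py):

    import torch

    batch_size, channel = 32, 7
    input_seq_len = 96                 # seq_len
    output_seq_len = 48 + 24           # label_len + pred_len

    # encoder input: the observed history (placeholder values)
    x_enc = torch.randn(batch_size, input_seq_len, channel)
    # decoder input: label_len known steps followed by pred_len zeros (see below)
    x_dec = torch.randn(batch_size, output_seq_len, channel)
    # time-stamp features per position, e.g. (month, day, weekday, hour);
    # random values here stand in for the real encoded time features
    x_mark_enc = torch.randn(batch_size, input_seq_len, 4)
    x_mark_dec = torch.randn(batch_size, output_seq_len, 4)

    # with a constructed model, the call would be roughly:
    # out = model(x_enc, x_mark_enc, x_dec, x_mark_dec)  # expected out: batch_size × pred_len × c_out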

The model arguments are more puzzling, though I got some hints from main_informer.py.

Informer(
    enc_in,     ## encoder input size
    dec_in,     ## decoder input size
    c_out,      ## output size
    seq_len,    ## input series length
    label_len,  ## help series length
    pred_len    ## predict series length
)

I'm not sure what label_len means. As far as I can tell, output_seq_len = label_len + pred_len, and in the training process x_dec[:, -pred_len:, :] is zeroed before calling model.forward():

    # zeros as placeholders for the pred_len steps to be predicted
    dec_inp = torch.zeros_like(batch_y[:, -self.args.pred_len:, :]).double()
    # prepend the known label_len part of batch_y as the decoder "start token" series
    dec_inp = torch.cat([batch_y[:, :self.args.label_len, :], dec_inp], dim=1).double().to(self.device)

However, I think this means dec_in=c_out, so I'm confused.

Anyway, since label_len=0 also works, it doesn't matter too much.

@cookieminions
Collaborator

Yes, label_len is the length of the start token series before the predicted series, so output_seq_len = label_len + pred_len.
The input sequence of the Informer decoder consists of a history series (label_len) before the prediction and a series (pred_len) filled with zeros. You can refer to Figure 1.
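
A self-contained sketch of that decoder-input construction (the lengths and batch size below are just example values):

    import torch

    label_len, pred_len, channels = 48, 24, 7                  # example lengths
    batch_y = torch.randn(4, label_len + pred_len, channels)   # ground-truth window from the loader

    # zeros stand in for the pred_len future steps the model has to fill in
    dec_zeros = torch.zeros_like(batch_y[:, -pred_len:, :])
    # the known label_len history acts as the decoder start token series
    dec_inp = torch.cat([batch_y[:, :label_len, :], dec_zeros], dim=1)

    print(dec_inp.shape)                          # torch.Size([4, 72, 7])
    print(dec_inp[:, -pred_len:, :].abs().sum())  # tensor(0.) -> the future part is all zeros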

@HeroadZ
Author

HeroadZ commented Feb 20, 2021

Thank you for the explanation. I'm not sure whether I understand it correctly, so please check it.
Let's take the default settings as an example. The enc_input is 4 days (1, 2, 3, 4) of 7-channel features, the dec_input is 2 days (3, 4) of 7-channel features followed by 1 day (5) of 7-channel zeros, and the output is 1 day (5) of 7-channel predictions. Is that right?
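
In other words (just a toy index sketch, not code from the repo):

    import numpy as np

    days = np.arange(1, 6)                  # toy daily index: [1 2 3 4 5]
    seq_len, label_len, pred_len = 4, 2, 1  # 4-day input, 2-day start token, 1-day prediction

    enc_window = days[:seq_len]                     # days [1 2 3 4] -> encoder input
    dec_known  = days[seq_len - label_len:seq_len]  # days [3 4]     -> decoder start tokens
    dec_future = days[seq_len:seq_len + pred_len]   # day  [5]       -> zero-filled, to be predicted

    print(enc_window, dec_known, dec_future)        # [1 2 3 4] [3 4] [5]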

I have several questions. Sorry if they are stupid.

  1. The target is OT, but the output is a 7-channel prediction when features is set to "M". Then, in the loss calculation step, you compare the predictions for all channels with the ground truth, not only OT. Is that correct?
  2. For the input x_mark, if I want to use my custom data, must it use the same format, i.e. (month, day, weekday, hour)?
  3. Do you have any suggestions on the selection of the input length and decoder length?

@cookieminions
Collaborator

cookieminions commented Feb 20, 2021

Understanding: Accurate!
Question 1: Yes, our multivariate experiments use multivariate input to predict multivariate output; we will offer the option of multivariate input predicting univariate output later.
Question 2: No, you just need to provide date information like '2020-01-01 00:00:00' in your data and name the column 'date'. The time-encoding code is in lines 65-73 of data/data_loader.py and in utils/timefeatures.py. I suggest you use the default embed argument 'timeF', so the timeenc flag in the dataset will be 1. (A minimal CSV sketch follows below.)
Question 3: We recommend considering a meaningful time period related to the specific data or task when choosing the input length, for example one week as the encoder input length and one day as the decoder start token length.
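
A minimal sketch of such a custom CSV (the file name and feature column names here are made up for illustration; 'OT' is only needed if you keep the default target column name):

    import pandas as pd

    # a custom dataset only needs a 'date' column plus one column per series;
    # 'OT' is the default target column name used by the provided ETT data
    df = pd.DataFrame({
        "date": pd.date_range("2020-01-01 00:00:00", periods=6, freq="H"),
        "feature_1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        "feature_2": [1.0, 0.9, 0.8, 0.7, 0.6, 0.5],
        "OT":        [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    })
    df.to_csv("my_custom_data.csv", index=False)
    # then point --root_path / --data_path in main_informer.py at this file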
