
About inputs to the decoder #223

Closed
puzzlecollector opened this issue Sep 5, 2021 · 15 comments

Comments

@puzzlecollector

@zhouhaoyi
Suppose I want to input a 28-length sequence X_i,...,X_{i+27} and want to predict the 28-length sequence X_{i+28},...,X_{i+55}. Then the encoder will take in (X_i,...,X_{i+27}) as input and the decoder will take in (X_i,...,X_{i+27},0,...,0). Is my understanding correct? Is this what you meant in the paper when you said you concat the start token and the zero placeholder for the target?

@puzzlecollector
Author

@zhouhaoyi
By (X_i,...,X_{i+27},0,...,0) I mean the 28-length sequence that was passed to the encoder, concatenated with the 28-length zero-padded target sequence that the decoder has to predict.

@puzzlecollector
Author

@zhouhaoyi Or does the encoder receive (X_i,...,X_{i+27},X_{i+28},...,X_{i+55}) and the decoder receive (X_i,...,X_{i+27},0,...,0)?

@cookieminions
Collaborator

@zhouhaoyi
Suppose I want to input a 28-length sequence X_i,...,X_{i+27} and want to predict the 28-length sequence X_{i+28},...,X_{i+55}. Then the encoder will take in (X_i,...,X_{i+27}) as input and the decoder will take in (X_i,...,X_{i+27},0,...,0). Is my understanding correct? Is this what you meant in the paper when you said you concat the start token and the zero placeholder for the target?

Hi, your understanding is correct. The input of the Encoder is (X_i,...,X_{i+27}) and the input of the Decoder can be (X_j,...,X_{i+27},0,...,0), where i<=j<=i+27. (X_{i+28},...,X_{i+55}) is the ground truth and we do not use it as input.
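As a concrete illustration, here is a minimal PyTorch sketch of that decoder input (this is not the repo's own data loader; the tensor names are made up for the example):

import torch

seq_len, label_len, pred_len, n_features = 28, 28, 28, 1
x_enc = torch.randn(1, seq_len, n_features)            # X_i, ..., X_{i+27}

# start token: the last label_len steps of the known history (here the whole encoder window)
start_token = x_enc[:, -label_len:, :]
# zero placeholder standing in for the unknown X_{i+28}, ..., X_{i+55}
placeholder = torch.zeros(1, pred_len, n_features)

x_dec = torch.cat([start_token, placeholder], dim=1)   # [1, label_len + pred_len, n_features] = [1, 56, 1]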

@zhouhaoyi
Owner

Thanks @cookieminions

@puzzlecollector
Author

puzzlecollector commented Sep 6, 2021

@zhouhaoyi @cookieminions

Thank you for the kind answers. I have another question: when looking at the InformerStack class in Informer2020/models/model.py, what do x_mark_enc and x_mark_dec represent in the forward function? What information do I have to pass for those two parameters?

class InformerStack(nn.Module):
    def __init__(self, enc_in, dec_in, c_out, seq_len, label_len, out_len, 
                factor=5, d_model=512, n_heads=8, e_layers=[3,2,1], d_layers=2, d_ff=512, 
                dropout=0.0, attn='prob', embed='fixed', freq='h', activation='gelu',
                output_attention = False, distil=True, mix=True,
                device=torch.device('cuda:0')):
        super(InformerStack, self).__init__()
        self.pred_len = out_len
        self.attn = attn
        self.output_attention = output_attention

        # Encoding
        self.enc_embedding = DataEmbedding(enc_in, d_model, embed, freq, dropout)
        self.dec_embedding = DataEmbedding(dec_in, d_model, embed, freq, dropout)
        # Attention
        Attn = ProbAttention if attn=='prob' else FullAttention
        # Encoder

        inp_lens = list(range(len(e_layers))) # [0,1,2,...] you can customize here
        encoders = [
            Encoder(
                [
                    EncoderLayer(
                        AttentionLayer(Attn(False, factor, attention_dropout=dropout, output_attention=output_attention), 
                                    d_model, n_heads, mix=False),
                        d_model,
                        d_ff,
                        dropout=dropout,
                        activation=activation
                    ) for l in range(el)
                ],
                [
                    ConvLayer(
                        d_model
                    ) for l in range(el-1)
                ] if distil else None,
                norm_layer=torch.nn.LayerNorm(d_model)
            ) for el in e_layers]
        self.encoder = EncoderStack(encoders, inp_lens)
        # Decoder
        self.decoder = Decoder(
            [
                DecoderLayer(
                    AttentionLayer(Attn(True, factor, attention_dropout=dropout, output_attention=False), 
                                d_model, n_heads, mix=mix),
                    AttentionLayer(FullAttention(False, factor, attention_dropout=dropout, output_attention=False), 
                                d_model, n_heads, mix=False),
                    d_model,
                    d_ff,
                    dropout=dropout,
                    activation=activation,
                )
                for l in range(d_layers)
            ],
            norm_layer=torch.nn.LayerNorm(d_model)
        )
        # self.end_conv1 = nn.Conv1d(in_channels=label_len+out_len, out_channels=out_len, kernel_size=1, bias=True)
        # self.end_conv2 = nn.Conv1d(in_channels=d_model, out_channels=c_out, kernel_size=1, bias=True)
        self.projection = nn.Linear(d_model, c_out, bias=True)
        
    def forward(self, x_enc, x_mark_enc, x_dec, x_mark_dec, 
                enc_self_mask=None, dec_self_mask=None, dec_enc_mask=None):
        enc_out = self.enc_embedding(x_enc, x_mark_enc)
        enc_out, attns = self.encoder(enc_out, attn_mask=enc_self_mask)

        dec_out = self.dec_embedding(x_dec, x_mark_dec)
        dec_out = self.decoder(dec_out, enc_out, x_mask=dec_self_mask, cross_mask=dec_enc_mask)
        dec_out = self.projection(dec_out)
        
        # dec_out = self.end_conv1(dec_out)
        # dec_out = self.end_conv2(dec_out.transpose(2,1)).transpose(1,2)
        if self.output_attention:
            return dec_out[:,-self.pred_len:,:], attns
        else:
            return dec_out[:,-self.pred_len:,:] # [B, L, D]

@cookieminions
Collaborator

Hi, x_mark_enc and x_mark_dec are the timestamps of x_enc and x_dec; enc_embedding and dec_embedding use them to add time features to the model inputs.
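For example, a minimal sketch of building such timestamp features with pandas, assuming hourly data and the month/day/weekday/hour column order used by TemporalEmbedding in embed.py (extract_time_features is a made-up helper, not the repo's utility):

import numpy as np
import pandas as pd

def extract_time_features(dates):
    # one integer column per time feature: month, day, weekday, hour
    idx = pd.DatetimeIndex(dates)
    return np.stack([idx.month, idx.day, idx.dayofweek, idx.hour], axis=-1)

stamps = pd.date_range("2016-01-01", periods=28, freq="H")
x_mark_enc = extract_time_features(stamps)   # shape [28, 4]; add a batch dimension before feeding the model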

@puzzlecollector
Author

puzzlecollector commented Sep 6, 2021

@cookieminions So I am guessing that this information corresponds to the section "Appendix B: The Uniform Input Representation" in the paper. If my data is a daily time series (Monday, Tuesday, Wednesday, ...), then what should I pass to the x_mark_enc and x_mark_dec arguments? Is it literally the timestamp data? So like "2016-01-01 Friday", "2016-01-02 Saturday", ... for instance.

@puzzlecollector
Author

puzzlecollector commented Sep 6, 2021

@cookieminions
I have taken a look at Informer2020/models/embed.py and I found this class

class TemporalEmbedding(nn.Module):
    def __init__(self, d_model, embed_type='fixed', freq='h'):
        super(TemporalEmbedding, self).__init__()

        minute_size = 4; hour_size = 24
        weekday_size = 7; day_size = 32; month_size = 13

        Embed = FixedEmbedding if embed_type=='fixed' else nn.Embedding
        if freq=='t':
            self.minute_embed = Embed(minute_size, d_model)
        self.hour_embed = Embed(hour_size, d_model)
        self.weekday_embed = Embed(weekday_size, d_model)
        self.day_embed = Embed(day_size, d_model)
        self.month_embed = Embed(month_size, d_model)
    
    def forward(self, x):
        x = x.long()
        
        minute_x = self.minute_embed(x[:,:,4]) if hasattr(self, 'minute_embed') else 0.
        hour_x = self.hour_embed(x[:,:,3])
        weekday_x = self.weekday_embed(x[:,:,2])
        day_x = self.day_embed(x[:,:,1])
        month_x = self.month_embed(x[:,:,0])
        
        return hour_x + weekday_x + day_x + month_x + minute_x

So my timestamps are in year-month-day format, ranging from 2016-01-01 to 2020-09-28. I guess I can modify the class so that I have

class TemporalEmbedding(nn.Module):
    def __init__(self, d_model, embed_type='fixed', freq='h'):
        super(TemporalEmbedding, self).__init__()

        minute_size = 4; hour_size = 24
        weekday_size = 7; day_size = 32; month_size = 13; year_size = 2022

        Embed = FixedEmbedding if embed_type=='fixed' else nn.Embedding
        if freq=='t':
            self.minute_embed = Embed(minute_size, d_model)
        self.hour_embed = Embed(hour_size, d_model)
        self.weekday_embed = Embed(weekday_size, d_model)
        self.day_embed = Embed(day_size, d_model)
        self.month_embed = Embed(month_size, d_model) 
        self.year_embed = Embed(year_size, d_model) 
    
    def forward(self, x):
        x = x.long()
        
        #minute_x = self.minute_embed(x[:,:,4]) if hasattr(self, 'minute_embed') else 0.
        #hour_x = self.hour_embed(x[:,:,3])
        weekday_x = self.weekday_embed(x[:,:,3])
        day_x = self.day_embed(x[:,:,2])
        month_x = self.month_embed(x[:,:,1])
        year_x = self.year_embed(x[:,:,0]) 
        return year_x + weekday_x + day_x + month_x  # minute_x is commented out above; use year_x instead

where x[:,:,0] = 2016, x[:,:,1] = 1, x[:,:,2] = 1, x[:,:,3] = 5 for date 2016-01-01 Friday.

Here I can encode Sunday, Monday, Tuesday, ... as 0, 1, 2, ...

and I will set freq = 'd' when I declare the model.
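For reference, a minimal sketch of how the (year, month, day, weekday) marks above could be built for daily data with pandas (make_daily_marks is a hypothetical helper, not part of the repo):

import numpy as np
import pandas as pd

def make_daily_marks(dates):
    idx = pd.DatetimeIndex(dates)
    # pandas uses Monday=0, so shift to the Sunday=0 encoding described above
    weekday = (idx.dayofweek + 1) % 7
    return np.stack([idx.year, idx.month, idx.day, weekday], axis=-1)

marks = make_daily_marks(pd.date_range("2016-01-01", periods=28, freq="D"))
print(marks[0])   # [2016, 1, 1, 5] -> 2016-01-01 is a Friday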

@puzzlecollector
Author

puzzlecollector commented Sep 6, 2021

@cookieminions
I think I kind of figured out the timestamp problem. But now I have a slightly different issue. Suppose I want to predict the price of 21 agricultural goods for the next 28 days (given the past 28 days). So I declared the model as follows

from Informer2020.models.model import Informer, InformerStack

model = InformerStack(enc_in = 21, 
                      dec_in = 21, 
                      c_out = 21, 
                      seq_len = 28, 
                      label_len = 28, 
                      out_len = 56, 
                      freq = 'd') 
model.cuda()

Now, did I define the model parameters correctly? So the decoder takes in 21 sequences, where each sequence has a starter length of 28 and a prediction length of 28 (so we want to predict the next 28 days). And I assume out_len is 56 because it is the sum of the starter sequence and the zero-padded target tokens?

So if the batch size is 32, the dimensions of my inputs are as follows:

encoder_input: [32,28,21]
decoder_input: [32,56,21]
target: [32,28,21]
encoder_marks: [32,28,4] (represents the timestamps year, month, day, weekday)
decoder_marks: [32,28,4] (represents the timestamps year, month, day, weekday)

But when I do model.forward() I get the following error

[Screenshot of the error traceback from model.forward()]

I tried running a simple code like this

cnt = 0 
for batch_item in train_dataloader:
    encoder_input = batch_item['encoder_input'].to(device) 
    decoder_input = batch_item['decoder_input'].to(device) 
    target = batch_item['target'].to(device) 
    enc_marks = batch_item['encoder_marks'].to(device) 
    dec_marks = batch_item['decoder_marks'].to(device) 
    
    print(encoder_input.shape, decoder_input.shape, target.shape, enc_marks.shape, dec_marks.shape)
    
    pred = model(x_enc=encoder_input, x_mark_enc=enc_marks, x_dec=decoder_input, x_mark_dec=dec_marks)
    
    if cnt == 0: 
        break 

@cookieminions
Collaborator

Hi, out_len of the decoder needs to be set to 28: seq_len is the length of the encoder input, label_len is the length of the start token, and out_len is the length of the prediction series.
If you set label_len=28 and out_len=28, the decoder input will be [32, 56, 21], and you need to make dec_marks [32, 56, 4], because dec_marks contains the timestamps of both the start token and the prediction series.
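Put differently, a minimal sketch of the expected shapes (made-up tensor names; batch size 32, 21 variables, label_len=28, pred_len=28, 4 time features):

import torch

batch, label_len, pred_len, n_vars, n_time = 32, 28, 28, 21, 4

x_enc      = torch.randn(batch, 28, n_vars)    # encoder input   [32, 28, 21]
x_mark_enc = torch.zeros(batch, 28, n_time)    # encoder stamps  [32, 28, 4]

# decoder input = start token (the last 28 known steps) + zero placeholder for the 28 steps to predict
x_dec      = torch.cat([x_enc[:, -label_len:, :],
                        torch.zeros(batch, pred_len, n_vars)], dim=1)   # [32, 56, 21]
x_mark_dec = torch.zeros(batch, label_len + pred_len, n_time)           # [32, 56, 4]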

@puzzlecollector
Author

puzzlecollector commented Sep 8, 2021

@cookieminions @zhouhaoyi
Thank you so much for your kind replies. I was able to successfully train the model on my dataset and the results are much better than other baselines (e.g. seq2seq+attention, ARIMA)!

I have a question though. During inference, although I set the model to eval mode with model.eval() and run the forward pass under torch.no_grad(), the predictions are slightly different every time. Is this expected behavior? My inference code goes something like this

test_model.eval()
### some code ###
with torch.no_grad():
    outputs = test_model(x_enc=test_encoder_inputs,
                         x_mark_enc=test_encoder_marks,
                         x_dec=test_decoder_inputs,
                         x_mark_dec=test_decoder_marks)

    outputs = outputs.cpu()
    outputs = outputs * torch_norm  # multiply output by some numbers to de-normalize

and the outputs are slightly different every time, even though eval() mode is on and I am using torch.no_grad().

@puzzlecollector
Author

@zhouhaoyi

Oh, I guess it was because of the ProbSparse attention. If I just use the full transformer (setting attn = 'full') I do not have the problem of inconsistent outputs. By the way, even if the outputs are inconsistent, they are not supposed to deviate that much, right?

@zhouhaoyi
Owner

I think the inconsistency comes from the current implementation of ProbSparse, in which the unselected attention may refer to the same leaf node rather than its original one. Please give a brief description of your architecture; it may help us locate the problem. Thanks!

@Zero-coder

nice discussion!

@HasnainKhanNiazi

HasnainKhanNiazi commented Dec 20, 2022

Hey guys @zhouhaoyi @cookieminions @puzzlecollector, such a nice discussion to follow. I am working on Informer for a multivariate problem with 94 features and one output target, and I have a question about model inference/prediction. In the _process_one_batch method, the decoder input is first initialized with zeros and then concatenated with batch_y. I removed the if conditions since I am working with padding==0. I am training in MS mode, but I am not passing the target variable in the input: I changed the read_data method so that seq_x contains only the features.

    # decoder input
    dec_inp = torch.zeros([batch_y.shape[0], self.args.pred_len, batch_y.shape[-1]]).float()
    dec_inp = torch.cat([batch_y[:,:self.args.label_len,:], dec_inp], dim=1).float().to(self.device)

My question is: why are we concatenating the batch_y values, since in real time we will not have batch_y? I trained a model and the results look very promising with the above decoder input, but if I don't use the batch_y concatenation part then the results aren't as good.
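For context, the label_len steps taken from batch_y overlap with the end of the input window in the repo's sliding-window loaders, so at prediction time the same decoder input can be built from the known history alone; only the pred_len placeholder is unknown. A hedged sketch with made-up tensor names:

import torch

label_len, pred_len, n_features = 48, 24, 94   # example values, not necessarily yours

history = torch.randn(1, 96, n_features)       # the most recent observed window

start_token = history[:, -label_len:, :]                                   # known past values only
placeholder = torch.zeros(history.shape[0], pred_len, history.shape[-1])   # zeros for the unknown future
dec_inp = torch.cat([start_token, placeholder], dim=1)                     # no future targets needed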
