BUG: Wrong att-mask in decoder #38

Closed

slczgwh opened this issue Feb 26, 2021 · 2 comments

Comments

@slczgwh

slczgwh commented Feb 26, 2021

A bug when creating the model:

```python
# decoder self-attention: mask_flag=True -> causal mask
AttentionLayer(FullAttention(True, factor, attention_dropout=dropout, output_attention=False),
               d_model, n_heads),
# decoder cross-attention: mask_flag=False -> no causal mask
AttentionLayer(FullAttention(False, factor, attention_dropout=dropout, output_attention=False),
               d_model, n_heads),
```

This bug leads to the cross-attention in the decoder using no causal mask, while the self-attention uses a causal mask. Fortunately there is no information leak in Informer, but it is still quite different from what you wrote in the paper.
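For concreteness, here is a minimal sketch in plain PyTorch (not the repository's masking code; shapes are illustrative) of what the two settings amount to: with mask_flag=True each query position can only attend to keys at the same or earlier positions, while mask_flag=False leaves the attention scores untouched.

```python
import torch

# Illustrative shapes only: [batch, heads, L_q, L_k]
L = 5
scores = torch.randn(1, 1, L, L)

# Causal mask: True on the strict upper triangle, i.e. the "future" key positions.
causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)

# mask_flag=True corresponds to blocking those positions before the softmax...
masked = scores.masked_fill(causal, float('-inf'))
attn_causal = torch.softmax(masked, dim=-1)   # zero weight on future positions

# ...while mask_flag=False applies no causal mask at all.
attn_full = torch.softmax(scores, dim=-1)
```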

slczgwh changed the title from "Wrong att-mask in decoder" to "BUG: Wrong att-mask in decoder" on Feb 26, 2021
@cookieminions
Collaborator

Please refer to Figure 1. In the decoder, we use masked multi-head ProbSparse self-attention and multi-head cross-attention, so we use a causal mask in self-attention and no causal mask in cross-attention. Since we use generative inference to produce the prediction results, the causal mask in the decoder is actually not critical. But you can also apply a mask in cross-attention, and you are free to choose ProbAttention or FullAttention for it.
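For example, following the constructor calls quoted in the issue, masking the cross-attention as well would just mean flipping its first (mask_flag) argument; ProbAttention could be substituted for FullAttention in the same spot. A hedged sketch (the import path and hyperparameter values below are assumptions for illustration, not taken from the issue):

```python
# Sketch only: mirrors the AttentionLayer/FullAttention calls quoted above.
from models.attn import FullAttention, AttentionLayer  # import path assumed

d_model, n_heads, factor, dropout = 512, 8, 5, 0.05    # illustrative values

# Decoder self-attention: mask_flag=True, i.e. causal mask applied (as in the current code).
self_attn = AttentionLayer(
    FullAttention(True, factor, attention_dropout=dropout, output_attention=False),
    d_model, n_heads)

# Decoder cross-attention: mask_flag=False in the current code (no causal mask);
# setting it to True would mask the cross-attention as well, as noted above.
cross_attn = AttentionLayer(
    FullAttention(True, factor, attention_dropout=dropout, output_attention=False),
    d_model, n_heads)
```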

@zhouhaoyi
Owner

If there is no further discussion, I will close this issue in 12 hours.
