BUG: Wrong att-mask in decoder #38

Closed

slczgwh opened this issue Feb 26, 2021 · 2 comments

Comments

@slczgwh

slczgwh commented Feb 26, 2021

A bug when creating the model:

```python
# decoder self-attention: mask_flag=True -> causal mask
AttentionLayer(FullAttention(True, factor, attention_dropout=dropout, output_attention=False),
               d_model, n_heads),
# decoder cross-attention: mask_flag=False -> no causal mask
AttentionLayer(FullAttention(False, factor, attention_dropout=dropout, output_attention=False),
               d_model, n_heads),
```

This bug leads to the cross-attention in the decoder using no causal mask, while the self-attention uses a causal mask. Fortunately there is no information leak in Informer, but it is still quite different from what you wrote in the paper.
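For concreteness, here is a minimal sketch in plain PyTorch (not the repository's masking code; shapes are illustrative) of what the two settings amount to: with mask_flag=True each query position can only attend to keys at the same or earlier positions, while mask_flag=False leaves the attention scores untouched.

```python
import torch

# Illustrative shapes only: [batch, heads, L_q, L_k]
L = 5
scores = torch.randn(1, 1, L, L)

# Causal mask: True on the strict upper triangle, i.e. the "future" key positions.
causal = torch.triu(torch.ones(L, L, dtype=torch.bool), diagonal=1)

# mask_flag=True corresponds to blocking those positions before the softmax...
masked = scores.masked_fill(causal, float('-inf'))
attn_causal = torch.softmax(masked, dim=-1)   # zero weight on future positions

# ...while mask_flag=False applies no causal mask at all.
attn_full = torch.softmax(scores, dim=-1)
```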

slczgwh changed the title from "Wrong att-mask in decoder" to "BUG: Wrong att-mask in decoder" on Feb 26, 2021
@cookieminions
Collaborator

Please refer to Figure 1. In the decoder, we use masked multi-head ProbSparse self-attention and multi-head cross-attention, so we use a causal mask in self-attention and no causal mask in cross-attention. Since we use generative inference to produce the prediction results, the causal mask in the decoder is actually not critical. But you can also apply a mask in cross-attention, and you are free to choose ProbAttention or FullAttention for it.
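For example, following the constructor calls quoted in the issue, masking the cross-attention as well would just mean flipping its first (mask_flag) argument; ProbAttention could be substituted for FullAttention in the same spot. A hedged sketch (the import path and hyperparameter values below are assumptions for illustration, not taken from the issue):

```python
# Sketch only: mirrors the AttentionLayer/FullAttention calls quoted above.
from models.attn import FullAttention, AttentionLayer  # import path assumed

d_model, n_heads, factor, dropout = 512, 8, 5, 0.05    # illustrative values

# Decoder self-attention: mask_flag=True, i.e. causal mask applied (as in the current code).
self_attn = AttentionLayer(
    FullAttention(True, factor, attention_dropout=dropout, output_attention=False),
    d_model, n_heads)

# Decoder cross-attention: mask_flag=False in the current code (no causal mask);
# setting it to True would mask the cross-attention as well, as noted above.
cross_attn = AttentionLayer(
    FullAttention(True, factor, attention_dropout=dropout, output_attention=False),
    d_model, n_heads)
```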

@zhouhaoyi
Owner

If there is no further discussion, I will close this issue in 12 hours.
