
Did you implement key_padding_mask in your long-term attention and short-term attention? #3

Closed
zhouweii234 opened this issue Aug 27, 2021 · 3 comments

Comments

@zhouweii234

Did you implement key_padding_mask (a parameter of torch.nn.MultiheadAttention) in the long-term attention and short-term attention, or is it simply not needed in your network?
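(For context, a minimal sketch of what key_padding_mask does in torch.nn.MultiheadAttention: entries set to True mark keys, typically padded positions, that the layer ignores. The shapes and values below are purely illustrative and are not taken from this repository.)

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 256, 8
attn = nn.MultiheadAttention(embed_dim, num_heads)   # batch_first=False by default

L, S, N = 10, 12, 2                                  # target len, source len, batch size
query = torch.randn(L, N, embed_dim)
key = value = torch.randn(S, N, embed_dim)

# Shape (N, S); True = "this key is padding, do not attend to it".
key_padding_mask = torch.zeros(N, S, dtype=torch.bool)
key_padding_mask[:, -3:] = True                      # pretend the last 3 keys are padding

out, weights = attn(query, key, value, key_padding_mask=key_padding_mask)
print(out.shape)                                     # torch.Size([10, 2, 256])
print(weights[:, :, -3:].abs().max())                # 0: padded keys receive no attention weight
```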

@z-x-yang (Owner)

No. key_padding_mask is not needed for long-term or short-term attention.

@zhouweii234 (Author)

Thank you for your reply! By the way, do you use attn_mask?

@z-x-yang (Owner)

Long-term attention doesn't need any mask. The implementation of short-term attention is more complicated: because it relies on the relative positions between pixels, correlations that fall outside the feature boundary must be masked out.
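(A minimal sketch, not the repository's actual code, assuming short-term attention is a local window attention over a 2-D feature map: each query pixel attends to a k x k neighbourhood of keys, and neighbours that fall outside the H x W boundary are masked with -inf before the softmax. The function local_attention and all shapes are illustrative.)

```python
import torch
import torch.nn.functional as F

def local_attention(q, k, v, window=3):
    """q, k, v: (B, C, H, W). Each query pixel attends to a window x window
    neighbourhood of k/v centred on the same location."""
    B, C, H, W = q.shape
    pad = window // 2

    # Gather k/v neighbourhoods with zero padding; also build a validity map
    # that is 0 for padded (out-of-boundary) neighbours and 1 otherwise.
    k_pad = F.pad(k, (pad, pad, pad, pad))
    v_pad = F.pad(v, (pad, pad, pad, pad))
    valid = F.pad(torch.ones(B, 1, H, W, device=q.device), (pad, pad, pad, pad))

    # unfold -> (B, C * window^2, H*W): every column holds one pixel's neighbourhood.
    k_n = F.unfold(k_pad, window).view(B, C, window * window, H * W)
    v_n = F.unfold(v_pad, window).view(B, C, window * window, H * W)
    valid = F.unfold(valid, window).view(B, 1, window * window, H * W)

    # Correlation between each query pixel and its neighbours: (B, window^2, H*W).
    attn = (q.view(B, C, 1, H * W) * k_n).sum(dim=1) / C ** 0.5

    # Mask correlations that point outside the feature boundary before softmax.
    attn = attn.masked_fill(valid.squeeze(1) == 0, float('-inf'))
    attn = attn.softmax(dim=1)

    out = (v_n * attn.unsqueeze(1)).sum(dim=2)        # (B, C, H*W)
    return out.view(B, C, H, W)

# Example usage
q = k = v = torch.randn(2, 64, 30, 30)
print(local_attention(q, k, v).shape)                 # torch.Size([2, 64, 30, 30])
```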

z-x-yang closed this as completed on Sep 3, 2021