
Did you implement key_padding_mask in your long-term attention and short-term attention? #3

Closed
zhouweii234 opened this issue Aug 27, 2021 · 3 comments

Comments

@zhouweii234

Did you implement key_padding_mask (a parameter of torch.nn.MultiheadAttention) in the long-term attention and short-term attention, or is it simply not needed in your network?
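(For context, a minimal sketch of what key_padding_mask does in torch.nn.MultiheadAttention: entries set to True mark keys, typically padded positions, that the layer ignores. The shapes and values below are purely illustrative and are not taken from this repository.)

```python
import torch
import torch.nn as nn

embed_dim, num_heads = 256, 8
attn = nn.MultiheadAttention(embed_dim, num_heads)   # batch_first=False by default

L, S, N = 10, 12, 2                                  # target len, source len, batch size
query = torch.randn(L, N, embed_dim)
key = value = torch.randn(S, N, embed_dim)

# Shape (N, S); True = "this key is padding, do not attend to it".
key_padding_mask = torch.zeros(N, S, dtype=torch.bool)
key_padding_mask[:, -3:] = True                      # pretend the last 3 keys are padding

out, weights = attn(query, key, value, key_padding_mask=key_padding_mask)
print(out.shape)                                     # torch.Size([10, 2, 256])
print(weights[:, :, -3:].abs().max())                # 0: padded keys receive no attention weight
```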

@z-x-yang (Owner)

No. key_padding_mask is not needed for long-term or short-term attention.

@zhouweii234 (Author)

Thank you for your reply! By the way, do you use attn_mask?

@z-x-yang (Owner)

Long-term attention doesn't need any mask. The implementation of short-term attention is more complicated: because it relies on the relative positions between pixels, correlations that fall outside the feature boundary must be masked out.
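(A minimal sketch, not the repository's actual code, assuming short-term attention is a local window attention over a 2-D feature map: each query pixel attends to a k x k neighbourhood of keys, and neighbours that fall outside the H x W boundary are masked with -inf before the softmax. The function local_attention and all shapes are illustrative.)

```python
import torch
import torch.nn.functional as F

def local_attention(q, k, v, window=3):
    """q, k, v: (B, C, H, W). Each query pixel attends to a window x window
    neighbourhood of k/v centred on the same location."""
    B, C, H, W = q.shape
    pad = window // 2

    # Gather k/v neighbourhoods with zero padding; also build a validity map
    # that is 0 for padded (out-of-boundary) neighbours and 1 otherwise.
    k_pad = F.pad(k, (pad, pad, pad, pad))
    v_pad = F.pad(v, (pad, pad, pad, pad))
    valid = F.pad(torch.ones(B, 1, H, W, device=q.device), (pad, pad, pad, pad))

    # unfold -> (B, C * window^2, H*W): every column holds one pixel's neighbourhood.
    k_n = F.unfold(k_pad, window).view(B, C, window * window, H * W)
    v_n = F.unfold(v_pad, window).view(B, C, window * window, H * W)
    valid = F.unfold(valid, window).view(B, 1, window * window, H * W)

    # Correlation between each query pixel and its neighbours: (B, window^2, H*W).
    attn = (q.view(B, C, 1, H * W) * k_n).sum(dim=1) / C ** 0.5

    # Mask correlations that point outside the feature boundary before softmax.
    attn = attn.masked_fill(valid.squeeze(1) == 0, float('-inf'))
    attn = attn.softmax(dim=1)

    out = (v_n * attn.unsqueeze(1)).sum(dim=2)        # (B, C, H*W)
    return out.view(B, C, H, W)

# Example usage
q = k = v = torch.randn(2, 64, 30, 30)
print(local_attention(q, k, v).shape)                 # torch.Size([2, 64, 30, 30])
```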

z-x-yang closed this as completed on Sep 3, 2021