
What's the essential difference between ConvBert and LSRA? #14

Closed
yuanenming opened this issue Dec 10, 2020 · 3 comments

Comments

@yuanenming

LSRA: Lite Transformer with Long-Short Range Attention.

LSRA also integrates convolution operations into transformer blocks. I'm just wondering what makes ConvBERT differ from LSRA.

@yuanenming
Author

Is it that LSRA combines multi-head attention and convolution in a multi-branch manner, while ConvBERT integrates convolution directly into the transformer block?
If the answer is yes, what are the pros and cons of these two approaches? Do you have experiments comparing them?

Thanks a lot!!!

@zihangJiang
Collaborator

Hi @yuanenming, thanks for your interest.

LSRA is designed for machine translation and abstractive summarization. It combines dynamic convolution and multi-head attention in a two-branch manner: an attention branch for long-range dependencies and a convolution branch for local context.

ConvBERT is a pre-training based model that can be fine-tuned on downstream tasks such as sentence classification. We also propose a novel span-based dynamic convolution operator and combine it with self-attention to form a mixed-attention block.
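
To make the contrast concrete, here is a minimal PyTorch sketch of a mixed-attention block in the spirit described above. It is not the official ConvBERT code: the module names (`SpanDynamicConv`, `MixedAttentionBlock`), the depthwise-conv kernel-generation scheme, and the even split of the hidden dimension between attention and convolution are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpanDynamicConv(nn.Module):
    """Span-based dynamic convolution (sketch): the per-position kernel is
    generated from a local span of tokens, summarized by a depthwise conv,
    rather than from a single token as in plain dynamic convolution."""

    def __init__(self, dim, kernel_size=9):
        super().__init__()
        self.kernel_size = kernel_size
        # Depthwise conv summarizes a local span around each position.
        self.span_proj = nn.Conv1d(dim, dim, kernel_size,
                                   padding=kernel_size // 2, groups=dim)
        # Map each span summary to a softmax-normalized conv kernel.
        self.kernel_gen = nn.Linear(dim, kernel_size)
        self.value = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq_len, dim)
        span = self.span_proj(x.transpose(1, 2)).transpose(1, 2)
        kernels = F.softmax(self.kernel_gen(span), dim=-1)   # (B, T, K)
        v = self.value(x).transpose(1, 2)                    # (B, D, T)
        v = F.pad(v, (self.kernel_size // 2, self.kernel_size // 2))
        windows = v.unfold(2, self.kernel_size, 1)           # (B, D, T, K)
        # Each position applies its own generated kernel to its window.
        return torch.einsum('bdtk,btk->btd', windows, kernels)


class MixedAttentionBlock(nn.Module):
    """Mixed attention (sketch): half of the hidden dimension goes through
    self-attention, the other half through span-based dynamic convolution,
    and the two outputs are concatenated and projected back."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        half = dim // 2
        self.attn = nn.MultiheadAttention(half, num_heads // 2,
                                          batch_first=True)
        self.conv = SpanDynamicConv(half)
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x):  # x: (batch, seq_len, dim)
        a, c = x.chunk(2, dim=-1)
        attn_out, _ = self.attn(a, a, a)
        return self.out_proj(torch.cat([attn_out, self.conv(c)], dim=-1))


block = MixedAttentionBlock(dim=256, num_heads=4)
out = block(torch.randn(2, 16, 256))   # -> torch.Size([2, 16, 256])
```

The 50/50 split is just one possible choice; the point is that local dependencies are handed to a convolution whose kernel adapts to a span of the input, leaving the attention heads to model longer-range interactions.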

Experiments comparing span-based dynamic conv and dynamic conv can be found in Section 4.3, Table 2, of our paper.

There you can see that our span-based dynamic conv outperforms plain dynamic conv in this pre-training setting. A direct comparison between LSRA and ConvBERT is harder, though, since the two models target different tasks.

@yuanenming
Author

Thank you for your timely reply!
I will close this issue.
