
Using existing NLP pre-trained encoders like BERT, RoBERTa #2

Open
BogdanDidenko opened this issue Nov 26, 2019 · 2 comments


@BogdanDidenko

What do you think about combining your architecture with existing pre-trained encoders? Can BERT as a prior_encoder help achieve better results?
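
To make the question concrete, here is a minimal sketch (not code from this repository) of what "BERT as a prior_encoder" could look like: a pre-trained BERT encoder whose token states parameterize a Gaussian prior p(z|x) over the latents. The class name and `latent_dim` are illustrative assumptions.

```python
import torch.nn as nn
from transformers import BertModel, BertTokenizer

class BertPriorEncoder(nn.Module):
    """Hypothetical prior encoder: BERT states -> Gaussian p(z|x) parameters."""
    def __init__(self, latent_dim=8, bert_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        hidden = self.bert.config.hidden_size  # 768 for bert-base
        # Project each token state to the mean and log-variance of p(z|x).
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.to_mu(states), self.to_logvar(states)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
batch = tokenizer(["a source sentence"], return_tensors="pt")
mu, logvar = BertPriorEncoder()(batch["input_ids"], batch["attention_mask"])
```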

@zomux
Owner

zomux commented Nov 27, 2019

@BogdanDidenko Improving the prior is a promising approach. Here is a figure shows that the BLEU score goes up monotonically when improving the quality of the prior. (It shows the interpolation between p(z|x) and q(z|x,y) )

[Figure: BLEU score vs. interpolation between the prior p(z|x) and the posterior q(z|x,y)]

I'm not sure whether BERT is able to do the job, but it is a promising direction to investigate. If it works in autoregressive models, it should also work in non-autoregressive models in some form.
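
For readers who want to reproduce the trend in the figure, here is a rough sketch of the interpolation experiment, with toy tensors standing in for the model's actual p(z|x) and q(z|x,y) means and the decoding/BLEU step left as comments; none of the names below come from this repository.

```python
# Slide the latent z from the prior mean toward the posterior mean; in the real
# experiment, each z is decoded non-autoregressively and scored with BLEU.
import torch

torch.manual_seed(0)
seq_len, latent_dim = 20, 8                        # illustrative sizes
mu_prior = torch.randn(seq_len, latent_dim)        # mean of p(z|x)      (toy stand-in)
mu_posterior = torch.randn(seq_len, latent_dim)    # mean of q(z|x, y)   (toy stand-in)

for alpha in torch.linspace(0.0, 1.0, steps=11):
    # alpha = 0 decodes from the prior; alpha = 1 decodes from the posterior.
    z = (1.0 - alpha) * mu_prior + alpha * mu_posterior
    # Here the non-autoregressive decoder would generate from z, and corpus-level
    # BLEU of those generations would be plotted against alpha.
    print(f"alpha={alpha.item():.1f}  ||z - mu_prior|| = {(z - mu_prior).norm():.3f}")
```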

@BogdanDidenko
Author

BogdanDidenko commented Nov 27, 2019

Yes, it's an interesting research area. In my experience with BERT and an autoregressive transformer decoder, I achieved a ~10% quality improvement on my seq2seq task (with RoBERTa the result is even better). But I use some tricks, and it's hard to say how this will work with the proposed approach.
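
As a point of reference, a minimal sketch of that kind of setup (assumed from the comment above, not anyone's actual code) is a BERT encoder feeding an autoregressive Transformer decoder via cross-attention; sizes, names, and the vocabulary are illustrative.

```python
import torch
import torch.nn as nn
from transformers import BertModel

class BertSeq2Seq(nn.Module):
    """Hypothetical BERT-encoder + autoregressive Transformer-decoder seq2seq model."""
    def __init__(self, tgt_vocab_size, d_model=768, num_layers=6):
        super().__init__()
        self.encoder = BertModel.from_pretrained("bert-base-uncased")
        layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=12, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        self.out = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, src_mask, tgt_ids):
        # Source states from BERT act as the decoder's cross-attention memory.
        memory = self.encoder(src_ids, attention_mask=src_mask).last_hidden_state
        tgt_len = tgt_ids.size(1)
        # Causal mask so each target position only attends to earlier positions.
        causal = torch.triu(torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)
        hidden = self.decoder(self.tgt_embed(tgt_ids), memory, tgt_mask=causal)
        return self.out(hidden)  # next-token logits for teacher-forced training
```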
