@BogdanDidenko Improving the prior is a promising approach. Here is a figure showing that the BLEU score goes up monotonically as the quality of the prior improves (it shows the interpolation between p(z|x) and q(z|x,y)).
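For readers unfamiliar with the setup, here is a minimal sketch of that interpolation experiment. It assumes both the prior p(z|x) and the posterior q(z|x,y) are diagonal Gaussians; `decode` and `compute_bleu` are hypothetical helpers standing in for the model's decoder and the evaluation script, not code from this repo.

```python
# Sketch only: interpolate between prior p(z|x) and posterior q(z|x,y)
# and check how BLEU changes as we move toward the posterior.
import torch

def interpolate_latent(mu_p, logvar_p, mu_q, logvar_q, alpha):
    """Mix prior and posterior parameters; alpha=0 -> prior, alpha=1 -> posterior."""
    mu = (1 - alpha) * mu_p + alpha * mu_q
    logvar = (1 - alpha) * logvar_p + alpha * logvar_q
    # Reparameterized sample from the interpolated Gaussian.
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

# for alpha in torch.linspace(0.0, 1.0, steps=6):
#     z = interpolate_latent(mu_p, logvar_p, mu_q, logvar_q, alpha)
#     hyp = decode(z, src)                        # hypothetical decoder call
#     print(alpha.item(), compute_bleu(hyp, ref)) # BLEU should rise with alpha
```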
I'm not sure whether BERT can do the job, but it is a promising direction to investigate. If it works in autoregressive models, it should also work in non-autoregressive models in some form.
Yes, it's an interesting research area. In my experience with BERT and an autoregressive Transformer decoder, I achieved a ~10% quality improvement on my seq2seq task (with RoBERTa the result was even better). But I used some tricks, and it's hard to say how this will work with the proposed approach.
What do you think about combining your architecture with existing pre-trained encoders? Could BERT as a prior_encoder help achieve better results?
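One hedged way to picture the proposal: wrap a pretrained BERT as the module that parameterizes the prior p(z|x). The class name, latent size, and the way mu/logvar are produced below are illustrative assumptions, not this repo's actual interface.

```python
# Sketch: pretrained BERT as a prior encoder producing per-token Gaussian
# parameters for p(z|x). Uses the standard HuggingFace BertModel API.
import torch.nn as nn
from transformers import BertModel

class BertPriorEncoder(nn.Module):
    def __init__(self, latent_dim=128, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # Project contextual source representations to the prior's parameters.
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        return self.to_mu(h), self.to_logvar(h)  # parameters of p(z|x)
```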