@BogdanDidenko Improving the prior is a promising approach. Here is a figure showing that the BLEU score goes up monotonically as the quality of the prior improves (it shows the interpolation between p(z|x) and q(z|x,y)).
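For readers unfamiliar with the setup, here is a minimal sketch of that interpolation experiment. It assumes both the prior p(z|x) and the posterior q(z|x,y) are diagonal Gaussians; `decode` and `compute_bleu` are hypothetical helpers standing in for the model's decoder and the evaluation script, not code from this repo.

```python
# Sketch only: interpolate between prior p(z|x) and posterior q(z|x,y)
# and check how BLEU changes as we move toward the posterior.
import torch

def interpolate_latent(mu_p, logvar_p, mu_q, logvar_q, alpha):
    """Mix prior and posterior parameters; alpha=0 -> prior, alpha=1 -> posterior."""
    mu = (1 - alpha) * mu_p + alpha * mu_q
    logvar = (1 - alpha) * logvar_p + alpha * logvar_q
    # Reparameterized sample from the interpolated Gaussian.
    return mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)

# for alpha in torch.linspace(0.0, 1.0, steps=6):
#     z = interpolate_latent(mu_p, logvar_p, mu_q, logvar_q, alpha)
#     hyp = decode(z, src)                        # hypothetical decoder call
#     print(alpha.item(), compute_bleu(hyp, ref)) # BLEU should rise with alpha
```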
I'm not sure whether BERT can do the job, but it is a promising direction to investigate. If it works in autoregressive models, it should also work in non-autoregressive models in some form.
Yes, it's an interesting research area. In my experience with BERT and an autoregressive Transformer decoder, I achieved a ~10% quality improvement on my seq2seq task (with RoBERTa the result was even better). But I used some tricks, and it's hard to say how this will work with the proposed approach.
What do you think about combining your architecture with existing pre-trained encoders? Could BERT as a prior_encoder help achieve better results?
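One hedged way to picture the proposal: wrap a pretrained BERT as the module that parameterizes the prior p(z|x). The class name, latent size, and the way mu/logvar are produced below are illustrative assumptions, not this repo's actual interface.

```python
# Sketch: pretrained BERT as a prior encoder producing per-token Gaussian
# parameters for p(z|x). Uses the standard HuggingFace BertModel API.
import torch.nn as nn
from transformers import BertModel

class BertPriorEncoder(nn.Module):
    def __init__(self, latent_dim=128, model_name="bert-base-uncased"):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        hidden = self.bert.config.hidden_size
        # Project contextual source representations to the prior's parameters.
        self.to_mu = nn.Linear(hidden, latent_dim)
        self.to_logvar = nn.Linear(hidden, latent_dim)

    def forward(self, input_ids, attention_mask):
        h = self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state
        return self.to_mu(h), self.to_logvar(h)  # parameters of p(z|x)
```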