Add POINTER model #8454
Comments
Thanks @patrickvonplaten for taking this. It's nice to work with you again :)
Really interesting approach 🤗 @dreasysnail Do you think it is possible to pre-train a model from scratch on one GPU in a reasonable time? Could you say something about the hardware setup and training time you used for the pre-training phase? 🤔
Thanks @stefan-it ! Regarding your question:
The speed advantage of this algorithm is mostly on the decoding side. Training takes roughly as much time as, say, fine-tuning a BERT model. One GPU is possible, but if your dataset is large, training will be slow, so I would recommend fine-tuning from the checkpoints we have already pretrained for faster convergence and better quality. For reference, we used 8 to 16 V100 GPUs to pretrain and fine-tune the models. Pretraining took roughly one week and fine-tuning took 1-2 days.
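For anyone following the fine-tuning route suggested above, here is a minimal, hedged sketch of what a fine-tuning step looks like with `transformers`. Note the assumptions: `bert-large-uncased` merely stands in for a converted POINTER checkpoint, and the single toy batch exists only to make the loop runnable.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Placeholder: a real run would load a converted POINTER checkpoint here.
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForMaskedLM.from_pretrained("bert-large-uncased")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# Toy batch: mask one token and fine-tune the MLM head on recovering it.
enc = tokenizer("the beach was sunny and warm", return_tensors="pt")
labels = enc["input_ids"].clone()
enc["input_ids"][0, 3] = tokenizer.mask_token_id          # hide one token
labels[enc["input_ids"] != tokenizer.mask_token_id] = -100  # loss only on the masked slot

model.train()
loss = model(**enc, labels=labels).loss
loss.backward()
optimizer.step()
```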
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
🌟 New model addition
Model description
POINTER is a progressive, non-autoregressive text-generation pre-training approach published at EMNLP 2020 by Microsoft Research. POINTER generates fluent text in a progressive and parallel manner: with an empirically logarithmic number of generation steps, it outperforms existing non-autoregressive approaches on hard-constrained text generation.
The model essentially uses the BERT-large architecture, with one additional token added to the vocabulary. Inference is performed by feeding the input to the model iteratively (a sketch of this loop follows below). Since no existing model architecture in Hugging Face Transformers is compatible with this scheme, I am not sure how to incorporate it into the library.
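For concreteness, here is a minimal sketch of the iterative insertion-style decoding described above, built on a stock BERT MLM from `transformers`. The `[NOI]` ("no insertion") token name, the greedy argmax, and the fixed-point stopping rule are illustrative assumptions rather than the official POINTER API; meaningful output would of course require the actual POINTER weights.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForMaskedLM.from_pretrained("bert-large-uncased")
model.eval()

# POINTER adds one extra token to the vocabulary; "[NOI]" is an
# illustrative name for it, not necessarily the official one.
tokenizer.add_tokens(["[NOI]"])
model.resize_token_embeddings(len(tokenizer))
noi_id = tokenizer.convert_tokens_to_ids("[NOI]")
mask_id = tokenizer.mask_token_id

def insertion_step(token_ids):
    """One progressive round: propose a token between every adjacent pair."""
    # Interleave a [MASK] slot between each pair of current tokens.
    interleaved = []
    for i, tid in enumerate(token_ids):
        interleaved.append(tid)
        if i < len(token_ids) - 1:
            interleaved.append(mask_id)
    input_ids = torch.tensor([interleaved])
    with torch.no_grad():
        logits = model(input_ids).logits[0]
    # Keep existing tokens; fill each slot unless the model votes [NOI].
    out = []
    for pos, tid in enumerate(interleaved):
        if tid != mask_id:
            out.append(tid)
        else:
            pred = int(logits[pos].argmax())
            if pred != noi_id:
                out.append(pred)
    return out

# Start from the hard lexical constraints and insert until a fixed point;
# each round can roughly double the length, hence logarithmically many rounds.
ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("sunny beach vacation"))
for _ in range(8):
    new_ids = insertion_step(ids)
    if new_ids == ids:
        break
    ids = new_ids
print(tokenizer.decode(ids))
```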
Open source status