
Add POINTER model #8454

Open
3 tasks done
dreasysnail opened this issue Nov 11, 2020 · 4 comments

@dreasysnail (Contributor)

🌟 New model addition

Model description

POINTER is a progressive, non-autoregressive text-generation pre-training approach published at EMNLP 2020 by Microsoft Research. POINTER generates fluent text in a progressive and parallel manner: it needs only an empirically logarithmic number of generation steps, and it outperforms existing non-autoregressive text generation approaches on hard-constrained text generation.

The model essentially uses the BERT-large architecture, but one additional token is added to the vocabulary. Inference is performed by passing the input to the model iteratively. Since there is no existing compatible model architecture in Hugging Face Transformers, I am not sure how to incorporate this into the model card.
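To make the iterative inference concrete, here is a minimal, hypothetical sketch of a progressive insertion loop on top of transformers. The [NOI] ("no insertion") token name, the greedy argmax slot filling, and the progressive_generate helper are illustrative assumptions, not the released implementation, and the stock bert-large-uncased weights merely stand in for an actual POINTER checkpoint:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Stand-in weights; a real run would load a POINTER checkpoint instead.
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForMaskedLM.from_pretrained("bert-large-uncased")

# The one extra vocabulary token: predicted for a slot where nothing
# should be inserted ([NOI] is an assumed name for illustration).
tokenizer.add_special_tokens({"additional_special_tokens": ["[NOI]"]})
model.resize_token_embeddings(len(tokenizer))
noi_id = tokenizer.convert_tokens_to_ids("[NOI]")
model.eval()

def progressive_generate(keywords, max_rounds=10):
    ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(keywords))
    for _ in range(max_rounds):
        # Interleave a [MASK] slot between every pair of current tokens.
        slots = []
        for i, tok in enumerate(ids):
            slots.append(tok)
            if i < len(ids) - 1:
                slots.append(tokenizer.mask_token_id)
        with torch.no_grad():
            logits = model(torch.tensor([slots])).logits[0]
        preds = logits.argmax(dim=-1)
        # Fill every slot in parallel, then drop the "insert nothing" slots.
        new_ids = [
            int(preds[pos]) if tok == tokenizer.mask_token_id else tok
            for pos, tok in enumerate(slots)
        ]
        new_ids = [t for t in new_ids if t != noi_id]
        if new_ids == ids:  # every slot chose [NOI]: generation has converged
            break
        ids = new_ids
    return tokenizer.decode(ids)

print(progressive_generate("sunny weather beach"))
```

Because each round can at most double the sequence length, the number of rounds grows logarithmically with the output length, which is where the decoding-speed advantage mentioned above comes from.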

Open source status

@dreasysnail (Contributor, Author)

Thanks @patrickvonplaten for taking this. It's nice to work with you again :)

@stefan-it (Collaborator)

Really interesting approach 🤗

@dreasysnail Do you think it is possible to pre-train a model from scratch on one GPU in a reasonable time? Could you say something about the hardware setup you used and the training time for the pre-training phase 🤔

@dreasysnail (Contributor, Author)

Thanks @stefan-it! Regarding your question:

@dreasysnail Do you think it is possible to pre-train a model from scratch on one GPU in a reasonable time? Could you say something about the hardware setup you used and the training time for the pre-training phase 🤔

The speed advantage of this algorithm is mostly on the decoding side. For training time, you can expect it to take roughly the same amount of time as, say, fine-tuning BERT. One GPU is possible, but if your dataset is large, training could be slow. So I would recommend fine-tuning from the checkpoints we have already pretrained, for faster convergence and better quality.

For your reference, we used 8 or 16 V100 GPUs to pretrain and fine-tune the models. Pretraining takes roughly one week and fine-tuning takes 1-2 days.

stale bot commented Jan 17, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label on Jan 17, 2021
huggingface deleted a comment from the github-actions bot on Mar 6, 2021
LysandreJik changed the title from "Add a new model to Microsoft organization" to "Add POINTER model" on Sep 17, 2021