
Add POINTER model #8454

Open
3 tasks done
dreasysnail opened this issue Nov 11, 2020 · 4 comments

@dreasysnail (Contributor)

🌟 New model addition

Model description

POINTER is a progressive, non-autoregressive text-generation pre-training approach published at EMNLP 2020 by Microsoft Research. POINTER generates fluent text in a progressive and parallel manner: it needs only an empirically logarithmic number of generation steps, and it outperforms existing non-autoregressive text generation approaches on hard-constrained text generation.

The model essentially uses the BERT-large architecture, but one additional token is added to the vocabulary. Inference is performed by passing the input to the model iteratively. Since there is no existing compatible model architecture in Hugging Face Transformers, I am not sure how to incorporate this into the model card.
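To make the iterative inference concrete, here is a minimal, hypothetical sketch of a progressive insertion loop on top of transformers. The [NOI] ("no insertion") token name, the greedy argmax slot filling, and the progressive_generate helper are illustrative assumptions, not the released implementation, and the stock bert-large-uncased weights merely stand in for an actual POINTER checkpoint:

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

# Stand-in weights; a real run would load a POINTER checkpoint instead.
tokenizer = BertTokenizer.from_pretrained("bert-large-uncased")
model = BertForMaskedLM.from_pretrained("bert-large-uncased")

# The one extra vocabulary token: predicted for a slot where nothing
# should be inserted ([NOI] is an assumed name for illustration).
tokenizer.add_special_tokens({"additional_special_tokens": ["[NOI]"]})
model.resize_token_embeddings(len(tokenizer))
noi_id = tokenizer.convert_tokens_to_ids("[NOI]")
model.eval()

def progressive_generate(keywords, max_rounds=10):
    ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize(keywords))
    for _ in range(max_rounds):
        # Interleave a [MASK] slot between every pair of current tokens.
        slots = []
        for i, tok in enumerate(ids):
            slots.append(tok)
            if i < len(ids) - 1:
                slots.append(tokenizer.mask_token_id)
        with torch.no_grad():
            logits = model(torch.tensor([slots])).logits[0]
        preds = logits.argmax(dim=-1)
        # Fill every slot in parallel, then drop the "insert nothing" slots.
        new_ids = [
            int(preds[pos]) if tok == tokenizer.mask_token_id else tok
            for pos, tok in enumerate(slots)
        ]
        new_ids = [t for t in new_ids if t != noi_id]
        if new_ids == ids:  # every slot chose [NOI]: generation has converged
            break
        ids = new_ids
    return tokenizer.decode(ids)

print(progressive_generate("sunny weather beach"))
```

Because each round can at most double the sequence length, the number of rounds grows logarithmically with the output length, which is where the decoding-speed advantage mentioned above comes from.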

Open source status

@dreasysnail (Contributor, Author)

Thanks @patrickvonplaten for taking this. It's nice to work with you again :)

@stefan-it (Collaborator)

Really interesting approach 🤗

@dreasysnail Do you think it is possible to pre-train a model from scratch on one GPU in a reasonable time? Could you say something about the hardware setup you used and the training time for the pre-training phase 🤔

@dreasysnail (Contributor, Author)

Thanks @stefan-it! Regarding your question:

@dreasysnail Do you think it is possible to pre-train a model from scratch on one GPU in a reasonable time? Could you say something about the hardware setup you used and the training time for the pre-training phase 🤔

The speed advantage of this algorithm is mostly on the decoding side. For training time, you can expect it to take roughly the same amount of time as, say, fine-tuning BERT. One GPU is possible, but if your dataset is large, training could be slow. So I would recommend fine-tuning from the checkpoints we have already pretrained, for faster convergence and better quality.

For your reference, we used 8 or 16 V100 GPUs to pretrain and fine-tune the models. Pretraining takes roughly one week and fine-tuning takes 1-2 days.

stale bot commented Jan 17, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label on Jan 17, 2021
huggingface deleted a comment from the github-actions bot on Mar 6, 2021
LysandreJik changed the title from "Add a new model to Microsoft organization" to "Add POINTER model" on Sep 17, 2021