
Train on GPU instead of TPU - different distribution strategies #16

Closed
PhilipMay opened this issue Apr 3, 2021 · 2 comments

@PhilipMay
Contributor

Hi,
many thanks for this nice new model type and your research.
We would like to train a ConvBERT model on GPUs rather than TPUs.
Do you have any experience or tips on how to do this?
We have concerns regarding the different distribution strategies
between GPUs and TPUs.
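
To make the concern concrete, here is a generic TensorFlow 2 sketch (not code from this repository) of the strategy switch we are unsure about: on TPU one usually builds the model under a TPUStrategy scope, while on GPUs one would use MirroredStrategy instead.

```python
# Generic TensorFlow 2 sketch (not this repository's code) of the
# distribution-strategy difference between TPU and multi-GPU training.
import tensorflow as tf

def make_strategy(use_tpu, tpu_name=None):
    if use_tpu:
        # TPU path: resolve the TPU cluster, then build a TPUStrategy.
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_name)
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        return tf.distribute.TPUStrategy(resolver)
    # GPU path: MirroredStrategy replicates the model on all visible GPUs
    # and all-reduces the gradients.
    return tf.distribute.MirroredStrategy()

strategy = make_strategy(use_tpu=False)
with strategy.scope():
    # Build the model and optimizer inside the strategy scope as usual.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
```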

Thanks
Philip

@PhilipMay PhilipMay changed the title Train on GPU instead of TPU Train on GPU instead of TPU - different distribution strategies Apr 3, 2021
@PhilipMay
Contributor Author

Well, in the README you write:

The code is tested on a V100 GPU.

Does this mean that pretraining on multiple GPUs is supported?

@zihangJiang
Collaborator

Hi, thanks for your interest.
Our code is only tested on a single V100 GPU. If you are looking for multi-GPU (rather than TPU) training support, you may refer to https://huggingface.co/transformers/model_doc/convbert.html, which implements our model in PyTorch.
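
Something along these lines might be a starting point (a rough sketch only: it uses a plain masked-LM objective via the transformers Trainer, not our ELECTRA-style pretraining, and the checkpoint name, dataset, and hyperparameters are placeholders you should replace). The Trainer uses all visible GPUs automatically, or DistributedDataParallel when launched with torchrun.

```python
# Rough sketch: multi-GPU masked-LM training of ConvBERT with the
# Hugging Face transformers Trainer. Checkpoint name and data are
# placeholders; adapt them to your own pretraining setup.
from transformers import (
    ConvBertForMaskedLM,
    ConvBertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "YituTech/conv-bert-base"  # assumed checkpoint; pick your own
tokenizer = ConvBertTokenizerFast.from_pretrained(model_name)
model = ConvBertForMaskedLM.from_pretrained(model_name)

# Tiny placeholder dataset; replace with your own tokenized corpus.
texts = ["ConvBERT mixes span-based dynamic convolution with self-attention."]
train_dataset = [
    {"input_ids": ids}
    for ids in tokenizer(texts, truncation=True, max_length=128)["input_ids"]
]

# The collator pads batches and creates the masked-LM labels on the fly.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="convbert-mlm",
        per_device_train_batch_size=8,
        num_train_epochs=1,
    ),
    data_collator=collator,
    train_dataset=train_dataset,
)
trainer.train()
```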
