
Training time and number of GPUS #9

Closed
JialeTao opened this issue Sep 3, 2022 · 6 comments

Comments

JialeTao commented Sep 3, 2022

Hi, thanks for sharing the implementation. I wonder how many GPUs (and what kind of GPU) you used, and how long it took to train stage 1 and stage 2. Since I don't have many GPUs, I want to see whether I can afford the training or fine-tuning.

thuanz123 (Owner) commented:

Hi @JialeTao, depending on the config, training can be fast or slow. For the config in this repo, which is ViT-VQGAN base, it takes about 1.45 s per iteration with a batch size of 4 on an A100, which is quite demanding. So if you don't have many GPUs, I recommend training a much smaller config than the one in this repo. There are also plans to train smaller models, so you can wait if you want, but it will be a while since I'm quite busy these days 😭


JialeTao commented Sep 4, 2022

Thanks for the reply. For ViT-VQGAN base, how many iterations did you train? Does the 1.45 s refer to stage 1 or stage 2 training? And lastly, is that an A100 with 40 GB of memory or 80 GB?


thuanz123 commented Sep 7, 2022

Hi @JialeTao, 1.45 s per iteration is for stage 1 training, and the GPU is an A100 40GB. I trained ViT-VQGAN base for 1,000,000 iterations on 32 A100s, with a batch size of 4 per GPU. Stage 2 training is currently buggy, so I don't have any estimates or numbers for it 😅
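A quick back-of-the-envelope check of those numbers (a sketch only; it assumes the 1.45 s/iteration holds throughout, that all 32 GPUs step synchronously, and it ignores data-loading and checkpointing overhead):

```python
# Rough stage-1 training-cost estimate from the numbers in this thread.
# Assumptions: constant 1.45 s/iteration, synchronous data-parallel
# training, so wall-clock time equals per-iteration time x iterations.
sec_per_iter = 1.45
iterations = 1_000_000
gpus = 32
batch_per_gpu = 4

wall_clock_days = sec_per_iter * iterations / 86400   # seconds -> days
gpu_hours = wall_clock_days * 24 * gpus               # total A100-hours
effective_batch = gpus * batch_per_gpu                # global batch size
images_seen = effective_batch * iterations            # total samples processed

print(f"wall clock: ~{wall_clock_days:.1f} days")
print(f"GPU-hours: ~{gpu_hours:,.0f}")
print(f"effective batch size: {effective_batch}")
print(f"images seen: {images_seen:,}")
```

Under those assumptions the base run works out to roughly 17 days of wall-clock time and on the order of 13,000 A100-hours, which is why a smaller config is the practical choice for most people.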

thuanz123 (Owner) commented:

Hi @JialeTao, training ViT-VQGAN small is faster than I expected, and it has just been released. The speed is 1.05 s per iteration with a batch size of 8 (it can even go up to 16, but 8 is good enough), again on an A100 40GB. If you don't have any further questions, I will close this issue. Feel free to reopen it.

zyf0619sjtu commented:

Hi @thuanz123, thanks for sharing the checkpoint of ViT-VQGAN small. For this small model, how many GPUs did you use, and for how many iterations did you train?

thuanz123 (Owner) commented:

For quick training, I used 32 A100 40GB GPUs and trained for 500,000 iterations on ImageNet. But I think a decent GPU with 8 GB of VRAM is enough; just lower the batch size and train longer.
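The "lower the batch size and train longer" advice can be made concrete by matching the total number of samples seen. A minimal sketch, where the single-GPU batch size of 2 is a hypothetical choice (not from this thread) and per-iteration speed on an 8 GB card is unknown, so only iteration counts are compared:

```python
# Samples seen by the reported vit-vqgan-small run, and the iteration
# count a single small GPU would need to see the same amount of data.
# Assumption (hypothetical): the single GPU fits a batch size of 2.
gpus, batch_per_gpu, iterations = 32, 8, 500_000
images_seen = gpus * batch_per_gpu * iterations      # total samples in the reported run

single_gpu_batch = 2                                 # assumed, not stated in the thread
single_gpu_iters = images_seen // single_gpu_batch   # iterations to match sample count

print(f"images seen: {images_seen:,}")
print(f"single-GPU iterations for the same sample count: {single_gpu_iters:,}")
```

This only matches the amount of data seen; with a much smaller effective batch size, the learning rate and schedule would likely need retuning as well.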
