
Training time and number of GPUS #9

Closed
JialeTao opened this issue Sep 3, 2022 · 6 comments

Comments

JialeTao commented Sep 3, 2022

Hi, thanks for sharing the implementation. I wonder how many GPUs (and what kind of GPU) you used, and how long it took to train stage 1 and stage 2. Since I don't have many GPUs, I want to see whether I can afford the training or fine-tuning.

thuanz123 (Owner) commented:

Hi @JialeTao, depending on the config, training can be fast or slow. For the config in this repo, which is ViT-VQGAN base, it takes about 1.45 s per iteration with a batch size of 4 on an A100, which is quite demanding. So if you don't have many GPUs, I recommend training a much smaller config than the one in this repo. There are also plans to train smaller models, so you can wait if you want, but it will be a while since I'm quite busy these days 😭


JialeTao commented Sep 4, 2022

Thanks for the reply. For ViT-VQGAN base, how many iterations did you train? Does the 1.45 s refer to stage 1 or stage 2 training? And lastly, is that an A100 with 40 GB of memory or 80 GB?


thuanz123 commented Sep 7, 2022

Hi @JialeTao, 1.45 s per iteration is for stage 1 training, and the GPU is an A100 40GB. I trained ViT-VQGAN base for 1,000,000 iterations on 32 A100s, with a batch size of 4 per GPU. Stage 2 training is currently buggy, so I don't have any estimates or numbers for it 😅
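A quick back-of-the-envelope check of those numbers (a sketch only; it assumes the 1.45 s/iteration holds throughout, that all 32 GPUs step synchronously, and it ignores data-loading and checkpointing overhead):

```python
# Rough stage-1 training-cost estimate from the numbers in this thread.
# Assumptions: constant 1.45 s/iteration, synchronous data-parallel
# training, so wall-clock time equals per-iteration time x iterations.
sec_per_iter = 1.45
iterations = 1_000_000
gpus = 32
batch_per_gpu = 4

wall_clock_days = sec_per_iter * iterations / 86400   # seconds -> days
gpu_hours = wall_clock_days * 24 * gpus               # total A100-hours
effective_batch = gpus * batch_per_gpu                # global batch size
images_seen = effective_batch * iterations            # total samples processed

print(f"wall clock: ~{wall_clock_days:.1f} days")
print(f"GPU-hours: ~{gpu_hours:,.0f}")
print(f"effective batch size: {effective_batch}")
print(f"images seen: {images_seen:,}")
```

Under those assumptions the base run works out to roughly 17 days of wall-clock time and on the order of 13,000 A100-hours, which is why a smaller config is the practical choice for most people.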

thuanz123 (Owner) commented:

Hi @JialeTao, training ViT-VQGAN small is faster than I expected, and it has just been released. The speed is 1.05 s per iteration with a batch size of 8 (it can even go up to 16, but 8 is good enough), again on an A100 40GB. If you don't have any further questions, I will close this issue. Feel free to reopen it.

zyf0619sjtu commented:

Hi @thuanz123, thanks for sharing the checkpoint of ViT-VQGAN small. For this small model, how many GPUs did you use, and for how many iterations did you train?

thuanz123 (Owner) commented:

For quick training, I used 32 A100 40GB GPUs and trained for 500,000 iterations on ImageNet. But I think a decent GPU with 8 GB of VRAM is enough; just lower the batch size and train longer.
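The "lower the batch size and train longer" advice can be made concrete by matching the total number of samples seen. A minimal sketch, where the single-GPU batch size of 2 is a hypothetical choice (not from this thread) and per-iteration speed on an 8 GB card is unknown, so only iteration counts are compared:

```python
# Samples seen by the reported vit-vqgan-small run, and the iteration
# count a single small GPU would need to see the same amount of data.
# Assumption (hypothetical): the single GPU fits a batch size of 2.
gpus, batch_per_gpu, iterations = 32, 8, 500_000
images_seen = gpus * batch_per_gpu * iterations      # total samples in the reported run

single_gpu_batch = 2                                 # assumed, not stated in the thread
single_gpu_iters = images_seen // single_gpu_batch   # iterations to match sample count

print(f"images seen: {images_seen:,}")
print(f"single-GPU iterations for the same sample count: {single_gpu_iters:,}")
```

This only matches the amount of data seen; with a much smaller effective batch size, the learning rate and schedule would likely need retuning as well.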
