How to train a model with 16GB GPU #1
Hey,
thanks for your PyTorch implementation. I am trying to train a model with my custom dataset. I managed to set up the dataset (TFRecords), but I run out of memory at training loop step 0.
Sadly, I do not have more GPU RAM options. My config is the following:
Are there any options to improve memory efficiency? I would like to stay at a 128x128 resolution (if possible).
Thanks!

Comments
The default batch size …
Great, thank you! For (future) Colab users, I am now using a batch size of 16 for 16 GB RAM (P100) and 128x128.
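For reference, a minimal sketch of what such an override could look like, assuming the repo's configs are ml_collections.ConfigDict objects; the field names (data.image_size, training.batch_size) are assumptions taken from this discussion, not verified against the actual config files:

```python
import ml_collections


def colab_16gb_overrides() -> ml_collections.ConfigDict:
    """Config overrides for a single 16 GB GPU (P100) at 128x128 resolution.

    Field names are assumed here, not taken from the repo's config files.
    """
    cfg = ml_collections.ConfigDict()
    cfg.data = ml_collections.ConfigDict()
    cfg.data.image_size = 128        # keep the target resolution
    cfg.training = ml_collections.ConfigDict()
    cfg.training.batch_size = 16     # small enough to fit 128x128 training in 16 GB
    return cfg
```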
Hey,
Is there any chance to optimize this for one GPU? P.S.: The PyTorch requirements.txt does not include jax and jaxlib, but you need them to run the code. I am not sure whether they are just leftover imports or really needed by the code, but this led to errors for me.
You may increase …
Thanks for the help. I increased it to 64; anything above that runs out of memory. It takes really long either way: I have 782 (50000//64+1) sampling rounds and each round takes about 35 minutes, so getting the 50k FID of one model takes about 19 days 😂 Do you have any experience with reducing the sample size and the corresponding FID accuracy?
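The numbers add up; a quick check of the estimate above (the 35 min/round figure is taken from this comment):

```python
samples = 50_000                     # images needed for the 50k-sample FID
batch = 64                           # largest sampling batch that fits in memory
rounds = samples // batch + 1        # 782 sampling rounds
hours = rounds * 35 / 60             # ~35 minutes per round -> ~456 hours
print(rounds, hours, hours / 24)     # 782, ~456.2 h, ~19 days
```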
Yeah, that’s unfortunately due to the slow sampling of diffusion score models. Using the JAX version can be slightly better, since JAX code can sample faster than PyTorch. In my previous papers I also reported FID scores on 1k samples for some experiments, but in that case the FID score will be way larger than the one evaluated on 50k samples.
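To put the 1k-sample option into perspective, a rough estimate with the same per-round cost (purely illustrative; where the sample count is set in this repo's eval config is not shown here):

```python
def sampling_hours(num_samples: int, batch: int = 64, minutes_per_round: float = 35.0) -> float:
    """Rough wall-clock estimate for drawing num_samples images for FID."""
    rounds = num_samples // batch + 1
    return rounds * minutes_per_round / 60


print(sampling_hours(50_000))   # ~456 h, i.e. about 19 days
print(sampling_hours(1_000))    # ~9.3 h, but FID on 1k samples is much higher and noisier
```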
Thanks for the info 👍