Only 1 GPU is used for training #11
**Original issue:**

I noticed that only 1 GPU is used to train TransPose-R-A4, with lr=0.0001.
Should I change the lr if I want to use 4 or 8 GPUs, or keep it the same?
Thanks for your reply.

**Comments:**

> I only got 73.7 AP.

> From my experience, the performance of TransPose-R models is very sensitive to the initial learning rate. I did not train TransPose-R-A4 on 4 or 8 GPUs. I suggest you increase the initial learning rate a little under such conditions (with a larger batch size). Please let me know the results if you have tried such experiments.

> @yangsenius @douyh

> Thanks for sharing the results! Happy to see that this brings a performance improvement. @EckoTan0804 Larger batch sizes with more GPUs empirically bring performance improvement. The learning rate setting of DeiT --
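The advice above (increase the initial learning rate when training with more GPUs and a larger effective batch size) is often implemented as the linear scaling rule. The sketch below illustrates that rule only; the function name and batch-size numbers are illustrative assumptions, not values from the TransPose repo, and the maintainer only suggests increasing the lr "a little bit", so treat linear scaling as an upper-bound starting point to tune from.

```python
# Hypothetical sketch of the linear scaling rule: scale the base learning
# rate by the ratio of the new effective batch size to the baseline batch
# size. All names and numbers here are assumptions for illustration.

def scaled_lr(base_lr: float, base_batch_size: int,
              num_gpus: int, per_gpu_batch_size: int) -> float:
    """Scale base_lr linearly with the effective (global) batch size."""
    effective_batch = num_gpus * per_gpu_batch_size
    return base_lr * effective_batch / base_batch_size

# Example: baseline 1 GPU with batch size 32 and lr=1e-4; moving to
# 4 GPUs with the same per-GPU batch size quadruples the effective batch,
# so the linear rule quadruples the lr as well.
print(scaled_lr(1e-4, 32, 4, 32))  # -> 0.0004
```

In practice one would start below this linearly scaled value (e.g. somewhere between the original lr and the scaled one) and tune, since the thread reports that these models are sensitive to the initial learning rate.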