Only 1 GPU is used for training #11

Closed
douyh opened this issue Apr 25, 2021 · 5 comments

douyh commented Apr 25, 2021

I noticed that only 1 GPU is used to train TransPose-R-A4, with lr = 0.0001.
Should I change the lr if I want to use 4 or 8 GPUs, or keep it the same?
Thanks for your reply.

douyh commented Apr 25, 2021

I only got 73.7 AP.
I used 4 GPUs for training and kept the other configs unchanged.

yangsenius commented Apr 26, 2021

In my experience, the performance of the TransPose-R models is very sensitive to the initial learning rate. I did not train TransPose-R-A4 on 4 or 8 GPUs. Under such conditions (i.e., with a larger batch size), I suggest increasing the initial learning rate a little.
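
For reference, a minimal sketch of the linear scaling rule this suggestion corresponds to. The base lr of 1e-4 is the value discussed above; the per-GPU batch size of 32 is only an assumption here, so read the real value from your own experiment config:

```python
# Sketch of the linear learning-rate scaling rule (Goyal et al., 2017).
# Assumption: the reference run uses 1 GPU with lr = 1e-4; the per-GPU
# batch size of 32 is hypothetical -- read it from your own config.
base_lr = 1e-4            # lr of the single-GPU TransPose-R-A4 run
batch_size_per_gpu = 32   # assumed; adjust to your config
base_gpus, num_gpus = 1, 4

base_batch = batch_size_per_gpu * base_gpus
new_batch = batch_size_per_gpu * num_gpus
scaled_lr = base_lr * new_batch / base_batch   # 4e-4 for 4 GPUs

print(f"effective batch size: {new_batch}, scaled lr: {scaled_lr:g}")
```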

@yangsenius

Please let me know the results if you try such experiments.

@EckoTan0804

@yangsenius @douyh
FYI, I have trained TransPose-R-A4 on 4 GPUs. The initial and final learning rates were set to 5e-4 and 5e-5, respectively; the other configs were kept unchanged.
I got 75.3 AP (+0.2 AP compared to the README).

[Screenshot 2021-05-19 10:45:46: evaluation results]
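
A minimal PyTorch sketch of a schedule matching those numbers. The cosine annealing, the Adam optimizer, and the epoch count are all assumptions for illustration, not values taken from the repo config:

```python
import torch

# Sketch: decay the lr from 5e-4 down to 5e-5 over training.
# Assumptions: Adam and a cosine schedule; TOTAL_EPOCHS is hypothetical.
TOTAL_EPOCHS = 230                        # assumed training length
model = torch.nn.Linear(8, 8)             # stand-in for TransPose-R-A4
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=TOTAL_EPOCHS, eta_min=5e-5)

for epoch in range(TOTAL_EPOCHS):
    # ... run one training epoch here ...
    scheduler.step()                       # anneals lr toward eta_min
```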

@yangsenius

Thanks for sharing the results! Happy to see that this can bring a performance improvement @EckoTan0804

Larger batch sizes with more GPUs empirically bring performance improvements. The learning rate setting of DeiT, lr = 0.0005 × batchsize / 512, may also work well.
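
For concreteness, a tiny sketch of that rule (the helper name deit_lr is just for illustration; 512 is the reference batch size used in the DeiT paper):

```python
def deit_lr(total_batch_size: int, base_lr: float = 5e-4, ref_batch: int = 512) -> float:
    """DeiT-style linear scaling: lr = 0.0005 * batch_size / 512."""
    return base_lr * total_batch_size / ref_batch

print(deit_lr(128))  # 0.000125 for a total batch size of 128
```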
