
Training speed is very slow #312

Open

1179021477 opened this issue Feb 17, 2020 · 1 comment

Comments

@1179021477 commented Feb 17, 2020

@RogerChern I train TridentNet_1x with ResNet-50 on 4 GPUs (on a machine with 8 GPUs), and it takes 2 days. In particular, when other people use the remaining GPUs on the machine, my training gets even slower. Is there any way to speed up training, for example by using multi-threading? My GPUs are TITAN X (Pascal).

@xchani (Contributor) commented Feb 23, 2020

Our dataloader does use multi-threading to load images.
According to your description, you are sharing the GPU server with others, so their jobs may occupy CPU resources on that machine, which slows down your training. Disk speed (IOPS) is another major factor to consider.
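A minimal diagnostic sketch for the two factors above (CPU contention and disk throughput), not part of SimpleDet itself. It assumes a Linux machine, that cores 0-15 are available for your job, and a placeholder image directory; adjust both to your setup.

```python
# Sketch only: check whether CPU contention or disk I/O is slowing training.
# Assumptions: Linux, cores 0-15 reserved for this job, placeholder image path.
import os
import time
import glob

# 1) Pin this process to a fixed set of CPU cores so other users' jobs on the
#    shared server interfere less with the dataloader threads (Linux only).
reserved_cores = set(range(0, 16))          # assumption: cores 0-15 are free
os.sched_setaffinity(0, reserved_cores)

# 2) Cap the number of OpenMP threads so operators and dataloader threads do
#    not oversubscribe the reserved cores. Set this before importing MXNet.
os.environ["OMP_NUM_THREADS"] = str(len(reserved_cores))

# 3) Rough disk throughput check: time reading a batch of training images
#    straight from disk. If this is much slower than one training iteration,
#    I/O (IOPS) is likely the bottleneck.
image_paths = glob.glob("data/coco/images/train2017/*.jpg")[:512]  # placeholder path
start = time.time()
total_bytes = 0
for path in image_paths:
    with open(path, "rb") as f:
        total_bytes += len(f.read())
elapsed = time.time() - start
print(f"read {len(image_paths)} images ({total_bytes / 1e6:.1f} MB) "
      f"in {elapsed:.2f}s -> {total_bytes / 1e6 / elapsed:.1f} MB/s")
```

If the measured read speed is low, moving the dataset to a local SSD (or packing images into a record file) usually helps more than adding dataloader threads.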
