Training is slow and utilizes only a small fraction of GPU resources #5

Closed
zhezh opened this issue Sep 4, 2017 · 9 comments

Comments

@zhezh

zhezh commented Sep 4, 2017

I tried to train this model with the data provided in the README, and the training procedure is slow. It has already taken more than 2 hours to run about 700 iterations.
I monitored GPU and CPU usage: GPU usage is about 10%~20%, sometimes as low as 0%~3%; CPU usage is about 50%, sometimes dropping to near 0.
The CPU usage diagram is below:
[screenshot from 2017-09-04 18-47-12]

My laptop has an Intel i7 6700K and a GTX 1050 Ti. I run the code in PyCharm with Python 2.7 and TensorFlow 1.3.

@yscacaca
Owner

yscacaca commented Sep 4, 2017

I believe the I/O part is inefficient. If your machine has enough memory, you can try reading all of the training data into memory. That should speed up training.
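The suggestion above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual loader: the function name `load_dataset_into_memory`, the file extensions, and the directory layout are all assumptions made for the example. It reads every matching file once and keeps the raw bytes in a dictionary, so later epochs never touch the disk.

```python
import os


def load_dataset_into_memory(data_dir, extensions=(".png", ".jpg")):
    """Read every matching file under data_dir once and keep its raw bytes in RAM.

    Hypothetical helper: the extensions and layout are assumptions,
    not taken from the project.
    """
    cache = {}
    for root, _dirs, files in os.walk(data_dir):
        for name in sorted(files):
            if name.lower().endswith(extensions):
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    cache[path] = f.read()  # one disk read per file, ever
    return cache
```

A training loop would then decode samples from `cache[path]` instead of opening files on each iteration. This only helps if the whole dataset fits in RAM.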

@zhezh
Author

zhezh commented Sep 6, 2017

Yes, you are right! I copied all the data from the HDD to an SSD partition, and the training speed more than doubled.

I used glances to watch I/O usage; the I/O is about 4M, which is still low.

If RAM is limited, maybe we can use TFRecord to put all the data into one file and then read it in a streaming fashion, which avoids reading a lot of small files.
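The idea of packing many small files into one file and streaming it back can be illustrated with the standard library alone. The sketch below uses a simplified length-prefixed container; it is not the actual TFRecord format and does not use TensorFlow, and the names `pack_records` and `stream_records` are hypothetical. It only shows why a single sequential file avoids the per-file open/seek cost while keeping memory use bounded.

```python
import struct


def pack_records(records, out_path):
    """Write each bytes record as an 8-byte little-endian length plus the payload."""
    with open(out_path, "wb") as out:
        for payload in records:
            out.write(struct.pack("<Q", len(payload)))
            out.write(payload)


def stream_records(in_path):
    """Yield records one at a time, so only the current record is held in memory."""
    with open(in_path, "rb") as f:
        while True:
            header = f.read(8)
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError("truncated length header")
            (length,) = struct.unpack("<Q", header)
            payload = f.read(length)
            if len(payload) != length:
                raise ValueError("truncated record")
            yield payload
```

Reading is strictly sequential, which suits both HDDs and SSDs better than opening thousands of small files in random order.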

@zhezh
Author

zhezh commented Sep 7, 2017

@yscacaca Hey, I wrote a version that uses TFRecord to speed up training. If you are interested, see this link

@yscacaca
Owner

yscacaca commented Sep 8, 2017

Cool, thank you @zhezh !

@yscacaca yscacaca closed this as completed Sep 8, 2017
@zhezh
Author

zhezh commented Sep 19, 2017

@kaixinbuyu
Maybe you could debug it to see what is actually passed to this function.

@shamanez

shamanez commented Oct 1, 2017

@kaixinbuyu I have the same error. Can you please tell me how you fixed it?
