Rebase docker on tensorflow (can now use nvidia-docker for GPU version) #76

Merged: 21 commits, Mar 13, 2017


Contributor

@dbdean dbdean commented Feb 23, 2017

Sorry for dropping a biggish pull request on you unannounced, but I haven't been able to get the GPU code working. I noticed that a lot of the GPU (and CPU) Dockerfile seemed to be based on the tensorflow docker tools at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker, so I thought it might be worth changing the base FROM image from gdal to tensorflow.

This has made the Dockerfiles much simpler (and the GPU one is now automatically generated in the Makefile, as it is only four characters different from the CPU version).
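A minimal sketch of that generation step, written in Python rather than as the actual Makefile rule. It assumes the four-character difference is a `-gpu` suffix on the base image tag in the FROM line; the function name, image tag, and RUN line below are illustrative and not taken from this PR.

```python
def make_gpu_dockerfile(cpu_dockerfile: str, suffix: str = "-gpu") -> str:
    """Return a GPU Dockerfile derived from the CPU one.

    Assumption: the only difference is a suffix on the base image tag.
    """
    out = []
    for line in cpu_dockerfile.splitlines():
        parts = line.split()
        # Only the FROM instruction is rewritten; other lines are copied as-is.
        is_from = len(parts) >= 2 and parts[0].upper() == "FROM"
        if is_from and not parts[1].endswith(suffix):
            parts[1] += suffix
            line = " ".join(parts)
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical CPU Dockerfile content, for illustration only.
cpu = "FROM tensorflow/tensorflow:0.8.0\nRUN pip install tflearn\n"
gpu = make_gpu_dockerfile(cpu)
```

Deriving one file from the other keeps the two images from drifting apart, since there is only one Dockerfile to edit by hand.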

I have also added a very simple set of Python unit tests that just test module importing at the moment, and added them to the Makefile and the Travis CI configuration. All tests currently pass!
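For readers unfamiliar with import-only tests, here is a minimal sketch of the idea, assuming Python 3 and the standard `unittest` module. The module names are stdlib placeholders and the helper `run_import_tests` is hypothetical; the test files in this PR may look different.

```python
import importlib
import unittest

# Placeholder module names; the project's real module list is not shown here.
MODULES_TO_IMPORT = ["json", "csv"]


class TestImports(unittest.TestCase):
    def test_modules_import(self):
        for name in MODULES_TO_IMPORT:
            # subTest reports each failing module separately.
            with self.subTest(module=name):
                importlib.import_module(name)


def run_import_tests() -> bool:
    """Run the import tests and report whether they all passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestImports)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()
```

Tests like this are cheap, but they catch missing dependencies in a freshly built image before any training is attempted.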

I can also confirm that the CPU training works everywhere I've tried it, and the GPU training works on AWS EC2 GPU instances.

I would like to update tensorflow and tflearn, but I've left them at 0.8 and the arbitrary git checkout for now. Updating those can be a problem for a different pull request.

Thanks!

@andrewljohnson
Contributor

andrewljohnson commented Feb 23, 2017 via email

@andrewljohnson
Contributor

Currently building this on my Mac and Linux GPU box to confirm everything works.

@andrewljohnson
Contributor

@dbdean, this fails for me on Linux/GPU because I don't have nvidia-docker. Running make dev-gpu throws command not found for nvidia-docker.

Could you update this pull request to edit the README so there are instructions on how to do the full workflow on a GPU-enabled box?
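One way to turn the command not found failure into a clearer message is a pre-flight check before the GPU target runs. This is only a sketch: `require_tool` is a hypothetical helper, not part of this repository, and it relies solely on the standard library's `shutil.which`.

```python
import shutil


def require_tool(name: str, which=shutil.which) -> str:
    """Return the full path of an executable, or raise a readable error.

    The `which` parameter exists only so the lookup can be faked in tests.
    """
    path = which(name)
    if path is None:
        raise RuntimeError(
            f"{name} was not found on PATH; install it before using the GPU targets"
        )
    return path
```

A caller would invoke `require_tool("nvidia-docker")` once at startup and let the error message point the user to the README.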

@andrewljohnson
Contributor

Another good tweak for this PR would be to set up the tests to run in travis.yml

@dbdean
Contributor Author

dbdean commented Mar 6, 2017

@andrewljohnson, I have updated the README to provide instructions for installing nvidia-docker on Linux hosts.

WRT the tests being in Travis, that should already be set up in the .travis.yml file in this PR.

@dbdean
Contributor Author

dbdean commented Mar 7, 2017

I've made some further changes to this PR, mostly about using the same docker run script instead of separate scripts for CPU and GPU usage. I've also made the notebooks use the script too, and confirmed that I can access the notebooks over HTTP.
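To illustrate the single-script idea, here is a rough Python sketch that builds the launch command for either mode. It is not the script from this PR: `docker_run_command`, the image name, and the notebook port are hypothetical, and it assumes nvidia-docker is used as a drop-in wrapper that accepts the same `run` arguments as docker.

```python
def docker_run_command(image, use_gpu=False, notebook_port=None, command=()):
    """Build the argv list for launching a container in CPU or GPU mode."""
    # Assumption: only the launcher binary differs between the two modes.
    launcher = "nvidia-docker" if use_gpu else "docker"
    argv = [launcher, "run", "--rm", "-it"]
    if notebook_port is not None:
        # Publish the notebook port so it is reachable from the host.
        argv += ["-p", f"{notebook_port}:{notebook_port}"]
    argv.append(image)
    argv.extend(command)
    return argv


# Hypothetical image name and port, for illustration only.
gpu_notebook = docker_run_command("example/image", use_gpu=True, notebook_port=8888)
```

The resulting list can be passed to `subprocess.run`, which avoids shell quoting issues and keeps the CPU and GPU paths in one place.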

@andrewljohnson
Contributor

andrewljohnson commented Mar 7, 2017 via email

@dbdean
Contributor Author

dbdean commented Mar 8, 2017

@andrewljohnson, no problems. That's a perfectly reasonable request for this PR.

I kept it simple, as I was concerned about the instructions going stale as new versions of the NVIDIA drivers and/or nvidia-docker are released. However, the NVIDIA documentation is so poor that it is probably a good idea to at least provide a currently-working example.

I'll try and put something together over the next few days, outlining how I got it working in AWS GPU instances at least. Hopefully that should cover enough of the pitfalls that most people can get through it without too many changes.

@dbdean
Contributor Author

dbdean commented Mar 8, 2017

While I've updated the instructions above, they don't actually work yet as written, at least on Ubuntu 16.04 EC2 instances. I've already gotten it to work on 14.04, so I think I'll get it working there again from scratch and provide those instructions.

@dbdean
Contributor Author

dbdean commented Mar 9, 2017

OK. I think my instructions were correct, but I hadn't made sure to download the latest NVIDIA driver, because the NVIDIA download website is confusing. I haven't checked everything yet, though. I'll let you know when it all works for me, and then you can try, @andrewljohnson.

@dbdean
Contributor Author

dbdean commented Mar 9, 2017

@andrewljohnson, I have run through those instructions on a fresh AWS EC2 16.04 GPU instance, and confirmed that everything works through to training of the neural network. Can you please have a go and let me know how it goes for you on your box? Thanks!

@andrewljohnson andrewljohnson merged commit 4b4f266 into trailbehind:master Mar 13, 2017
@dbdean dbdean deleted the rebase_docker_on_tensorflow branch March 22, 2017 11:39