Rebase docker on tensorflow (can now use nvidia-docker for GPU version) #76
Conversation
Module load tests pass.
Thanks for the PR, I'll try and build on my GPU box.
Currently building this on my Mac and Linux GPU box to confirm everything works.
@dbdean this fails for me on Linux/GPU because I don't have nvidia-docker. It throws command not found for nvidia-docker when running the GPU workflow. Could you update this pull request to edit the README so there are instructions for the full workflow on a GPU-enabled box?
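To make the failure concrete: the GPU workflow swaps the nvidia-docker wrapper in for plain docker, and if the wrapper is not on PATH the run fails before Docker is even invoked. A minimal sketch of the situation (the image name here is a placeholder, not this project's actual tag):

```shell
# GPU containers are launched through the nvidia-docker wrapper instead of
# plain docker; the image name below is a placeholder for illustration.
if command -v nvidia-docker >/dev/null 2>&1; then
  nvidia-docker run --rm my-project:gpu nvidia-smi
else
  # Failure mode from the comment above: the wrapper is not on PATH.
  echo "nvidia-docker: command not found" >&2
fi
```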
Another good tweak for this PR would be to set up the tests to run via .travis.yml.
@andrewljohnson, I have updated the README to provide instructions for installing nvidia-docker on Linux hosts. As for the tests running in Travis, that should already be set up by the .travis.yml file included in this PR.
I've made some further changes to this PR, mostly about using the same docker run script instead of separate scripts for CPU and GPU usage. I've also made the notebooks use the script too, and confirmed that I can access the notebooks over HTTP.
Sorry to push back on this, but I am having trouble getting this to work, and I think others will too. It seems problematic to make getting up and running take so many more steps, especially if they are not explained clearly.
1. Can we make this a step-by-step set of instructions that doesn't require reading docs from NVIDIA?
2. It's not clear to me as written whether I need to install nvidia-modprobe (I only found out when I hit the error that required it). We then need to tell the user how and when to do this (i.e. have a step that just tells the user to run sudo apt-get install nvidia-modprobe).
3. It's not clear to me how to install the NVIDIA drivers.
I feel like the README for downloading and building this project should be distilled to a clear set of steps that someone can mindlessly follow. Maybe some people will have to reference external docs, but they shouldn't have to until they hit a snag in the steps included in the README. The instructions can be like "this is how you do it on Ubuntu," and should state the sequence of commands to run in the terminal, along with any GUI steps explained.
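A sketch of what that step-by-step Ubuntu section could look like follows. It is wrapped in a function so it can be sourced and reviewed before running; the driver package name and the nvidia-docker release URL are assumptions based on what was current around this PR, not instructions taken from it:

```shell
# Hedged sketch of the requested Ubuntu setup steps. The driver package name
# and nvidia-docker version are assumptions; check NVIDIA's documentation for
# current releases before running.
setup_gpu_docker() {
  # 1. NVIDIA driver plus nvidia-modprobe (driver version is illustrative).
  sudo apt-get update
  sudo apt-get install -y nvidia-375 nvidia-modprobe

  # 2. The nvidia-docker wrapper itself (v1.0.1 was current at the time).
  wget -P /tmp https://github.com/NVIDIA/nvidia-docker/releases/download/v1.0.1/nvidia-docker_1.0.1-1_amd64.deb
  sudo dpkg -i /tmp/nvidia-docker_1.0.1-1_amd64.deb

  # 3. Smoke-test that the GPU is visible from inside a container.
  nvidia-docker run --rm nvidia/cuda nvidia-smi
}
```

Calling `setup_gpu_docker` then performs all three steps in order; keeping them in one function makes it easy to paste into the README as a single copyable block.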
@andrewljohnson, no problem. That's a perfectly reasonable request for this PR. I kept it simple because I was concerned about the instructions going stale as new versions of the NVIDIA drivers and/or nvidia-docker are released. However, the NVIDIA documentation is so poor that it is probably a good idea to at least provide a currently-working example. I'll try to put something together over the next few days, outlining how I got it working on AWS GPU instances at least. Hopefully that should cover enough of the pitfalls that most people can get through it without too many changes.
While I've updated the instructions above, they don't actually work yet as written, at least on Ubuntu 16.04 EC2 instances. I've gotten this working on 14.04 already, so I think I'll get it working there again from scratch and provide those instructions.
OK. I think my instructions were correct, but I hadn't made sure to download the latest NVIDIA driver, because the NVIDIA download website is confusing. I haven't checked everything yet, though. I'll let you know when it all works for me, and then you can try it, @andrewljohnson.
@andrewljohnson, I have run through those instructions on a fresh AWS EC2 16.04 GPU instance and confirmed that everything works through to training the neural network. Can you please have a go and let me know how it goes on your box? Thanks!
Sorry for dropping a biggish pull request on you unannounced, but I haven't been able to get the GPU code working, and I noticed that a lot of the GPU (and CPU) Dockerfiles seemed to be based on the tensorflow docker tools at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker, so I thought it might be worth changing the base FROM image from gdal to tensorflow.
This has made the Dockerfiles much simpler (and the GPU one is now automatically generated in the Makefile, as it is only four characters different from the CPU version).
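Mechanically, a four-character difference like that is just a "-gpu" suffix on the base image tag, so the generation step can be a one-line substitution. This is a sketch of the idea, not the PR's actual Makefile rule; the file contents and tag are assumptions based on the public tensorflow/tensorflow image names:

```shell
# The CPU and GPU Dockerfiles can differ only in the FROM line's tag, so the
# GPU variant can be derived by appending "-gpu" (the four characters).
cpu_from='FROM tensorflow/tensorflow:latest'
gpu_from=$(printf '%s\n' "$cpu_from" | sed 's|tensorflow:latest|&-gpu|')
echo "$gpu_from"   # → FROM tensorflow/tensorflow:latest-gpu
```

In a Makefile this would run over the whole CPU Dockerfile with `sed ... Dockerfile.cpu > Dockerfile.gpu`, keeping a single source of truth for both variants.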
I have also added a very simple set of Python unit tests that just test module importing at the moment, and added them to the Makefile and the Travis CI configuration. All tests currently pass!
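A module-load test of that kind can be as small as the following sketch, runnable straight from the shell; the imported module (json, from the standard library) is a stand-in, since this PR's actual module names aren't listed here:

```shell
# Minimal module-load test in the style described above; "json" stands in for
# one of the project's own modules.
python3 - <<'EOF'
import unittest

class TestModuleLoads(unittest.TestCase):
    def test_import(self):
        import json  # stand-in for a project module
        self.assertIsNotNone(json)

if __name__ == "__main__":
    unittest.main()
EOF
```

Tests like this catch missing dependencies inside the container early, which is exactly what a Docker rebase needs from CI.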
I can also confirm that the CPU training works everywhere I've tried it, and the GPU training works on AWS EC2 GPU instances.
I would like to update tensorflow and tflearn, but I've left them at 0.8 and the arbitrary git checkout for now. Updating those can be a job for a different pull request.
Thanks!