Rebase docker on tensorflow (can now use nvidia-docker for GPU version) #76

dbdean · 2017-02-23T12:50:36Z

Sorry for dropping a biggish pull request on you unannounced, but I haven't been able to get the GPU code working. I noticed that a lot of the GPU (and CPU) Dockerfile seemed to be based on the tensorflow docker tools at https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/docker, so I thought it might be worth changing the base FROM image from gdal to tensorflow.

This has made the Dockerfiles much simpler (and the GPU one is now automatically generated in the Makefile, as it is only four characters different from the CPU version).
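For readers who want a feel for how a GPU Dockerfile can be derived from the CPU one, here is a minimal sketch. It assumes the four-character difference is a `-gpu` suffix on the FROM image tag, which this thread does not confirm; the function name `gpu_dockerfile` and the example tag are hypothetical, and the PR itself does this step in the Makefile rather than in Python.

```python
def gpu_dockerfile(cpu_dockerfile_text, suffix="-gpu"):
    """Return a copy of the Dockerfile text with `suffix` appended to the
    first FROM line.

    Assumption: the FROM line has the simple form `FROM image:tag`
    (no `AS name` clause), and the GPU image differs only by the suffix.
    """
    lines = cpu_dockerfile_text.splitlines()
    for i, line in enumerate(lines):
        if line.strip().upper().startswith("FROM "):
            # Only the base image reference changes; every other line is kept.
            lines[i] = line.rstrip() + suffix
            break
    return "\n".join(lines) + "\n"


# Hypothetical CPU Dockerfile content, for illustration only.
cpu_text = 'FROM tensorflow/tensorflow:0.8.0\nCMD ["/bin/bash"]\n'
gpu_text = gpu_dockerfile(cpu_text)
```

With the default suffix, `gpu_text` is exactly four characters longer than `cpu_text`, and only the FROM line differs.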

I have also added a very simple set of Python unittests that just test module importing at the moment, and added them to the Makefile and the Travis CI configuration. All tests currently pass!
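As a rough illustration of what an import-only test looks like, here is a minimal sketch using the standard `unittest` and `importlib` modules. The `MODULES` list is a placeholder: it uses standard-library names so the sketch runs anywhere, and is not the list of project modules the PR actually tests.

```python
import importlib
import unittest

# Placeholder module names (assumption); a real suite would list the
# project's own modules here instead.
MODULES = ["json", "csv"]


class TestModuleImports(unittest.TestCase):
    def test_modules_import(self):
        for name in MODULES:
            # subTest reports each failing module separately instead of
            # stopping at the first ImportError.
            with self.subTest(module=name):
                importlib.import_module(name)
```

Saved in a test file, this can be run with `python -m unittest`, which is also the kind of command a Makefile target or a Travis CI script step could call.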

I can also confirm that the CPU training works everywhere I've tried it, and the GPU training works on AWS EC2 GPU instances.

I would like to update tensorflow and tflearn, but I've left it at 0.8 and the arbitrary git checkout for now. Updating those can be a problem for a different pull request.

Thanks!

andrewljohnson · 2017-02-23T17:14:54Z

Thanks for the PR, I'll try and build on my GPU box.

andrewljohnson · 2017-03-04T19:09:03Z

Currently building this on my Mac and Linux GPU box to confirm everything works.

andrewljohnson · 2017-03-04T19:22:39Z

@dbdean this fails for me on Linux/GPU because I don't have nvidia-docker. This throws command not found for nvidia-docker when running make dev-gpu.

Could you update this pull request to edit the README so there are instructions on how to do the full workflow on a GPU-enabled box?

andrewljohnson · 2017-03-04T19:24:30Z

Another good tweak for this PR would be to set up the tests to run in travis.yml

dbdean · 2017-03-06T01:37:22Z

@andrewljohnson, I have updated the README to provide instructions for installing nvidia-docker on Linux hosts.

WRT the tests being in Travis, that should already be set up in the .travis.yml file in this PR.

dbdean · 2017-03-07T01:36:48Z

I've made some further changes to this PR, mostly about using the same docker run script instead of separate scripts for cpu and gpu usage. I've also made notebooks use the script too, and confirmed that I can access the notebooks over http.
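To illustrate the idea of one run script serving both CPU and GPU, here is a minimal sketch of a command builder. It is not the PR's docker_run.sh: the function name, the `deeposm` image name, and the `--rm -it` flags are assumptions chosen for illustration. It also fails early with a clear message when the chosen binary is missing, which is the situation reported above for `make dev-gpu`.

```python
import shutil


def docker_run_command(image, use_gpu=False, extra_args=(), which=shutil.which):
    """Build the argv list for running `image` on CPU or GPU.

    `which` is injectable so the PATH lookup can be faked in tests.
    """
    # nvidia-docker wraps the docker CLI, so only the binary name changes.
    binary = "nvidia-docker" if use_gpu else "docker"
    if which(binary) is None:
        raise RuntimeError(
            binary + " not found on PATH; see the README for install steps"
        )
    return [binary, "run", "--rm", "-it", *extra_args, image]
```

The returned list can be passed to `subprocess.run`. For example, `docker_run_command("deeposm", use_gpu=True, extra_args=["-p", "8888:8888"])` would build a GPU run with a published port, where the image name and port are again hypothetical.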

andrewljohnson · 2017-03-07T20:23:13Z

Sorry to push back on this, but I am having trouble getting this to work, and I think others will too. It seems problematic for getting up and running to take so many more steps, especially if they are not explained clearly.

1. Can we make this a step-by-step set of instructions that doesn't require reading docs from NVIDIA?
2. It's not clear to me as written whether I need to install nvidia-modprobe (until I got the error saying I needed to). We also need to tell the user how to do this, i.e. have a step that just tells the user to run sudo apt-get install nvidia-modprobe, and say when that needs to happen.
3. It's not clear to me how I install the NVIDIA drivers.

I feel like the README for downloading and building this project should be distilled to a clear set of steps that someone can mindlessly follow. Maybe some people will have to reference external docs, but they shouldn't have to until they hit a snag in the steps included in the README.

The instructions can be like "this is how you do it on Ubuntu," and should state the sequence of commands to run in the terminal, along with an explanation of any GUI steps.

dbdean · 2017-03-08T02:10:22Z

@andrewljohnson, no problems. That's a perfectly reasonable request for this PR.

I kept it simple, as I was concerned about the instructions going stale as new NVIDIA drivers and/or nvidia-docker versions are released. However, the NVIDIA documentation is so poor that it is probably a good idea to at least provide a currently-working example.

I'll try and put something together over the next few days, outlining how I got it working in AWS GPU instances at least. Hopefully that should cover enough of the pitfalls that most people can get through it without too many changes.

dbdean · 2017-03-08T11:12:46Z

While I've updated the instructions above, they don't actually work yet as written, at least on Ubuntu 16.04 EC2 instances. I've gotten it to work on 14.04 already, so I think I'll get it to work there again from scratch, and provide those instructions.

dbdean · 2017-03-09T03:50:14Z

Ok. I think my instructions were correct, but I hadn't made sure to download the latest NVIDIA driver because the NVIDIA download website is confusing. I haven't checked everything yet, though. I'll let you know when it all works for me, and then you can try it, @andrewljohnson.

dbdean · 2017-03-09T07:50:57Z

@andrewljohnson, I have run through those instructions on a fresh AWS EC2 16.04 GPU instance, and confirmed that everything works through to training of the neural network. Can you please have a go and let me know how it goes for you on your box? Thanks!

dbdean added 13 commits February 22, 2017 22:12

Make default Docker CMD /bin/bash

31ae574

Move docker cpu command to separate script

50afd41

Added module import tests, and added tests to travis

df72f17

Added some extra module import tests that I missed last time

9b0881b

CPU Dockerfile rebased on tensorflow

f365b53

Module load tests pass.

Add some extra Dockerfile stuff back in to get closer to actually working

b9a5e31

Change CPU docker to build off common base image

9896e0d

build should depend upon build-cpu, not build_cpu

183040f

Move gpu build onto common image. GPU Running is unlikely to work yet.

7e590e4

Try to make docker-run-gpu use nvidia-docker. Doesn't work yet!

9d8d287

docker-run-gpu now works with nvidia-docker

036ad50

Removed docker/pip files now longer needed in the new common-Dockerfile cpu/gpu setup

bb57b91

Downgrade tensorflow and tflearn to known versions to work with DeepOSM

a4be447

dbdean added 2 commits March 6, 2017 11:28

Add instructions to install nvidia-docker on linux

fc44339

Add test documentation

a048ca9

dbdean added 4 commits March 7, 2017 01:32

Consolidate docker_run_cpu.sh and docker_run_gpu.sh into one script

51280ec

Remove unneeded flags from docker_run.sh

5b15b9e

Clean up target order in Makefile for dev-cpu, dev-gpu and dev

9a04490

make notebook now uses docker_run.sh and can now be invoked on GPU too

74fe35b

Improve nvidia-docker installation instructions

7e1e82f

Update nvidia-docker installation instructions. Now work on 16.04 AWS EC2 instance

ea7c6ae

andrewljohnson merged commit 4b4f266 into trailbehind:master Mar 13, 2017

dbdean deleted the rebase_docker_on_tensorflow branch March 22, 2017 11:39
