
host on AWS, use multiple GPUs #8

Closed
andrewljohnson opened this issue Mar 24, 2016 · 7 comments
@andrewljohnson
Contributor

Here's a FOSS repo doing TensorFlow on AWS using Docker. This is a familiar stack.

Google Cloud also released their GPU offering recently.

@andrewljohnson
Contributor Author

It would have been cool to use Google Cloud, but they don't seem to want to give any of us access.

@andrewljohnson changed the title from "how to host for production" to "host on AWS, use multiple GPUs" May 10, 2016
@zain
Contributor

zain commented May 13, 2016

Ok, here we go. I have a script that can be run on a fresh g2.2xlarge instance with Ubuntu 14.04 to set it up with Python 2.7 and TensorFlow 0.8 with GPU support, using CUDA 7.5 and cuDNN v5 RC. Everything is installed via package managers rather than compiled from source by hand, which is especially notable considering it uses the latest versions of GDAL and pyosmium.

# spin up a g2.2xlarge with ubuntu 14.04
# before starting, scp the tarball for cudnn (cudnn-7.5-linux-x64-v5.0-rc.tgz) to /tmp

sudo add-apt-repository ppa:ubuntugis/ubuntugis-testing -y
sudo apt update

export LANGUAGE="en_US.UTF-8"
export LANG="en_US.UTF-8"
export LC_ALL="en_US.UTF-8"
sudo locale-gen "en_US.UTF-8"
sudo dpkg-reconfigure locales

# blacklist nouveau gpu driver (in favor of CUDA)
echo -e "blacklist nouveau\nblacklist lbm-nouveau\noptions nouveau modeset=0\nalias nouveau off\nalias lbm-nouveau off\n" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf
echo options nouveau modeset=0 | sudo tee -a /etc/modprobe.d/nouveau-kms.conf
sudo update-initramfs -u

# apt prerequisites
sudo apt install -y build-essential git swig default-jdk zip zlib1g-dev libbz2-dev python2.7 python2.7-dev cmake python-pip mercurial libffi-dev libssl-dev libxml2-dev libxslt1-dev libpq-dev libmysqlclient-dev libcurl4-openssl-dev libjpeg-dev libpng12-dev gfortran libblas-dev liblapack-dev libatlas-dev libquadmath0 libfreetype6-dev pkg-config libshp-dev libsqlite3-dev libgd2-xpm-dev libexpat1-dev libgeos-dev libgeos++-dev libxml2-dev libsparsehash-dev libv8-dev libicu-dev libgdal1-dev libprotobuf-dev protobuf-compiler devscripts debhelper fakeroot doxygen libboost-dev libboost-all-dev gdal-bin linux-image-extra-virtual linux-source

# cuda
cd /tmp
wget http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1404/x86_64/cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1404_7.5-18_amd64.deb
sudo apt update
sudo apt install -y cuda
sudo apt install -y linux-headers-$(uname -r)
sudo reboot now  # <<<<<< reboot!

# after the instance comes back up:
sudo modprobe nvidia  # should return no errors

# cuDNN - assumes you already have the tarball in /tmp
cd /tmp
tar -xzf cudnn-7.5-linux-x64-v5.0-rc.tgz
sudo cp /tmp/cuda/lib64/* /usr/local/cuda/lib64
sudo cp /tmp/cuda/include/* /usr/local/cuda/include
sudo chmod a+r /usr/local/cuda/lib64/libcudnn* /usr/local/cuda/include/cudnn.h

# virtualenv
sudo pip install --upgrade pip
sudo pip install virtualenv
cd ~
virtualenv venv
source venv/bin/activate

# python prerequisites
pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow-0.8.0-cp27-none-linux_x86_64.whl
pip install gdal --global-option=build_ext --global-option="-I/usr/include/gdal/"

git clone --branch v2.6.1 https://github.com/osmcode/libosmium.git /tmp/libosmium
pip install --global-option=build_ext --global-option="-I/tmp/libosmium/include" git+https://github.com/osmcode/pyosmium@v2.6.0
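After the reboot in the middle of the script, it's worth double-checking that the nouveau blacklist actually took before installing anything on top. A hedged sketch — the `check_module` helper is mine, not part of zain's script — that greps the kernel's loaded-module list:

```shell
# check_module MODULES_FILE NAME -> prints "loaded" or "not loaded".
# On a real instance you'd pass /proc/modules as the first argument.
check_module() {
  if grep -q "^$2 " "$1"; then
    echo "loaded"
  else
    echo "not loaded"
  fi
}

# e.g. after the reboot:
# check_module /proc/modules nouveau   # expect "not loaded"
# check_module /proc/modules nvidia    # expect "loaded" once CUDA is installed
```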

At the end of all this, you can do the following and observe TensorFlow picking up the CUDA libraries for the GPU:

$ export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
$ export CUDA_HOME=/usr/local/cuda
$ source venv/bin/activate
$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:105] successfully opened CUDA library libcurand.so locally
>>>
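Those two `export` lines only last for the current shell session. One optional addition (my assumption, not part of the AMI itself) is to persist them in `~/.bashrc` so new SSH sessions pick them up automatically:

```shell
# Persist the CUDA environment variables for future login shells.
# Paths assume the default CUDA 7.5 install location used above.
cat >> ~/.bashrc <<'EOF'
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
EOF
```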

I created an AMI with the above script. You can spin up the AMI and run the following to clone and run DeepOSM:

# global vars that need to be set
export AWS_ACCESS_KEY_ID=***
export AWS_SECRET_ACCESS_KEY=***
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64"
export CUDA_HOME=/usr/local/cuda
source ~/venv/bin/activate

# make a /data and /data/cache directory on the SSD for DeepOSM to use
sudo mkdir -p /mnt/data/cache
sudo ln -s /mnt/data /data
sudo chmod -R 777 /mnt/data
export GEO_DATA_DIR=/data

# DeepOSM
git clone https://github.com/trailbehind/DeepOSM.git /tmp/DeepOSM
cd /tmp/DeepOSM
ln -s /tmp/DeepOSM/s3config-default /home/ubuntu/.s3cfg
pip install -r requirements_gpu.txt
export PYTHONPATH=`pwd`

# now you can run DeepOSM scripts!
python bin/create_training_data.py
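Before kicking off a long training-data run, a quick sanity check on the data directory can save a failed job partway through. This helper is hypothetical (it's not part of DeepOSM); it assumes only the `GEO_DATA_DIR` convention set up above:

```shell
# Verify the DeepOSM cache directory exists and is writable before running jobs.
# Falls back to /data if GEO_DATA_DIR is unset, matching the setup above.
check_data_dir() {
  local dir="${GEO_DATA_DIR:-/data}/cache"
  if [ -d "$dir" ] && [ -w "$dir" ]; then
    echo "ok: $dir"
  else
    echo "missing or unwritable: $dir"
  fi
}
```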

@zain
Contributor

zain commented May 13, 2016

Now, a couple of questions for you all:

  • do you think I should add docker on top of all this? I think we get most of the benefits of docker (reusable, disposable) from AMIs, without having to debug another layer when something goes wrong.
  • what's the next step here, so you can start using it? Want me to share the AMI?

@andrewljohnson
Contributor Author

  1. If we don't use Docker, won't that mean we have to maintain multiple builds: one for AWS, and one or more for local dev on Linux/Mac?

  2. Is the next step to be able to log in and run this?

    python bin/create_training_data.py
    python bin/run_analysis.py

Then I could compare the performance and experience to my Linux box, and start getting a sense of how our AWS lab might feel to a user.

@silberman likes Jupyter notebooks a lot too - I think he sees us providing
a hosted Jupyter notebook to tinker with, for us or others?

@zain
Contributor

zain commented May 13, 2016

  1. I dunno. If we use Docker, we'll need to maintain a build to install Docker on Ubuntu, and we'll have to agree on some way to deploy containers. So I think we'll need a separate Ubuntu build no matter what -- either to install DeepOSM or to install Docker. Correct me if you disagree here.
  2. Yup, I'm running those two commands right now and debugging any errors that arise. (edit: done! both those commands finished successfully)

@andrewljohnson
Contributor Author

It seems like a good production solution could be:

  1. Overpass instance - Set up an Overpass server, use this instead of PBF munging #39
  2. RDS instance with NAIP data, plus separate Docker app to fill/edit the DB? - import NAIP data into a postgres database #23
  3. Tensorflow analyzer - split as own app, per split data creation and analysis into separate Docker apps #30
  4. an app for deeposm.org - web app for OSM review/edit - which I'm working on

Apps 1 and 2 provide an API to app 3, which publishes data to S3 for app 4 to ingest into its own Django/Postgres database?

My guess is this production setup will become more of a requirement at scale: it will be more convenient to handle more than one state, or to provide more flexible analysis, if we set up something like this. We can go ahead and do deeposm.org/delaware, but then may have to get this done.

@andrewljohnson
Contributor Author

  • I need to get my head around this code and get my work moved to AWS
  • merging with other infrastructure issues
