diff --git a/README.md b/README.md index bc4062c..a7993be 100644 --- a/README.md +++ b/README.md @@ -161,7 +161,7 @@ Here's what will be created: ``` $ vagrant ssh core-01 $ docker run --name sync-gateway -P couchbase/sync-gateway sync-gw-start -c feature/forestdb_bucket -g https://fixme.com -$ docker run --name elastic-thought -P --link sync-gateway:sync-gateway tleyden5iwx/elastic-thought-cpu-develop bash -c 'refresh-elastic-thought-refresher; refresh-elastic-thought; httpd' +$ docker run --name elastic-thought -P --link sync-gateway:sync-gateway tleyden5iwx/elastic-thought-cpu-develop bash -c 'refresh-elastic-thought; elastic-thought' ``` diff --git a/docker/cpu/develop/Dockerfile b/docker/cpu/develop/Dockerfile index 5598b58..b95c1a4 100644 --- a/docker/cpu/develop/Dockerfile +++ b/docker/cpu/develop/Dockerfile @@ -4,34 +4,30 @@ FROM tleyden5iwx/caffe-cpu-master MAINTAINER Traun Leyden tleyden@couchbase.com ENV GOPATH /opt/go -ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH ENV GOROOT /usr/local/go +ENV PATH $PATH:$GOPATH/bin:$GOROOT/bin +# Get dependencies RUN apt-get update && \ - apt-get -q -y install mercurial && \ - apt-get -q -y install make && \ - apt-get -q -y install binutils && \ - apt-get -q -y install bison && \ - apt-get -q -y install build-essential + apt-get -q -y install \ + mercurial \ + make \ + binutils \ + bison \ + build-essential RUN mkdir -p $GOPATH -# Install Go 1.3 manually (since Go 1.3 is required, and ubuntu 14.04 still uses Go 1.2) -RUN curl -O https://storage.googleapis.com/golang/go1.3.1.linux-amd64.tar.gz && \ - tar -C /usr/local -xzf go1.3.1.linux-amd64.tar.gz +# Download and install Go 1.4 +RUN wget http://golang.org/dl/go1.4.2.linux-amd64.tar.gz && \ + tar -C /usr/local -xzf go1.4.2.linux-amd64.tar.gz && \ + rm go1.4.2.linux-amd64.tar.gz # Add refresh script ADD scripts/refresh-elastic-thought /usr/local/bin/ ADD scripts/refresh-elastic-thought-refresher /usr/local/bin/ # Go get ElasticThought -RUN go get -u -v -t 
github.com/tleyden/elastic-thought && \ - go get -u -v -t github.com/tleyden/elastic-thought/cli/httpd && \ - go get -u -v -t github.com/tleyden/elastic-thought/cli/worker && \ +RUN go get -u -v -t github.com/tleyden/elastic-thought/...&& \ cd $GOPATH/src/github.com/tleyden/elastic-thought && \ git log -3 - -# Copy binaries -RUN cp /opt/go/bin/worker /usr/local/bin && \ - cp /opt/go/bin/httpd /usr/local/bin - diff --git a/docker/cpu/develop/README.md b/docker/cpu/develop/README.md index dd8fd3d..a7993be 100644 --- a/docker/cpu/develop/README.md +++ b/docker/cpu/develop/README.md @@ -1,4 +1,4 @@ -[![Join the chat at https://gitter.im/tleyden/elastic-thought](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/tleyden/elastic-thought?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) +[![Build Status](https://drone.io/github.com/tleyden/elastic-thought/status.png)](https://drone.io/github.com/tleyden/elastic-thought/latest) [![GoDoc](https://godoc.org/github.com/tleyden/elastic-thought?status.png)](https://godoc.org/github.com/tleyden/elastic-thought) [![Coverage Status](https://coveralls.io/repos/tleyden/elastic-thought/badge.svg?branch=master)](https://coveralls.io/r/tleyden/elastic-thought?branch=master) [![Join the chat at https://gitter.im/tleyden/elastic-thought](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/tleyden/elastic-thought?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) Scalable REST API wrapper for the [Caffe](http://caffe.berkeleyvision.org) deep learning framework. @@ -36,15 +36,15 @@ If running on AWS, each [CoreOS](https://coreos.com/) instance would be running Although not shown, all components would be running inside of [Docker](https://www.docker.com/) containers. -[CoreOS Fleet](https://coreos.com/docs/launching-containers/launching/launching-containers-fleet/) would be leveraged to auto-restart any failed components, including Caffe workers. 
+It would be possible to start more nodes which only had Caffe GPU workers running. ## Roadmap *Current Status: everything under heavy construction, not ready for public consumption yet* 1. **[done]** Working end-to-end with IMAGE_DATA caffe layer using a single test set with a single training set, and ability to query trained set. -1. **[in progress]** ---> Support LEVELDB / LMDB data formats, to run mnist example. -1. Support the majority of caffe use cases +1. **[done]** Support LEVELDB / LMDB data formats, to run mnist example. +1. **[in progress]** Support the majority of caffe use cases 1. Package everything up to make it easy to deploy <-- initial release 1. Ability to auto-scale worker instances up and down based on how many jobs are in the message queue. 1. Attempt to add support for other deep learning frameworks: pylearn2, cuda-convnet, etc. @@ -63,17 +63,18 @@ Although not shown, all components would be running inside of [Docker](https://w * [REST API](http://docs.elasticthought.apiary.io/) * [Godocs](http://godoc.org/github.com/tleyden/elastic-thought) +* This README -## Grid Computing +## System Requirements -ElasticThought is not trying to be a grid computing (aka distributed computation) solution. +ElasticThought requires CoreOS to run. -For that, check out: +If you want to access the GPU, you will need to do extra work to get [CoreOS working with Nvidia CUDA GPU Drivers](http://tleyden.github.io/blog/2014/11/04/coreos-with-nvidia-cuda-gpu-drivers/) -* [ParameterServer](http://parameterserver.org/) -* [Caffe Issue 876](https://github.com/BVLC/caffe/issues/876) -## Kick things off: Aws +## Installing elastic-thought on AWS (Production mode) + +It should be possible to install elastic-thought anywhere that CoreOS is supported. Currently, there are instructions for AWS and Vagrant (below). 
### Launch EC2 instances via CloudFormation script @@ -83,6 +84,16 @@ For that, check out: * Choose 3 node cluster with m3.medium or g2.2xlarge (GPU case) instance type * All other values should be default +### Verify CoreOS cluster + +Run: + +``` +$ fleetctl list-machines +``` + +Which should show all the CoreOS machines in your cluster. (this uses etcd under the hood, so will also validate that etcd is setup correctly). + ### Kick off ElasticThought Ssh into one of the machines (doesn't matter which): `ssh -A core@ec2-54-160-96-153.compute-1.amazonaws.com` @@ -99,30 +110,62 @@ It should look like this: ``` UNIT MACHINE ACTIVE SUB -cbfs_announce@1.service 2340c553.../10.225.17.229 active running -cbfs_announce@2.service fbd4562e.../10.182.197.145 active running -cbfs_announce@3.service 0f5e2e11.../10.168.212.210 active running -cbfs_node@1.service 2340c553.../10.225.17.229 active running -cbfs_node@2.service fbd4562e.../10.182.197.145 active running -cbfs_node@3.service 0f5e2e11.../10.168.212.210 active running -couchbase_bootstrap_node.service 0f5e2e11.../10.168.212.210 active running -couchbase_bootstrap_node_announce.service 0f5e2e11.../10.168.212.210 active running -couchbase_node.1.service 2340c553.../10.225.17.229 active running -couchbase_node.2.service fbd4562e.../10.182.197.145 active running -elastic_thought_gpu@1.service 2340c553.../10.225.17.229 active running -elastic_thought_gpu@2.service fbd4562e.../10.182.197.145 active running -elastic_thought_gpu@3.service 0f5e2e11.../10.168.212.210 active running -sync_gw_announce@1.service 2340c553.../10.225.17.229 active running -sync_gw_announce@2.service fbd4562e.../10.182.197.145 active running -sync_gw_announce@3.service 0f5e2e11.../10.168.212.210 active running -sync_gw_node@1.service 2340c553.../10.225.17.229 active running -sync_gw_node@2.service fbd4562e.../10.182.197.145 active running -sync_gw_node@3.service 0f5e2e11.../10.168.212.210 active running +cbfs_announce@1.service 2340c553.../10.225.17.229 
active running +cbfs_announce@2.service fbd4562e.../10.182.197.145 active running +cbfs_announce@3.service 0f5e2e11.../10.168.212.210 active running +cbfs_node@1.service 2340c553.../10.225.17.229 active running +cbfs_node@2.service fbd4562e.../10.182.197.145 active running +cbfs_node@3.service 0f5e2e11.../10.168.212.210 active running +couchbase_bootstrap_node.service 0f5e2e11.../10.168.212.210 active running +couchbase_bootstrap_node_announce.service 0f5e2e11.../10.168.212.210 active running +couchbase_node.1.service 2340c553.../10.225.17.229 active running +couchbase_node.2.service fbd4562e.../10.182.197.145 active running +elastic_thought_gpu@1.service 2340c553.../10.225.17.229 active running +elastic_thought_gpu@2.service fbd4562e.../10.182.197.145 active running +elastic_thought_gpu@3.service 0f5e2e11.../10.168.212.210 active running +sync_gw_announce@1.service 2340c553.../10.225.17.229 active running +sync_gw_announce@2.service fbd4562e.../10.182.197.145 active running +sync_gw_announce@3.service 0f5e2e11.../10.168.212.210 active running +sync_gw_node@1.service 2340c553.../10.225.17.229 active running +sync_gw_node@2.service fbd4562e.../10.182.197.145 active running +sync_gw_node@3.service 0f5e2e11.../10.168.212.210 active running ``` At this point you should be able to access the [REST API](http://docs.elasticthought.apiary.io/) on the public ip any of the three Sync Gateway machines. -## Kick things off: Vagrant +## Installing elastic-thought on a single CoreOS host (Development mode) + +If you are on OSX, you'll first need to install Vagrant, VirtualBox, and CoreOS. See [CoreOS on Vagrant](https://coreos.com/docs/running-coreos/platforms/vagrant/) for instructions. 
+ +Here's what will be created: + + + + ┌─────────────────────────────────────────────────────────┐ + │ CoreOS Host │ + │ ┌──────────────────────────┐ ┌─────────────────────┐ │ + │ │ Docker Container │ │ Docker Container │ │ + │ │ ┌───────────────────┐ │ │ ┌────────────┐ │ │ + │ │ │ Elastic Thought │ │ │ │Sync Gateway│ │ │ + │ │ │ Server │ │ │ │ Database │ │ │ + │ │ │ ┌───────────┐ │ │ │ │ │ │ │ + │ │ │ │In-process │ │◀─┼──┼───▶│ │ │ │ + │ │ │ │ Caffe │ │ │ │ │ │ │ │ + │ │ │ │ worker │ │ │ │ │ │ │ │ + │ │ │ └───────────┘ │ │ │ └────────────┘ │ │ + │ │ └───────────────────┘ │ └─────────────────────┘ │ + │ └──────────────────────────┘ │ + └─────────────────────────────────────────────────────────┘ + + +``` +$ vagrant ssh core-01 +$ docker run --name sync-gateway -P couchbase/sync-gateway sync-gw-start -c feature/forestdb_bucket -g https://fixme.com +$ docker run --name elastic-thought -P --link sync-gateway:sync-gateway tleyden5iwx/elastic-thought-cpu-develop bash -c 'refresh-elastic-thought; elastic-thought' +``` + + +## Installing elastic-thought on Vagrant ### Update Vagrant @@ -133,71 +176,103 @@ $ vagrant -v 1.7.1 ``` -### Install CoreOS +### Install CoreOS on Vagrant -See https://coreos.com/docs/running-coreos/platforms/vagrant/ +Clone the coreos/vagrant fork that has been customized for running ElasticThought. -### Update cloud-config +``` +$ cd ~/Vagrant +$ git clone git@github.com:tleyden/coreos-vagrant.git +$ cd coreos-vagrant +$ cp config.rb.sample config.rb +$ cp user-data.sample user-data +``` + +By default this will run a **two node** cluster, if you want to change this, update the `$num_instances` variable in the `config.rb` file. 
-Open the user-data file, and add: +### Run CoreOS ``` -write_files: - - path: /etc/systemd/system/docker.service.d/increase-ulimit.conf - owner: core:core - permissions: 0644 - content: | - [Service] - LimitMEMLOCK=infinity - - path: /var/lib/couchbase/data/.README - owner: core:core - permissions: 0644 - content: | - Couchbase Data files are stored here - - path: /var/lib/couchbase/index/.README - owner: core:core - permissions: 0644 - content: | - Couchbase Index files are stored here - - path: /var/lib/cbfs/data/.README - owner: core:core - permissions: 0644 - content: | - CBFS files are stored here +$ vagrant up ``` -### Increase RAM size of VM's +Ssh in: -Couchbase Server wants a lot of RAM. Bump up the vm memory size to 2GB. +``` +$ vagrant ssh core-01 -- -A +``` -Edit your Vagrantfile: +If you see: ``` -$vb_memory = 2048 +Failed Units: 1 + user-cloudinit@var-lib-coreos\x2dvagrant-vagrantfile\x2duser\x2ddata.service ``` -### Setup port forwarding for Couchbase UI (optional) +Jump to **Workaround CoreOS + Vagrant issues** below. -This is only needed if you want to be able to connect to the Couchbase web UI from a browser on your host OS (ie, OSX) +Verify things started up correctly: -Add the following snippet to your Vagrant file: +``` +core@core-01 ~ $ fleetctl list-machines +``` + +If you get errors like: ``` -if i == 1 - # create a port forward mapping to view couchbase web ui - config.vm.network "forwarded_port", guest: 8091, host: 5091 -end +2015/03/26 16:58:50 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused +2015/03/26 16:58:50 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms ``` -### Disable Transparent Huge Pages (optional) +Jump to **Workaround CoreOS + Vagrant issues** below. -Not sure how crucial this is, but I'll mention it just in case.
After the CoreOS machines startup, ssh into each one: +### Workaround CoreOS + Vagrant issues: + +First exit out of CoreOS: ``` -$ sudo bash -# echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag +core@core-01 ~ $ exit ``` +On your OSX workstation, try the following workaround: + +``` +$ sed -i '' 's/420/0644/' user-data +$ sed -i '' 's/484/0744/' user-data +$ vagrant reload --provision +``` + +Ssh back in: + +``` +$ vagrant ssh core-01 -- -A +``` + +Verify it worked: + +``` +core@core-01 ~ $ fleetctl list-machines +``` + +You should see: + +``` +MACHINE IP METADATA +ce0fec18... 172.17.8.102 - +d6402b24... 172.17.8.101 - +``` + +I filed [CoreOS cloudinit issue 328](https://github.com/coreos/coreos-cloudinit/issues/328) to figure out why this error is happening (possibly related issues: [CoreOS cloudinit issue 261](https://github.com/coreos/coreos-cloudinit/issues/261) or [CoreOS bugs issue 190](https://github.com/coreos/bugs/issues/190)) + + +### Continue steps above + +Scroll up to the **Installing elastic-thought on AWS** section and start with **Verify CoreOS cluster** + +## FAQ + +* Is this useful for grid computing / distributed computation? **Ans**: No, this is not trying to be a grid computing (aka distributed computation) solution.
You may want to check out [Caffe Issue 876](https://github.com/BVLC/caffe/issues/876) or [ParameterServer](http://parameterserver.org/) + ## License Apache 2 diff --git a/docker/cpu/master/Dockerfile b/docker/cpu/master/Dockerfile index 5598b58..b95c1a4 100644 --- a/docker/cpu/master/Dockerfile +++ b/docker/cpu/master/Dockerfile @@ -4,34 +4,30 @@ FROM tleyden5iwx/caffe-cpu-master MAINTAINER Traun Leyden tleyden@couchbase.com ENV GOPATH /opt/go -ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH ENV GOROOT /usr/local/go +ENV PATH $PATH:$GOPATH/bin:$GOROOT/bin +# Get dependencies RUN apt-get update && \ - apt-get -q -y install mercurial && \ - apt-get -q -y install make && \ - apt-get -q -y install binutils && \ - apt-get -q -y install bison && \ - apt-get -q -y install build-essential + apt-get -q -y install \ + mercurial \ + make \ + binutils \ + bison \ + build-essential RUN mkdir -p $GOPATH -# Install Go 1.3 manually (since Go 1.3 is required, and ubuntu 14.04 still uses Go 1.2) -RUN curl -O https://storage.googleapis.com/golang/go1.3.1.linux-amd64.tar.gz && \ - tar -C /usr/local -xzf go1.3.1.linux-amd64.tar.gz +# Download and install Go 1.4 +RUN wget http://golang.org/dl/go1.4.2.linux-amd64.tar.gz && \ + tar -C /usr/local -xzf go1.4.2.linux-amd64.tar.gz && \ + rm go1.4.2.linux-amd64.tar.gz # Add refresh script ADD scripts/refresh-elastic-thought /usr/local/bin/ ADD scripts/refresh-elastic-thought-refresher /usr/local/bin/ # Go get ElasticThought -RUN go get -u -v -t github.com/tleyden/elastic-thought && \ - go get -u -v -t github.com/tleyden/elastic-thought/cli/httpd && \ - go get -u -v -t github.com/tleyden/elastic-thought/cli/worker && \ +RUN go get -u -v -t github.com/tleyden/elastic-thought/...&& \ cd $GOPATH/src/github.com/tleyden/elastic-thought && \ git log -3 - -# Copy binaries -RUN cp /opt/go/bin/worker /usr/local/bin && \ - cp /opt/go/bin/httpd /usr/local/bin - diff --git a/docker/cpu/master/README.md b/docker/cpu/master/README.md index dd8fd3d..a7993be 
100644 --- a/docker/cpu/master/README.md +++ b/docker/cpu/master/README.md @@ -1,4 +1,4 @@ -[![Join the chat at https://gitter.im/tleyden/elastic-thought](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/tleyden/elastic-thought?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) +[![Build Status](https://drone.io/github.com/tleyden/elastic-thought/status.png)](https://drone.io/github.com/tleyden/elastic-thought/latest) [![GoDoc](https://godoc.org/github.com/tleyden/elastic-thought?status.png)](https://godoc.org/github.com/tleyden/elastic-thought) [![Coverage Status](https://coveralls.io/repos/tleyden/elastic-thought/badge.svg?branch=master)](https://coveralls.io/r/tleyden/elastic-thought?branch=master) [![Join the chat at https://gitter.im/tleyden/elastic-thought](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/tleyden/elastic-thought?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) Scalable REST API wrapper for the [Caffe](http://caffe.berkeleyvision.org) deep learning framework. @@ -36,15 +36,15 @@ If running on AWS, each [CoreOS](https://coreos.com/) instance would be running Although not shown, all components would be running inside of [Docker](https://www.docker.com/) containers. -[CoreOS Fleet](https://coreos.com/docs/launching-containers/launching/launching-containers-fleet/) would be leveraged to auto-restart any failed components, including Caffe workers. +It would be possible to start more nodes which only had Caffe GPU workers running. ## Roadmap *Current Status: everything under heavy construction, not ready for public consumption yet* 1. **[done]** Working end-to-end with IMAGE_DATA caffe layer using a single test set with a single training set, and ability to query trained set. -1. **[in progress]** ---> Support LEVELDB / LMDB data formats, to run mnist example. -1. Support the majority of caffe use cases +1. **[done]** Support LEVELDB / LMDB data formats, to run mnist example. 
+1. **[in progress]** Support the majority of caffe use cases 1. Package everything up to make it easy to deploy <-- initial release 1. Ability to auto-scale worker instances up and down based on how many jobs are in the message queue. 1. Attempt to add support for other deep learning frameworks: pylearn2, cuda-convnet, etc. @@ -63,17 +63,18 @@ Although not shown, all components would be running inside of [Docker](https://w * [REST API](http://docs.elasticthought.apiary.io/) * [Godocs](http://godoc.org/github.com/tleyden/elastic-thought) +* This README -## Grid Computing +## System Requirements -ElasticThought is not trying to be a grid computing (aka distributed computation) solution. +ElasticThought requires CoreOS to run. -For that, check out: +If you want to access the GPU, you will need to do extra work to get [CoreOS working with Nvidia CUDA GPU Drivers](http://tleyden.github.io/blog/2014/11/04/coreos-with-nvidia-cuda-gpu-drivers/) -* [ParameterServer](http://parameterserver.org/) -* [Caffe Issue 876](https://github.com/BVLC/caffe/issues/876) -## Kick things off: Aws +## Installing elastic-thought on AWS (Production mode) + +It should be possible to install elastic-thought anywhere that CoreOS is supported. Currently, there are instructions for AWS and Vagrant (below). ### Launch EC2 instances via CloudFormation script @@ -83,6 +84,16 @@ For that, check out: * Choose 3 node cluster with m3.medium or g2.2xlarge (GPU case) instance type * All other values should be default +### Verify CoreOS cluster + +Run: + +``` +$ fleetctl list-machines +``` + +Which should show all the CoreOS machines in your cluster. (this uses etcd under the hood, so will also validate that etcd is setup correctly). 
+ ### Kick off ElasticThought Ssh into one of the machines (doesn't matter which): `ssh -A core@ec2-54-160-96-153.compute-1.amazonaws.com` @@ -99,30 +110,62 @@ It should look like this: ``` UNIT MACHINE ACTIVE SUB -cbfs_announce@1.service 2340c553.../10.225.17.229 active running -cbfs_announce@2.service fbd4562e.../10.182.197.145 active running -cbfs_announce@3.service 0f5e2e11.../10.168.212.210 active running -cbfs_node@1.service 2340c553.../10.225.17.229 active running -cbfs_node@2.service fbd4562e.../10.182.197.145 active running -cbfs_node@3.service 0f5e2e11.../10.168.212.210 active running -couchbase_bootstrap_node.service 0f5e2e11.../10.168.212.210 active running -couchbase_bootstrap_node_announce.service 0f5e2e11.../10.168.212.210 active running -couchbase_node.1.service 2340c553.../10.225.17.229 active running -couchbase_node.2.service fbd4562e.../10.182.197.145 active running -elastic_thought_gpu@1.service 2340c553.../10.225.17.229 active running -elastic_thought_gpu@2.service fbd4562e.../10.182.197.145 active running -elastic_thought_gpu@3.service 0f5e2e11.../10.168.212.210 active running -sync_gw_announce@1.service 2340c553.../10.225.17.229 active running -sync_gw_announce@2.service fbd4562e.../10.182.197.145 active running -sync_gw_announce@3.service 0f5e2e11.../10.168.212.210 active running -sync_gw_node@1.service 2340c553.../10.225.17.229 active running -sync_gw_node@2.service fbd4562e.../10.182.197.145 active running -sync_gw_node@3.service 0f5e2e11.../10.168.212.210 active running +cbfs_announce@1.service 2340c553.../10.225.17.229 active running +cbfs_announce@2.service fbd4562e.../10.182.197.145 active running +cbfs_announce@3.service 0f5e2e11.../10.168.212.210 active running +cbfs_node@1.service 2340c553.../10.225.17.229 active running +cbfs_node@2.service fbd4562e.../10.182.197.145 active running +cbfs_node@3.service 0f5e2e11.../10.168.212.210 active running +couchbase_bootstrap_node.service 0f5e2e11.../10.168.212.210 active running 
+couchbase_bootstrap_node_announce.service 0f5e2e11.../10.168.212.210 active running +couchbase_node.1.service 2340c553.../10.225.17.229 active running +couchbase_node.2.service fbd4562e.../10.182.197.145 active running +elastic_thought_gpu@1.service 2340c553.../10.225.17.229 active running +elastic_thought_gpu@2.service fbd4562e.../10.182.197.145 active running +elastic_thought_gpu@3.service 0f5e2e11.../10.168.212.210 active running +sync_gw_announce@1.service 2340c553.../10.225.17.229 active running +sync_gw_announce@2.service fbd4562e.../10.182.197.145 active running +sync_gw_announce@3.service 0f5e2e11.../10.168.212.210 active running +sync_gw_node@1.service 2340c553.../10.225.17.229 active running +sync_gw_node@2.service fbd4562e.../10.182.197.145 active running +sync_gw_node@3.service 0f5e2e11.../10.168.212.210 active running ``` At this point you should be able to access the [REST API](http://docs.elasticthought.apiary.io/) on the public ip any of the three Sync Gateway machines. -## Kick things off: Vagrant +## Installing elastic-thought on a single CoreOS host (Development mode) + +If you are on OSX, you'll first need to install Vagrant, VirtualBox, and CoreOS. See [CoreOS on Vagrant](https://coreos.com/docs/running-coreos/platforms/vagrant/) for instructions. 
+ +Here's what will be created: + + + + ┌─────────────────────────────────────────────────────────┐ + │ CoreOS Host │ + │ ┌──────────────────────────┐ ┌─────────────────────┐ │ + │ │ Docker Container │ │ Docker Container │ │ + │ │ ┌───────────────────┐ │ │ ┌────────────┐ │ │ + │ │ │ Elastic Thought │ │ │ │Sync Gateway│ │ │ + │ │ │ Server │ │ │ │ Database │ │ │ + │ │ │ ┌───────────┐ │ │ │ │ │ │ │ + │ │ │ │In-process │ │◀─┼──┼───▶│ │ │ │ + │ │ │ │ Caffe │ │ │ │ │ │ │ │ + │ │ │ │ worker │ │ │ │ │ │ │ │ + │ │ │ └───────────┘ │ │ │ └────────────┘ │ │ + │ │ └───────────────────┘ │ └─────────────────────┘ │ + │ └──────────────────────────┘ │ + └─────────────────────────────────────────────────────────┘ + + +``` +$ vagrant ssh core-01 +$ docker run --name sync-gateway -P couchbase/sync-gateway sync-gw-start -c feature/forestdb_bucket -g https://fixme.com +$ docker run --name elastic-thought -P --link sync-gateway:sync-gateway tleyden5iwx/elastic-thought-cpu-develop bash -c 'refresh-elastic-thought; elastic-thought' +``` + + +## Installing elastic-thought on Vagrant ### Update Vagrant @@ -133,71 +176,103 @@ $ vagrant -v 1.7.1 ``` -### Install CoreOS +### Install CoreOS on Vagrant -See https://coreos.com/docs/running-coreos/platforms/vagrant/ +Clone the coreos/vagrant fork that has been customized for running ElasticThought. -### Update cloud-config +``` +$ cd ~/Vagrant +$ git clone git@github.com:tleyden/coreos-vagrant.git +$ cd coreos-vagrant +$ cp config.rb.sample config.rb +$ cp user-data.sample user-data +``` + +By default this will run a **two node** cluster, if you want to change this, update the `$num_instances` variable in the `config.rb` file. 
-Open the user-data file, and add: +### Run CoreOS ``` -write_files: - - path: /etc/systemd/system/docker.service.d/increase-ulimit.conf - owner: core:core - permissions: 0644 - content: | - [Service] - LimitMEMLOCK=infinity - - path: /var/lib/couchbase/data/.README - owner: core:core - permissions: 0644 - content: | - Couchbase Data files are stored here - - path: /var/lib/couchbase/index/.README - owner: core:core - permissions: 0644 - content: | - Couchbase Index files are stored here - - path: /var/lib/cbfs/data/.README - owner: core:core - permissions: 0644 - content: | - CBFS files are stored here +$ vagrant up ``` -### Increase RAM size of VM's +Ssh in: -Couchbase Server wants a lot of RAM. Bump up the vm memory size to 2GB. +``` +$ vagrant ssh core-01 -- -A +``` -Edit your Vagrantfile: +If you see: ``` -$vb_memory = 2048 +Failed Units: 1 + user-cloudinit@var-lib-coreos\x2dvagrant-vagrantfile\x2duser\x2ddata.service ``` -### Setup port forwarding for Couchbase UI (optional) +Jump to **Workaround CoreOS + Vagrant issues** below. -This is only needed if you want to be able to connect to the Couchbase web UI from a browser on your host OS (ie, OSX) +Verify things started up correctly: -Add the following snippet to your Vagrant file: +``` +core@core-01 ~ $ fleetctl list-machines +``` + +If you get errors like: ``` -if i == 1 - # create a port forward mapping to view couchbase web ui - config.vm.network "forwarded_port", guest: 8091, host: 5091 -end +2015/03/26 16:58:50 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused +2015/03/26 16:58:50 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms ``` -### Disable Transparent Huge Pages (optional) +Jump to **Workaround CoreOS + Vagrant issues** below. -Not sure how crucial this is, but I'll mention it just in case.
After the CoreOS machines startup, ssh into each one: +### Workaround CoreOS + Vagrant issues: + +First exit out of CoreOS: ``` -$ sudo bash -# echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag +core@core-01 ~ $ exit ``` +On your OSX workstation, try the following workaround: + +``` +$ sed -i '' 's/420/0644/' user-data +$ sed -i '' 's/484/0744/' user-data +$ vagrant reload --provision +``` + +Ssh back in: + +``` +$ vagrant ssh core-01 -- -A +``` + +Verify it worked: + +``` +core@core-01 ~ $ fleetctl list-machines +``` + +You should see: + +``` +MACHINE IP METADATA +ce0fec18... 172.17.8.102 - +d6402b24... 172.17.8.101 - +``` + +I filed [CoreOS cloudinit issue 328](https://github.com/coreos/coreos-cloudinit/issues/328) to figure out why this error is happening (possibly related issues: [CoreOS cloudinit issue 261](https://github.com/coreos/coreos-cloudinit/issues/261) or [CoreOS bugs issue 190](https://github.com/coreos/bugs/issues/190)) + + +### Continue steps above + +Scroll up to the **Installing elastic-thought on AWS** section and start with **Verify CoreOS cluster** + +## FAQ + +* Is this useful for grid computing / distributed computation? **Ans**: No, this is not trying to be a grid computing (aka distributed computation) solution.
You may want to check out [Caffe Issue 876](https://github.com/BVLC/caffe/issues/876) or [ParameterServer](http://parameterserver.org/) + ## License Apache 2 diff --git a/docker/gpu/develop/Dockerfile b/docker/gpu/develop/Dockerfile index d7b90aa..4e90e80 100644 --- a/docker/gpu/develop/Dockerfile +++ b/docker/gpu/develop/Dockerfile @@ -4,34 +4,30 @@ FROM tleyden5iwx/caffe-gpu-master MAINTAINER Traun Leyden tleyden@couchbase.com ENV GOPATH /opt/go -ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH ENV GOROOT /usr/local/go +ENV PATH $PATH:$GOPATH/bin:$GOROOT/bin +# Get dependencies RUN apt-get update && \ - apt-get -q -y install mercurial && \ - apt-get -q -y install make && \ - apt-get -q -y install binutils && \ - apt-get -q -y install bison && \ - apt-get -q -y install build-essential + apt-get -q -y install \ + mercurial \ + make \ + binutils \ + bison \ + build-essential RUN mkdir -p $GOPATH -# Install Go 1.3 manually (since Go 1.3 is required, and ubuntu 14.04 still uses Go 1.2) -RUN curl -O https://storage.googleapis.com/golang/go1.3.1.linux-amd64.tar.gz && \ - tar -C /usr/local -xzf go1.3.1.linux-amd64.tar.gz +# Download and install Go 1.4 +RUN wget http://golang.org/dl/go1.4.2.linux-amd64.tar.gz && \ + tar -C /usr/local -xzf go1.4.2.linux-amd64.tar.gz && \ + rm go1.4.2.linux-amd64.tar.gz # Add refresh script ADD scripts/refresh-elastic-thought /usr/local/bin/ ADD scripts/refresh-elastic-thought-refresher /usr/local/bin/ # Go get ElasticThought -RUN go get -u -v -t github.com/tleyden/elastic-thought && \ - go get -u -v -t github.com/tleyden/elastic-thought/cli/httpd && \ - go get -u -v -t github.com/tleyden/elastic-thought/cli/worker && \ +RUN go get -u -v -t github.com/tleyden/elastic-thought/...&& \ cd $GOPATH/src/github.com/tleyden/elastic-thought && \ git log -3 - -# Copy binaries -RUN cp /opt/go/bin/worker /usr/local/bin && \ - cp /opt/go/bin/httpd /usr/local/bin - diff --git a/docker/gpu/develop/README.md b/docker/gpu/develop/README.md index 
dd8fd3d..a7993be 100644 --- a/docker/gpu/develop/README.md +++ b/docker/gpu/develop/README.md @@ -1,4 +1,4 @@ -[![Join the chat at https://gitter.im/tleyden/elastic-thought](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/tleyden/elastic-thought?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) +[![Build Status](https://drone.io/github.com/tleyden/elastic-thought/status.png)](https://drone.io/github.com/tleyden/elastic-thought/latest) [![GoDoc](https://godoc.org/github.com/tleyden/elastic-thought?status.png)](https://godoc.org/github.com/tleyden/elastic-thought) [![Coverage Status](https://coveralls.io/repos/tleyden/elastic-thought/badge.svg?branch=master)](https://coveralls.io/r/tleyden/elastic-thought?branch=master) [![Join the chat at https://gitter.im/tleyden/elastic-thought](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/tleyden/elastic-thought?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) Scalable REST API wrapper for the [Caffe](http://caffe.berkeleyvision.org) deep learning framework. @@ -36,15 +36,15 @@ If running on AWS, each [CoreOS](https://coreos.com/) instance would be running Although not shown, all components would be running inside of [Docker](https://www.docker.com/) containers. -[CoreOS Fleet](https://coreos.com/docs/launching-containers/launching/launching-containers-fleet/) would be leveraged to auto-restart any failed components, including Caffe workers. +It would be possible to start more nodes which only had Caffe GPU workers running. ## Roadmap *Current Status: everything under heavy construction, not ready for public consumption yet* 1. **[done]** Working end-to-end with IMAGE_DATA caffe layer using a single test set with a single training set, and ability to query trained set. -1. **[in progress]** ---> Support LEVELDB / LMDB data formats, to run mnist example. -1. Support the majority of caffe use cases +1. 
**[done]** Support LEVELDB / LMDB data formats, to run mnist example. +1. **[in progress]** Support the majority of caffe use cases 1. Package everything up to make it easy to deploy <-- initial release 1. Ability to auto-scale worker instances up and down based on how many jobs are in the message queue. 1. Attempt to add support for other deep learning frameworks: pylearn2, cuda-convnet, etc. @@ -63,17 +63,18 @@ Although not shown, all components would be running inside of [Docker](https://w * [REST API](http://docs.elasticthought.apiary.io/) * [Godocs](http://godoc.org/github.com/tleyden/elastic-thought) +* This README -## Grid Computing +## System Requirements -ElasticThought is not trying to be a grid computing (aka distributed computation) solution. +ElasticThought requires CoreOS to run. -For that, check out: +If you want to access the GPU, you will need to do extra work to get [CoreOS working with Nvidia CUDA GPU Drivers](http://tleyden.github.io/blog/2014/11/04/coreos-with-nvidia-cuda-gpu-drivers/) -* [ParameterServer](http://parameterserver.org/) -* [Caffe Issue 876](https://github.com/BVLC/caffe/issues/876) -## Kick things off: Aws +## Installing elastic-thought on AWS (Production mode) + +It should be possible to install elastic-thought anywhere that CoreOS is supported. Currently, there are instructions for AWS and Vagrant (below). ### Launch EC2 instances via CloudFormation script @@ -83,6 +84,16 @@ For that, check out: * Choose 3 node cluster with m3.medium or g2.2xlarge (GPU case) instance type * All other values should be default +### Verify CoreOS cluster + +Run: + +``` +$ fleetctl list-machines +``` + +Which should show all the CoreOS machines in your cluster. (this uses etcd under the hood, so will also validate that etcd is setup correctly). 
+ ### Kick off ElasticThought Ssh into one of the machines (doesn't matter which): `ssh -A core@ec2-54-160-96-153.compute-1.amazonaws.com` @@ -99,30 +110,62 @@ It should look like this: ``` UNIT MACHINE ACTIVE SUB -cbfs_announce@1.service 2340c553.../10.225.17.229 active running -cbfs_announce@2.service fbd4562e.../10.182.197.145 active running -cbfs_announce@3.service 0f5e2e11.../10.168.212.210 active running -cbfs_node@1.service 2340c553.../10.225.17.229 active running -cbfs_node@2.service fbd4562e.../10.182.197.145 active running -cbfs_node@3.service 0f5e2e11.../10.168.212.210 active running -couchbase_bootstrap_node.service 0f5e2e11.../10.168.212.210 active running -couchbase_bootstrap_node_announce.service 0f5e2e11.../10.168.212.210 active running -couchbase_node.1.service 2340c553.../10.225.17.229 active running -couchbase_node.2.service fbd4562e.../10.182.197.145 active running -elastic_thought_gpu@1.service 2340c553.../10.225.17.229 active running -elastic_thought_gpu@2.service fbd4562e.../10.182.197.145 active running -elastic_thought_gpu@3.service 0f5e2e11.../10.168.212.210 active running -sync_gw_announce@1.service 2340c553.../10.225.17.229 active running -sync_gw_announce@2.service fbd4562e.../10.182.197.145 active running -sync_gw_announce@3.service 0f5e2e11.../10.168.212.210 active running -sync_gw_node@1.service 2340c553.../10.225.17.229 active running -sync_gw_node@2.service fbd4562e.../10.182.197.145 active running -sync_gw_node@3.service 0f5e2e11.../10.168.212.210 active running +cbfs_announce@1.service 2340c553.../10.225.17.229 active running +cbfs_announce@2.service fbd4562e.../10.182.197.145 active running +cbfs_announce@3.service 0f5e2e11.../10.168.212.210 active running +cbfs_node@1.service 2340c553.../10.225.17.229 active running +cbfs_node@2.service fbd4562e.../10.182.197.145 active running +cbfs_node@3.service 0f5e2e11.../10.168.212.210 active running +couchbase_bootstrap_node.service 0f5e2e11.../10.168.212.210 active running 
+couchbase_bootstrap_node_announce.service 0f5e2e11.../10.168.212.210 active running +couchbase_node.1.service 2340c553.../10.225.17.229 active running +couchbase_node.2.service fbd4562e.../10.182.197.145 active running +elastic_thought_gpu@1.service 2340c553.../10.225.17.229 active running +elastic_thought_gpu@2.service fbd4562e.../10.182.197.145 active running +elastic_thought_gpu@3.service 0f5e2e11.../10.168.212.210 active running +sync_gw_announce@1.service 2340c553.../10.225.17.229 active running +sync_gw_announce@2.service fbd4562e.../10.182.197.145 active running +sync_gw_announce@3.service 0f5e2e11.../10.168.212.210 active running +sync_gw_node@1.service 2340c553.../10.225.17.229 active running +sync_gw_node@2.service fbd4562e.../10.182.197.145 active running +sync_gw_node@3.service 0f5e2e11.../10.168.212.210 active running ``` At this point you should be able to access the [REST API](http://docs.elasticthought.apiary.io/) on the public ip any of the three Sync Gateway machines. -## Kick things off: Vagrant +## Installing elastic-thought on a single CoreOS host (Development mode) + +If you are on OSX, you'll first need to install Vagrant, VirtualBox, and CoreOS. See [CoreOS on Vagrant](https://coreos.com/docs/running-coreos/platforms/vagrant/) for instructions. 
+ +Here's what will be created: + + + + ┌─────────────────────────────────────────────────────────┐ + │ CoreOS Host │ + │ ┌──────────────────────────┐ ┌─────────────────────┐ │ + │ │ Docker Container │ │ Docker Container │ │ + │ │ ┌───────────────────┐ │ │ ┌────────────┐ │ │ + │ │ │ Elastic Thought │ │ │ │Sync Gateway│ │ │ + │ │ │ Server │ │ │ │ Database │ │ │ + │ │ │ ┌───────────┐ │ │ │ │ │ │ │ + │ │ │ │In-process │ │◀─┼──┼───▶│ │ │ │ + │ │ │ │ Caffe │ │ │ │ │ │ │ │ + │ │ │ │ worker │ │ │ │ │ │ │ │ + │ │ │ └───────────┘ │ │ │ └────────────┘ │ │ + │ │ └───────────────────┘ │ └─────────────────────┘ │ + │ └──────────────────────────┘ │ + └─────────────────────────────────────────────────────────┘ + + +``` +$ vagrant ssh core-01 +$ docker run --name sync-gateway -P couchbase/sync-gateway sync-gw-start -c feature/forestdb_bucket -g https://fixme.com +$ docker run --name elastic-thought -P --link sync-gateway:sync-gateway tleyden5iwx/elastic-thought-cpu-develop bash -c 'refresh-elastic-thought; elastic-thought' +``` + + +## Installing elastic-thought on Vagrant ### Update Vagrant @@ -133,71 +176,103 @@ $ vagrant -v 1.7.1 ``` -### Install CoreOS +### Install CoreOS on Vagrant -See https://coreos.com/docs/running-coreos/platforms/vagrant/ +Clone the coreos/vagrant fork that has been customized for running ElasticThought. -### Update cloud-config +``` +$ cd ~/Vagrant +$ git clone git@github.com:tleyden/coreos-vagrant.git +$ cd coreos-vagrant +$ cp config.rb.sample config.rb +$ cp user-data.sample user-data +``` + +By default this will run a **two node** cluster, if you want to change this, update the `$num_instances` variable in the `config.rb` file. 
-Open the user-data file, and add: +### Run CoreOS ``` -write_files: - - path: /etc/systemd/system/docker.service.d/increase-ulimit.conf - owner: core:core - permissions: 0644 - content: | - [Service] - LimitMEMLOCK=infinity - - path: /var/lib/couchbase/data/.README - owner: core:core - permissions: 0644 - content: | - Couchbase Data files are stored here - - path: /var/lib/couchbase/index/.README - owner: core:core - permissions: 0644 - content: | - Couchbase Index files are stored here - - path: /var/lib/cbfs/data/.README - owner: core:core - permissions: 0644 - content: | - CBFS files are stored here +$ vagrant up ``` -### Increase RAM size of VM's +Ssh in: -Couchbase Server wants a lot of RAM. Bump up the vm memory size to 2GB. +``` +$ vagrant ssh core-01 -- -A +``` -Edit your Vagrantfile: +If you see: ``` -$vb_memory = 2048 +Failed Units: 1 + user-cloudinit@var-lib-coreos\x2dvagrant-vagrantfile\x2duser\x2ddata.service ``` -### Setup port forwarding for Couchbase UI (optional) +Jump to **Workaround CoreOS + Vagrant issues** below. -This is only needed if you want to be able to connect to the Couchbase web UI from a browser on your host OS (ie, OSX) +Verify things started up correctly: -Add the following snippet to your Vagrant file: +``` +core@core-01 ~ $ fleetctl list-machines +``` + +If you get errors like: ``` -if i == 1 - # create a port forward mapping to view couchbase web ui - config.vm.network "forwarded_port", guest: 8091, host: 5091 -end +2015/03/26 16:58:50 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused +2015/03/26 16:58:50 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms ``` -### Disable Transparent Huge Pages (optional) +Jump to **Workaround CoreOS + Vagrant issues** below. -Not sure how crucial this is, but I'll mention it just in case.
After the CoreOS machines startup, ssh into each one: +### Workaround CoreOS + Vagrant issues: + +First exit out of CoreOS: ``` -$ sudo bash -# echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag +core@core-01 ~ $ exit ``` +On your OSX workstation, try the following workaround: + +``` +$ sed -i '' 's/420/0644/' user-data +$ sed -i '' 's/484/0744/' user-data +$ vagrant reload --provision +``` + +Ssh back in: + +``` +$ vagrant ssh core-01 -- -A +``` + +Verify it worked: + +``` +core@core-01 ~ $ fleetctl list-machines +``` + +You should see: + +``` +MACHINE IP METADATA +ce0fec18... 172.17.8.102 - +d6402b24... 172.17.8.101 - +``` + +I filed [CoreOS cloudinit issue 328](https://github.com/coreos/coreos-cloudinit/issues/328) to figure out why this error is happening (possibly related issues: [CoreOS cloudinit issue 261](https://github.com/coreos/coreos-cloudinit/issues/261) or [CoreOS cloudinit issue 190](https://github.com/coreos/bugs/issues/190)) + + +### Continue steps above + +Scroll up to the **Installing elastic-thought on AWS** section and start with **Verify CoreOS cluster** + +## FAQ + +* Is this useful for grid computing / distributed computation? **Ans**: No, this is not trying to be a grid computing (aka distributed computation) solution.
You may want to check out [Caffe Issue 876](https://github.com/BVLC/caffe/issues/876) or [ParameterServer](http://parameterserver.org/) + ## License Apache 2 diff --git a/docker/gpu/master/Dockerfile b/docker/gpu/master/Dockerfile index d7b90aa..4e90e80 100644 --- a/docker/gpu/master/Dockerfile +++ b/docker/gpu/master/Dockerfile @@ -4,34 +4,30 @@ FROM tleyden5iwx/caffe-gpu-master MAINTAINER Traun Leyden tleyden@couchbase.com ENV GOPATH /opt/go -ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH ENV GOROOT /usr/local/go +ENV PATH $PATH:$GOPATH/bin:$GOROOT/bin +# Get dependencies RUN apt-get update && \ - apt-get -q -y install mercurial && \ - apt-get -q -y install make && \ - apt-get -q -y install binutils && \ - apt-get -q -y install bison && \ - apt-get -q -y install build-essential + apt-get -q -y install \ + mercurial \ + make \ + binutils \ + bison \ + build-essential RUN mkdir -p $GOPATH -# Install Go 1.3 manually (since Go 1.3 is required, and ubuntu 14.04 still uses Go 1.2) -RUN curl -O https://storage.googleapis.com/golang/go1.3.1.linux-amd64.tar.gz && \ - tar -C /usr/local -xzf go1.3.1.linux-amd64.tar.gz +# Download and install Go 1.4 +RUN wget http://golang.org/dl/go1.4.2.linux-amd64.tar.gz && \ + tar -C /usr/local -xzf go1.4.2.linux-amd64.tar.gz && \ + rm go1.4.2.linux-amd64.tar.gz # Add refresh script ADD scripts/refresh-elastic-thought /usr/local/bin/ ADD scripts/refresh-elastic-thought-refresher /usr/local/bin/ # Go get ElasticThought -RUN go get -u -v -t github.com/tleyden/elastic-thought && \ - go get -u -v -t github.com/tleyden/elastic-thought/cli/httpd && \ - go get -u -v -t github.com/tleyden/elastic-thought/cli/worker && \ +RUN go get -u -v -t github.com/tleyden/elastic-thought/...&& \ cd $GOPATH/src/github.com/tleyden/elastic-thought && \ git log -3 - -# Copy binaries -RUN cp /opt/go/bin/worker /usr/local/bin && \ - cp /opt/go/bin/httpd /usr/local/bin - diff --git a/docker/gpu/master/README.md b/docker/gpu/master/README.md index dd8fd3d..a7993be 
100644 --- a/docker/gpu/master/README.md +++ b/docker/gpu/master/README.md @@ -1,4 +1,4 @@ -[![Join the chat at https://gitter.im/tleyden/elastic-thought](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/tleyden/elastic-thought?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) +[![Build Status](https://drone.io/github.com/tleyden/elastic-thought/status.png)](https://drone.io/github.com/tleyden/elastic-thought/latest) [![GoDoc](https://godoc.org/github.com/tleyden/elastic-thought?status.png)](https://godoc.org/github.com/tleyden/elastic-thought) [![Coverage Status](https://coveralls.io/repos/tleyden/elastic-thought/badge.svg?branch=master)](https://coveralls.io/r/tleyden/elastic-thought?branch=master) [![Join the chat at https://gitter.im/tleyden/elastic-thought](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/tleyden/elastic-thought?utm_source=badge&utm_medium=badge&utm_campaign=pr-badge&utm_content=badge) Scalable REST API wrapper for the [Caffe](http://caffe.berkeleyvision.org) deep learning framework. @@ -36,15 +36,15 @@ If running on AWS, each [CoreOS](https://coreos.com/) instance would be running Although not shown, all components would be running inside of [Docker](https://www.docker.com/) containers. -[CoreOS Fleet](https://coreos.com/docs/launching-containers/launching/launching-containers-fleet/) would be leveraged to auto-restart any failed components, including Caffe workers. +It would be possible to start more nodes which only had Caffe GPU workers running. ## Roadmap *Current Status: everything under heavy construction, not ready for public consumption yet* 1. **[done]** Working end-to-end with IMAGE_DATA caffe layer using a single test set with a single training set, and ability to query trained set. -1. **[in progress]** ---> Support LEVELDB / LMDB data formats, to run mnist example. -1. Support the majority of caffe use cases +1. **[done]** Support LEVELDB / LMDB data formats, to run mnist example. 
+1. **[in progress]** Support the majority of caffe use cases 1. Package everything up to make it easy to deploy <-- initial release 1. Ability to auto-scale worker instances up and down based on how many jobs are in the message queue. 1. Attempt to add support for other deep learning frameworks: pylearn2, cuda-convnet, etc. @@ -63,17 +63,18 @@ Although not shown, all components would be running inside of [Docker](https://w * [REST API](http://docs.elasticthought.apiary.io/) * [Godocs](http://godoc.org/github.com/tleyden/elastic-thought) +* This README -## Grid Computing +## System Requirements -ElasticThought is not trying to be a grid computing (aka distributed computation) solution. +ElasticThought requires CoreOS to run. -For that, check out: +If you want to access the GPU, you will need to do extra work to get [CoreOS working with Nvidia CUDA GPU Drivers](http://tleyden.github.io/blog/2014/11/04/coreos-with-nvidia-cuda-gpu-drivers/) -* [ParameterServer](http://parameterserver.org/) -* [Caffe Issue 876](https://github.com/BVLC/caffe/issues/876) -## Kick things off: Aws +## Installing elastic-thought on AWS (Production mode) + +It should be possible to install elastic-thought anywhere that CoreOS is supported. Currently, there are instructions for AWS and Vagrant (below). ### Launch EC2 instances via CloudFormation script @@ -83,6 +84,16 @@ For that, check out: * Choose 3 node cluster with m3.medium or g2.2xlarge (GPU case) instance type * All other values should be default +### Verify CoreOS cluster + +Run: + +``` +$ fleetctl list-machines +``` + +Which should show all the CoreOS machines in your cluster. (this uses etcd under the hood, so will also validate that etcd is setup correctly). 
+ ### Kick off ElasticThought Ssh into one of the machines (doesn't matter which): `ssh -A core@ec2-54-160-96-153.compute-1.amazonaws.com` @@ -99,30 +110,62 @@ It should look like this: ``` UNIT MACHINE ACTIVE SUB -cbfs_announce@1.service 2340c553.../10.225.17.229 active running -cbfs_announce@2.service fbd4562e.../10.182.197.145 active running -cbfs_announce@3.service 0f5e2e11.../10.168.212.210 active running -cbfs_node@1.service 2340c553.../10.225.17.229 active running -cbfs_node@2.service fbd4562e.../10.182.197.145 active running -cbfs_node@3.service 0f5e2e11.../10.168.212.210 active running -couchbase_bootstrap_node.service 0f5e2e11.../10.168.212.210 active running -couchbase_bootstrap_node_announce.service 0f5e2e11.../10.168.212.210 active running -couchbase_node.1.service 2340c553.../10.225.17.229 active running -couchbase_node.2.service fbd4562e.../10.182.197.145 active running -elastic_thought_gpu@1.service 2340c553.../10.225.17.229 active running -elastic_thought_gpu@2.service fbd4562e.../10.182.197.145 active running -elastic_thought_gpu@3.service 0f5e2e11.../10.168.212.210 active running -sync_gw_announce@1.service 2340c553.../10.225.17.229 active running -sync_gw_announce@2.service fbd4562e.../10.182.197.145 active running -sync_gw_announce@3.service 0f5e2e11.../10.168.212.210 active running -sync_gw_node@1.service 2340c553.../10.225.17.229 active running -sync_gw_node@2.service fbd4562e.../10.182.197.145 active running -sync_gw_node@3.service 0f5e2e11.../10.168.212.210 active running +cbfs_announce@1.service 2340c553.../10.225.17.229 active running +cbfs_announce@2.service fbd4562e.../10.182.197.145 active running +cbfs_announce@3.service 0f5e2e11.../10.168.212.210 active running +cbfs_node@1.service 2340c553.../10.225.17.229 active running +cbfs_node@2.service fbd4562e.../10.182.197.145 active running +cbfs_node@3.service 0f5e2e11.../10.168.212.210 active running +couchbase_bootstrap_node.service 0f5e2e11.../10.168.212.210 active running 
+couchbase_bootstrap_node_announce.service 0f5e2e11.../10.168.212.210 active running +couchbase_node.1.service 2340c553.../10.225.17.229 active running +couchbase_node.2.service fbd4562e.../10.182.197.145 active running +elastic_thought_gpu@1.service 2340c553.../10.225.17.229 active running +elastic_thought_gpu@2.service fbd4562e.../10.182.197.145 active running +elastic_thought_gpu@3.service 0f5e2e11.../10.168.212.210 active running +sync_gw_announce@1.service 2340c553.../10.225.17.229 active running +sync_gw_announce@2.service fbd4562e.../10.182.197.145 active running +sync_gw_announce@3.service 0f5e2e11.../10.168.212.210 active running +sync_gw_node@1.service 2340c553.../10.225.17.229 active running +sync_gw_node@2.service fbd4562e.../10.182.197.145 active running +sync_gw_node@3.service 0f5e2e11.../10.168.212.210 active running ``` At this point you should be able to access the [REST API](http://docs.elasticthought.apiary.io/) on the public ip any of the three Sync Gateway machines. -## Kick things off: Vagrant +## Installing elastic-thought on a single CoreOS host (Development mode) + +If you are on OSX, you'll first need to install Vagrant, VirtualBox, and CoreOS. See [CoreOS on Vagrant](https://coreos.com/docs/running-coreos/platforms/vagrant/) for instructions. 
+ +Here's what will be created: + + + + ┌─────────────────────────────────────────────────────────┐ + │ CoreOS Host │ + │ ┌──────────────────────────┐ ┌─────────────────────┐ │ + │ │ Docker Container │ │ Docker Container │ │ + │ │ ┌───────────────────┐ │ │ ┌────────────┐ │ │ + │ │ │ Elastic Thought │ │ │ │Sync Gateway│ │ │ + │ │ │ Server │ │ │ │ Database │ │ │ + │ │ │ ┌───────────┐ │ │ │ │ │ │ │ + │ │ │ │In-process │ │◀─┼──┼───▶│ │ │ │ + │ │ │ │ Caffe │ │ │ │ │ │ │ │ + │ │ │ │ worker │ │ │ │ │ │ │ │ + │ │ │ └───────────┘ │ │ │ └────────────┘ │ │ + │ │ └───────────────────┘ │ └─────────────────────┘ │ + │ └──────────────────────────┘ │ + └─────────────────────────────────────────────────────────┘ + + +``` +$ vagrant ssh core-01 +$ docker run --name sync-gateway -P couchbase/sync-gateway sync-gw-start -c feature/forestdb_bucket -g https://fixme.com +$ docker run --name elastic-thought -P --link sync-gateway:sync-gateway tleyden5iwx/elastic-thought-cpu-develop bash -c 'refresh-elastic-thought; elastic-thought' +``` + + +## Installing elastic-thought on Vagrant ### Update Vagrant @@ -133,71 +176,103 @@ $ vagrant -v 1.7.1 ``` -### Install CoreOS +### Install CoreOS on Vagrant -See https://coreos.com/docs/running-coreos/platforms/vagrant/ +Clone the coreos/vagrant fork that has been customized for running ElasticThought. -### Update cloud-config +``` +$ cd ~/Vagrant +$ git clone git@github.com:tleyden/coreos-vagrant.git +$ cd coreos-vagrant +$ cp config.rb.sample config.rb +$ cp user-data.sample user-data +``` + +By default this will run a **two node** cluster, if you want to change this, update the `$num_instances` variable in the `config.rb` file. 
-Open the user-data file, and add: +### Run CoreOS ``` -write_files: - - path: /etc/systemd/system/docker.service.d/increase-ulimit.conf - owner: core:core - permissions: 0644 - content: | - [Service] - LimitMEMLOCK=infinity - - path: /var/lib/couchbase/data/.README - owner: core:core - permissions: 0644 - content: | - Couchbase Data files are stored here - - path: /var/lib/couchbase/index/.README - owner: core:core - permissions: 0644 - content: | - Couchbase Index files are stored here - - path: /var/lib/cbfs/data/.README - owner: core:core - permissions: 0644 - content: | - CBFS files are stored here +$ vagrant up ``` -### Increase RAM size of VM's +Ssh in: -Couchbase Server wants a lot of RAM. Bump up the vm memory size to 2GB. +``` +$ vagrant ssh core-01 -- -A +``` -Edit your Vagrantfile: +If you see: ``` -$vb_memory = 2048 +Failed Units: 1 + user-cloudinit@var-lib-coreos\x2dvagrant-vagrantfile\x2duser\x2ddata.service ``` -### Setup port forwarding for Couchbase UI (optional) +Jump to **Workaround CoreOS + Vagrant issues** below. -This is only needed if you want to be able to connect to the Couchbase web UI from a browser on your host OS (ie, OSX) +Verify things started up correctly: -Add the following snippet to your Vagrant file: +``` +core@core-01 ~ $ fleetctl list-machines +``` + +If you get errors like: ``` -if i == 1 - # create a port forward mapping to view couchbase web ui - config.vm.network "forwarded_port", guest: 8091, host: 5091 -end +2015/03/26 16:58:50 INFO client.go:291: Failed getting response from http://127.0.0.1:4001/: dial tcp 127.0.0.1:4001: connection refused +2015/03/26 16:58:50 ERROR client.go:213: Unable to get result for {Get /_coreos.com/fleet/machines}, retrying in 100ms ``` -### Disable Transparent Huge Pages (optional) +Jump to **Workaround CoreOS + Vagrant issues** below. -Not sure how crucial this is, but I'll mention it just in case.
After the CoreOS machines startup, ssh into each one: +### Workaround CoreOS + Vagrant issues: + +First exit out of CoreOS: ``` -$ sudo bash -# echo never > /sys/kernel/mm/transparent_hugepage/enabled && echo never > /sys/kernel/mm/transparent_hugepage/defrag +core@core-01 ~ $ exit ``` +On your OSX workstation, try the following workaround: + +``` +$ sed -i '' 's/420/0644/' user-data +$ sed -i '' 's/484/0744/' user-data +$ vagrant reload --provision +``` + +Ssh back in: + +``` +$ vagrant ssh core-01 -- -A +``` + +Verify it worked: + +``` +core@core-01 ~ $ fleetctl list-machines +``` + +You should see: + +``` +MACHINE IP METADATA +ce0fec18... 172.17.8.102 - +d6402b24... 172.17.8.101 - +``` + +I filed [CoreOS cloudinit issue 328](https://github.com/coreos/coreos-cloudinit/issues/328) to figure out why this error is happening (possibly related issues: [CoreOS cloudinit issue 261](https://github.com/coreos/coreos-cloudinit/issues/261) or [CoreOS cloudinit issue 190](https://github.com/coreos/bugs/issues/190)) + + +### Continue steps above + +Scroll up to the **Installing elastic-thought on AWS** section and start with **Verify CoreOS cluster** + +## FAQ + +* Is this useful for grid computing / distributed computation? **Ans**: No, this is not trying to be a grid computing (aka distributed computation) solution.
You may want to check out [Caffe Issue 876](https://github.com/BVLC/caffe/issues/876) or [ParameterServer](http://parameterserver.org/) + ## License Apache 2 diff --git a/docker/templates/Dockerfile.template b/docker/templates/Dockerfile.template index 8cc7a23..01463d9 100644 --- a/docker/templates/Dockerfile.template +++ b/docker/templates/Dockerfile.template @@ -4,34 +4,30 @@ FROM tleyden5iwx/caffe-{{ .ProcessorType }}-master MAINTAINER Traun Leyden tleyden@couchbase.com ENV GOPATH /opt/go -ENV PATH $GOPATH/bin:/usr/local/go/bin:$PATH ENV GOROOT /usr/local/go +ENV PATH $PATH:$GOPATH/bin:$GOROOT/bin +# Get dependencies RUN apt-get update && \ - apt-get -q -y install mercurial && \ - apt-get -q -y install make && \ - apt-get -q -y install binutils && \ - apt-get -q -y install bison && \ - apt-get -q -y install build-essential + apt-get -q -y install \ + mercurial \ + make \ + binutils \ + bison \ + build-essential RUN mkdir -p $GOPATH -# Install Go 1.3 manually (since Go 1.3 is required, and ubuntu 14.04 still uses Go 1.2) -RUN curl -O https://storage.googleapis.com/golang/go1.3.1.linux-amd64.tar.gz && \ - tar -C /usr/local -xzf go1.3.1.linux-amd64.tar.gz +# Download and install Go 1.4 +RUN wget http://golang.org/dl/go1.4.2.linux-amd64.tar.gz && \ + tar -C /usr/local -xzf go1.4.2.linux-amd64.tar.gz && \ + rm go1.4.2.linux-amd64.tar.gz # Add refresh script ADD scripts/refresh-elastic-thought /usr/local/bin/ ADD scripts/refresh-elastic-thought-refresher /usr/local/bin/ # Go get ElasticThought -RUN go get -u -v -t github.com/tleyden/elastic-thought && \ - go get -u -v -t github.com/tleyden/elastic-thought/cli/httpd && \ - go get -u -v -t github.com/tleyden/elastic-thought/cli/worker && \ +RUN go get -u -v -t github.com/tleyden/elastic-thought/...&& \ cd $GOPATH/src/github.com/tleyden/elastic-thought && \ git log -3 - -# Copy binaries -RUN cp /opt/go/bin/worker /usr/local/bin && \ - cp /opt/go/bin/httpd /usr/local/bin -
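
Note on `docker/templates/Dockerfile.template`: the `{{ .ProcessorType }}` placeholder in the `FROM` line looks like Go `text/template` syntax, which would explain how the per-processor Dockerfiles in this change end up with `caffe-cpu-master` and `caffe-gpu-master` base images. The sketch below is an illustration only, assuming the template is rendered with the standard `text/template` package. The `params` struct and the `renderFrom` helper are hypothetical names, not taken from the repository; only the `ProcessorType` field name comes from the template itself.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// fromLine mirrors the templated FROM instruction in Dockerfile.template.
const fromLine = "FROM tleyden5iwx/caffe-{{ .ProcessorType }}-master"

// params is a hypothetical data struct for this sketch; the template
// only requires that a ProcessorType field exists.
type params struct {
	ProcessorType string
}

// renderFrom (hypothetical helper) renders the FROM line for one
// processor type, e.g. "cpu" or "gpu".
func renderFrom(processorType string) (string, error) {
	tmpl, err := template.New("dockerfile").Parse(fromLine)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, params{ProcessorType: processorType}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Render the line once per processor type seen in this change.
	for _, p := range []string{"cpu", "gpu"} {
		line, err := renderFrom(p)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Println(line)
	}
}
```

Keeping one template and rendering it per processor type means a change such as the Go 1.4.2 upgrade only has to be made in one place, with the generated Dockerfiles regenerated from it.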