## Elasticsearch when you need it
Elasticsearch is a powerful tool for searching through and aggregating over disjoint, unstructured data.  We find it especially useful in exploring large data sets.  Its flexibility and variety of aggregations help us to quickly identify interesting patterns that we may like to investigate further.  

As a search server, Elasticsearch's power lies in its distributed nature.  Horizontal scaling allows us to perform complex searches and aggregations over millions of documents in a few seconds.  This comes at a cost, though - of the real, financial variety. Running a large Elasticsearch cluster on a PaaS can incur significant expenses, as you are essentially renting computing power. 

You don't want to pay for machines when they are not in use. For example, if you only use Elasticsearch during regular working hours, you would ideally not pay for machines that sit idle in between.  We have found, fortunately, that we can save significant amounts of money by running Docker containers managed by Kubernetes in Google Cloud.  If you haven't heard of or tried Kubernetes, check it out!  We recently made the switch from Tutum, and we're a lot happier now.  

It turns out that deploying a Dockerized Elasticsearch cluster in Kubernetes is actually pretty easy.  Additionally, bringing up the cluster and tearing it down when it is not in use can be done with a couple of scripts.  Leveraging Elasticsearch 2.3's backup and restore features, we can save our data to an HDFS storage solution and restore it upon creation of the cluster.  This post will discuss creating a Kubernetes cluster in Google Cloud, deploying our Dockerized Elasticsearch cluster within Kubernetes, and backing up and restoring from Google Storage's HDFS service.

## Install gcloud and kubectl
You will first need to make sure you have the gcloud command line utility installed, along with kubectl.  If you already have gcloud, you can install kubectl using the command:
```
gcloud components install kubectl
```

In [None]:
!gcloud config set project elasticsearch-project
!gcloud container clusters get-credentials elasticsearch-kubernetes

<img src=https://zwischenzugs.files.wordpress.com/2015/08/pods.jpg width=640 height=480/>

# Pods
## What is a pod?
A pod (as in a pod of whales or pea pod) is a group of one or more containers (such as Docker containers), the shared storage for those containers, and options about how to run the containers. 
A pod can be defined on its own, without a controller that ensures its replacement if it goes down.  More often, though, it is described in a **Deployment** or **Replication Controller**, which tries to maintain a given number of replica pods at all times.

### Important info contained in the Deployment/Pod Template:
- Number of replicas
- Description of containers:
    - Image name
    - Ports to expose
        - containerPort exposes port on container. If 'port' (pod's port) is not defined, it is set to containerPort
    - Environment variables
- ServiceAccount: provides identity for processes that run in a pod
- Readiness probe: an endpoint that should return 200 when the pod is ready
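As a rough illustration of where each of these fields lives, here is a minimal sketch of a Deployment manifest, wrapped in a shell function so it can be piped to kubectl. Every name and value in it (`example-app`, the image tag, port 8080, `/health`, `LOG_LEVEL`) is hypothetical, not taken from the deploy.yml used in this post. It assumes the `apps/v1` API group; older clusters served Deployments under `extensions/v1beta1`.

```shell
# Print a hypothetical Deployment manifest; all names and values are illustrative.
example_deployment_manifest() {
  cat <<'EOF'
apiVersion: apps/v1                # assumption: older clusters used extensions/v1beta1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 2                      # number of replica pods to maintain
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app           # labels that a Service selector can match
    spec:
      serviceAccountName: default  # identity for processes running in the pod
      containers:
        - name: example-app
          image: example/app:0.1   # image name
          ports:
            - containerPort: 8080  # port the container listens on
          env:
            - name: LOG_LEVEL      # environment variable
              value: info
          readinessProbe:          # endpoint that answers once the pod is ready
            httpGet:
              path: /health
              port: 8080
EOF
}

# Usage (requires a cluster):
#   example_deployment_manifest | kubectl create -f -
```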


In [None]:
!cat kubernetes/testing/recommendation-engine/deploy.yml

In [None]:
!kubectl get deployments

In [None]:
!kubectl describe deployment semantify

In [None]:
!kubectl get pods

You can get the full definition of a pod, service, deployment, etc. by using *kubectl get <type> <name>* with the output flag (-o yaml).

In [None]:
!kubectl get pod semantify-2835096483-g5d86 -o yaml

# Services
## What is a service?
A Kubernetes Service is an abstraction which defines a logical set of Pods and a policy by which to access them - sometimes called a micro-service. The set of Pods targeted by a Service is (usually) determined by a Label Selector.

### Important info contained in service
- The selector for pods - must match the labels in the pod template's metadata
- Ports: The port your service exposes to outside components, and the target port on the pod
- Type of service:
    
    - ClusterIP: default; only has cluster-specific IP address; accessible only in cluster
    - NodePort: has Cluster IP, and opens a specific port on each node of cluster that you can talk to
    - LoadBalancer: exposes to the world, has an external IP address and exposed port
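To make the list above concrete, here is a minimal sketch of a Service manifest, again wrapped in a shell function. The names and port numbers are hypothetical and are not the contents of the service.yml used in this post; the selector is assumed to match pods labelled `app: example-app`.

```shell
# Print a hypothetical Service manifest; all names and values are illustrative.
example_service_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: example-app
spec:
  type: ClusterIP        # default; NodePort and LoadBalancer are the alternatives
  selector:
    app: example-app     # must match the labels on the target pods
  ports:
    - port: 80           # port the service exposes
      targetPort: 8080   # port on the pod that receives the traffic
EOF
}

# Usage (requires a cluster):
#   example_service_manifest | kubectl create -f -
```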

In [None]:
!cat kubernetes/testing/recommendation-engine/service.yml

In [None]:
!kubectl get services

# DNS

Services in Kubernetes are registered with its internal DNS in the following format:

`<ServiceName>`.`<namespace>`.svc.cluster.local

For example, `semantify.default.svc.cluster.local` is the internal address of the semantify service in the Kubernetes cluster.
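The naming rule can be sketched as a small helper. The function name `service_dns_name` is made up for illustration, and `cluster.local` is assumed to be the cluster domain, which is the default but can be configured differently.

```shell
# Build the cluster-internal DNS name for a service.
# Usage: service_dns_name <service> [namespace]
# The namespace falls back to "default" when omitted.
service_dns_name() {
  printf '%s.%s.svc.cluster.local\n' "$1" "${2:-default}"
}

service_dns_name semantify default
```

Pods in the same namespace can usually reach the service by its short name alone (here, `semantify`), since the cluster DNS search path fills in the rest.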


In [None]:
cd kubernetes/testing

##### You can create a pod, deployment etc by using *kubectl create* with a file flag (-f)

In [None]:
!kubectl create -f recommendation-engine/deploy.yml

In [None]:
!kubectl get deployments
!kubectl get pods

##### Create multiple deployments and services at a time

In [None]:
!kubectl create -f recommendation-engine/deploy.yml
!kubectl create -f recommendation-engine/service.yml -f rabbitmq/deploy.yml -f rabbitmq/services.yml
!kubectl create -f redis/deploy.yml -f redis/service.yml
!kubectl create -f elasticsearch/client-deploy.yml -f elasticsearch/master-deploy.yml -f elasticsearch/data-deploy.yml
!kubectl create -f elasticsearch/services.yml
!kubectl create -f event-processor/deploy.yml
!kubectl create -f event-tracker/deploy.yml -f event-tracker/service.yml

In [None]:
!kubectl get services
!kubectl get deployments
!kubectl get pods

You can use your yml files to delete resources using the exact same format, replacing create with delete, as long as the metadata pertaining to the resource is the same.

In [None]:
!kubectl delete -f recommendation-engine/deploy.yml
!kubectl delete -f recommendation-engine/service.yml -f rabbitmq/deploy.yml -f rabbitmq/services.yml
!kubectl delete -f redis/deploy.yml -f redis/service.yml
!kubectl delete -f elasticsearch/client-deploy.yml -f elasticsearch/master-deploy.yml -f elasticsearch/data-deploy.yml
!kubectl delete -f elasticsearch/services.yml
!kubectl delete -f event-processor/deploy.yml
!kubectl delete -f event-tracker/deploy.yml -f event-tracker/service.yml
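Because create and delete take the same file arguments, a bring-up and tear-down script can share one manifest list. The following is only a sketch: the helper `for_each_manifest` and the `KUBECTL` override are invented for illustration, and the list is shortened to a few of the files used above.

```shell
# Allow the binary to be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# A subset of the manifest paths used above, relative to kubernetes/testing.
MANIFESTS="redis/deploy.yml redis/service.yml elasticsearch/services.yml"

# Run one kubectl action (create or delete) against every manifest in the list,
# stopping at the first failure.
for_each_manifest() {
  action="$1"
  for manifest in $MANIFESTS; do
    "$KUBECTL" "$action" -f "$manifest" || return 1
  done
}

# Usage (requires a cluster):
#   for_each_manifest create   # bring everything up
#   for_each_manifest delete   # tear everything down
```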

##### You can update a deployment just as easily, for purposes such as:
- Scaling number of pods
- Updating the container image
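Besides editing the manifest and re-applying it, both kinds of update can also be made imperatively with `kubectl scale` and `kubectl set image`. The wrappers below are a hedged sketch: the function names and the `KUBECTL` override are invented here, and the container name passed to `set image` has to match the one in your own pod template.

```shell
# Allow the binary to be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Change the number of replica pods for a deployment.
# Usage: scale_deployment <deployment> <replicas>
scale_deployment() {
  "$KUBECTL" scale "deployment/$1" --replicas="$2"
}

# Point one container of a deployment at a new image.
# Usage: set_deployment_image <deployment> <container> <image>
set_deployment_image() {
  "$KUBECTL" set image "deployment/$1" "$2=$3"
}

# Usage (requires a cluster; the names are placeholders):
#   scale_deployment example-app 3
#   set_deployment_image example-app example-app example/app:0.2
```

Keep in mind that imperative changes are not reflected in the yml files, so a later `kubectl apply -f` of an unchanged file will move the deployment back to what the file describes.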

Let's change the Recommendation Engine to version `0.21-c9c8528`:

In [None]:
!kubectl apply -f recommendation-engine/deploy.yml --record

##### The default deployment strategy is a *rolling update*.  The *readiness probe* is used to tell when a new pod is ready to handle traffic

##### Rollout history is stored in Kubernetes

In [None]:
!kubectl rollout history deployment/wisdom-recommendation-engine

##### You can roll back a deployment with a single line

In [None]:
!kubectl rollout undo deployment/wisdom-recommendation-engine

You can also use the --to-revision flag (e.g. --to-revision=6) to specify the revision to roll back to.
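The two rollout subcommands name their revision flag differently: `kubectl rollout history` takes `--revision` to show the details of one revision, while `kubectl rollout undo` takes `--to-revision` to pick the rollback target. A small sketch, with invented wrapper names and the same hypothetical `KUBECTL` override for dry runs:

```shell
# Allow the binary to be overridden, e.g. KUBECTL=echo for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Show the recorded details of a single revision.
# Usage: show_revision <deployment> <revision>
show_revision() {
  "$KUBECTL" rollout history "deployment/$1" --revision="$2"
}

# Roll a deployment back to a specific revision.
# Usage: rollback_to <deployment> <revision>
rollback_to() {
  "$KUBECTL" rollout undo "deployment/$1" --to-revision="$2"
}

# Usage (requires a cluster):
#   show_revision wisdom-recommendation-engine 6
#   rollback_to wisdom-recommendation-engine 6
```

Inspecting a revision before rolling back to it is a cheap way to confirm it carries the image you expect.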