# Simplifying Kubernetes with Templating

In [1]:
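# Run a command without failing the cell if it errors (e.g. deleting a
# resource that does not exist); its output is saved in .last_maybe.
maybe() {
    "$@" > .last_maybe 2>&1 || true
}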

# Simple Templating with Shell Scripts

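Plain shell scripts are enough for simple templates and shared parameters: keep the parameters in a file of exported variables and substitute them into a YAML template with `envsubst`.

This is often sufficient for small jobs.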

In [15]:
# set up common variables

cat > variables <<'EOF'
export app=bigdata19
export subdomain=bigdata19
export image=gcr.io/research-191823/bigdata19
export shell=/bin/bash
EOF

source variables

In [16]:
# Create a job template file.
# Notice the use of environment variables inside the file.

cat > _template.yml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: $name
  labels:
    app: $app
spec:
  containers:
  - name: mypod
    image: $image
    command: ["$shell", "-c", "$cmd"]
  restartPolicy: Never
EOF

In [17]:
# use `envsubst` to replace environment variables inside the file
# Note the use of name=value prior to the command to pass additional environment variables.

maybe kubectl delete pod/mypod
name=mypod cmd=uptime envsubst < _template.yml
name=mypod cmd=uptime envsubst < _template.yml | kubectl apply -f -

apiVersion: v1
kind: Pod
metadata:
  name: mypod
  labels:
    app: bigdata19
spec:
  containers:
  - name: mypod
    image: gcr.io/research-191823/bigdata19
    command: ["/bin/bash", "-c", "uptime"]
  restartPolicy: Never
pod/mypod created
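
Note that `envsubst` replaces an unset variable with an empty string, so a forgotten `name` or `cmd` yields a spec with blank fields rather than an error. A minimal sketch of a guard, assuming a hypothetical helper `require_vars` that is not part of the original notebook:

```shell
# require_vars is a hypothetical helper (not part of the original notebook).
# It succeeds only if every named variable is exported and non-empty,
# because envsubst takes its values from the environment.
require_vars() {
    rv_missing=0
    for rv_var in "$@"; do
        # printenv prints nothing when the variable is not in the environment
        if [ -z "$(printenv "$rv_var")" ]; then
            echo "missing variable: $rv_var" >&2
            rv_missing=1
        fi
    done
    return "$rv_missing"
}

# Usage sketch: render only when all template variables are present.
# export name=mypod cmd=uptime
# require_vars name app image shell cmd && envsubst < _template.yml
```

The check uses `printenv` rather than a plain shell test because a variable that is set but not exported is still invisible to `envsubst`.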


In [18]:
sleep 15
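
The fixed `sleep 15` assumes the pod finishes within 15 seconds. A minimal sketch of a polling alternative, assuming a hypothetical helper `wait_until` (not part of the original notebook) that retries a command once per second until it succeeds or a timeout expires:

```shell
# wait_until is a hypothetical helper (not part of the original notebook).
# Usage: wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND once per second until it succeeds or the timeout expires.
wait_until() {
    wu_remaining=$1
    shift
    until "$@" > /dev/null 2>&1; do
        if [ "$wu_remaining" -le 0 ]; then
            echo "timed out waiting for: $*" >&2
            return 1
        fi
        sleep 1
        wu_remaining=$((wu_remaining - 1))
    done
    return 0
}

# Usage sketch: wait up to 60 seconds for the pod to finish.
# pod_succeeded() {
#     [ "$(kubectl get pod mypod -o jsonpath='{.status.phase}')" = Succeeded ]
# }
# wait_until 60 pod_succeeded
```

For Jobs, `kubectl wait --for=condition=complete --timeout=60s job/NAME` gives the same behavior without a helper.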

In [19]:
kubectl logs pod/mypod

 18:27:34 up  1:26,  0 users,  load average: 0.15, 0.65, 0.80


In [20]:
kubectl delete pods --all

pod "mypod" deleted
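
When `envsubst` is not available, an unquoted here-document inside a shell function gives a similar result. This is a different technique from the template file used above; the sketch assumes a hypothetical function `pod_yaml` and reads `app`, `image`, and `shell` from the `variables` file:

```shell
# pod_yaml is a hypothetical helper (not part of the original notebook).
# Usage: pod_yaml NAME COMMAND
# Reads $app, $image and $shell from the current shell.
# COMMAND must not contain double quotes, or the generated YAML will be invalid.
pod_yaml() {
    cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: $1
  labels:
    app: $app
spec:
  containers:
  - name: mypod
    image: $image
    command: ["$shell", "-c", "$2"]
  restartPolicy: Never
EOF
}

# Usage sketch:
# source variables
# pod_yaml mypod uptime | kubectl apply -f -
```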


# Simplifying Kubernetes with Templating

- K8s specs are complicated
- K8s specs for an app need to be consistent
- multiple solutions
    - Ansible - general software installation and configuration
    - Helm - configure and deploy K8s applications
    - Kubeflow - AI/ML framework and GUI on top of K8s
    
We want to stay close to plain K8s, both to keep control over performance and to keep deployment easy.

# Templating

- put the boilerplate text into templates (Jinja2)
- generate actual YAML files by running a Jinja preprocessor
- `kubetpl` is a small Jinja processor with useful K8s templates

# Running a Job (the simple way)

In [21]:
maybe kubectl delete job.batch/mytask
kubetpl job -c uptime | kubectl apply -f -

job.batch/mytask created


In [22]:
sleep 15

In [23]:
kubectl get jobs

NAME     COMPLETIONS   DURATION   AGE
mytask   1/1           3s         24s


In [24]:
kubectl logs job/mytask

 18:29:24 up  1:28,  0 users,  load average: 0.02, 0.44, 0.71


In [25]:
kubectl delete job/mytask

job.batch "mytask" deleted


# Template Generation

To see what a generated spec looks like, run `kubetpl` without piping its output to `kubectl apply`:

In [31]:
rm -f kubetpl.yaml

In [33]:
kubetpl pod -M 2G -c uptime

---
apiVersion: v1
kind: Pod
metadata:
  name: mytask
  labels:
    app: bragi-tmb-bigdata19
spec:
  containers:
  - name: mytask
    image: ubuntu:18.04
    resources:
      limits:
        memory: 2G
      requests:
        memory: 2G
    command: 
      - "/bin/bash"
      - "-c"
      - |
        uptime
    stdin: true
    tty: true
    env:
    ports:
      []
  hostname: mytask
  restartPolicy: Never


# Shared Parameters

Related jobs often need to share parameters such as the image, resource limits, and ports. `kubetpl` reads these from the `kubetpl.yaml` file.

In [34]:
cat > kubetpl.yaml <<'EOF'
image: gcr.io/research-191823/bigdata19
memory: 4G
cpu: 1
app: bigdata19
subdomain: bigdata19
port:
  - 7880
config_map: files
env:
  - MASTER_ADDR=master.bigdata19
  - MASTER_PORT=7880
EOF
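
Other scripts may need the same shared values, for example the image name. A minimal sketch of reading one from the file, assuming a hypothetical helper `yaml_get` that handles only top-level `key: value` lines and is not a general YAML parser:

```shell
# yaml_get is a hypothetical helper (not part of the original notebook).
# Usage: yaml_get KEY [FILE]
# Prints the value of a top-level scalar key; FILE defaults to kubetpl.yaml.
# Keys whose value is a nested list (such as `port` or `env`) print an empty line.
yaml_get() {
    sed -n "s/^$1:[[:space:]]*\(.*\)$/\1/p" "${2:-kubetpl.yaml}"
}

# Usage sketch:
# image=$(yaml_get image)
# memory=$(yaml_get memory)
```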

In [35]:
maybe kubectl delete service/bigdata19
kubetpl service
kubetpl service | kubectl apply -f -

apiVersion: v1
kind: Service
metadata:
  name: bigdata19
spec:
  clusterIP: None
  ports:
    - port: 7880
      targetPort: 7880
  selector:
    app: bigdata19
service/bigdata19 created


# Configmap Script

The small `kubefcm` script simplifies creating configmaps from files: it takes the configmap name followed by the files to include.

In [36]:
kubefcm files *.py

-- --from-file=disttraining.py=disttraining.py
-- --from-file=helpers.py=helpers.py
-- --from-file=training.py=training.py
configmap "files" deleted
configmap/files created
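
Judging from the output above, the script builds one `--from-file` flag per file and then replaces the configmap. A rough sketch of that idea, assuming a hypothetical helper `fcm_args`; the real `kubefcm` script may work differently:

```shell
# fcm_args is a hypothetical helper inferred from the output above;
# it is not the actual kubefcm implementation.
# Prints one --from-file=KEY=PATH flag per file, keyed by the file's basename.
fcm_args() {
    for fcm_path in "$@"; do
        printf '%s\n' "--from-file=$(basename "$fcm_path")=$fcm_path"
    done
}

# Usage sketch (assumes file names without spaces or glob characters):
# kubectl delete configmap files --ignore-not-found
# kubectl create configmap files $(fcm_args *.py)
```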


# Server Example with Templates

In [38]:
maybe kubectl delete pod/shards
kubetpl pod -n shards -c 'serve-imagenet-shards -b 96 zpub://0.0.0.0:7880' | kubectl apply -f -

pod/shards created


In [39]:
sleep 15

In [40]:
kubectl get pods

NAME     READY   STATUS    RESTARTS   AGE
shards   1/1     Running   0          16s


In [41]:
kubectl logs shards | sed 10q

serving zpub://0.0.0.0:7880
0 rate 0.000000 msg/s throughput 0.00e+00 bytes/s
10 rate 5.751526 msg/s throughput 8.31e+07 bytes/s
20 rate 5.424528 msg/s throughput 7.84e+07 bytes/s
30 rate 5.313076 msg/s throughput 7.68e+07 bytes/s
40 rate 5.163820 msg/s throughput 7.46e+07 bytes/s
50 rate 5.092204 msg/s throughput 7.36e+07 bytes/s
60 rate 5.078865 msg/s throughput 7.34e+07 bytes/s


# Client with Templates

In [42]:
maybe kubectl delete pod/monitor
kubetpl pod -n monitor -c 'tensormon zsub://shards.bigdata19:7880' | kubectl apply -f -

pod/monitor created


In [43]:
sleep 15

In [44]:
kubectl get pods

NAME      READY   STATUS    RESTARTS   AGE
monitor   1/1     Running   0          16s
shards    1/1     Running   0          33s


In [45]:
kubectl logs monitor | sed 10q

input: ['zsub://shards.bigdata19:7880']
zsub://shards.bigdata19:7880
connected
                  10    5.260 batches/s  504.919 samples/s (batchsize: 96)
                  20    4.029 batches/s  386.829 samples/s (batchsize: 96)
                  30    4.998 batches/s  479.789 samples/s (batchsize: 96)
                  40    4.976 batches/s  477.707 samples/s (batchsize: 96)
                  50    4.664 batches/s  447.712 samples/s (batchsize: 96)
                  60    4.618 batches/s  443.371 samples/s (batchsize: 96)


# Training with Templates

In [46]:
maybe kubectl delete job/training
kubetpl job -n training -G 1 -M 8G -c '
cp /files/*.py .
python3 training.py --tensorcom zsub://shards.bigdata19:7880
' | kubectl apply -f -

job.batch/training created


In [47]:
sleep 10

In [50]:
kubectl logs job/training

Thu Dec 12 18:33:38 UTC 2019; training; root; /workspace; GPU 0: Tesla T4 (UUID: GPU-e3b63d8c-056b-140d-43e1-de274722818d); 
creating resnet50
        0 bs    96 per sample loss 7.38e-02 loading 7.33e-04 training 2.05e-02
      960 bs    96 per sample loss 7.36e-02 loading 9.20e-04 training 1.01e-02
     1920 bs    96 per sample loss 7.35e-02 loading 9.87e-04 training 6.43e-03


In [51]:
sleep 120

In [52]:
kubectl logs job/training

Thu Dec 12 18:33:38 UTC 2019; training; root; /workspace; GPU 0: Tesla T4 (UUID: GPU-e3b63d8c-056b-140d-43e1-de274722818d); 
creating resnet50
        0 bs    96 per sample loss 7.38e-02 loading 7.33e-04 training 2.05e-02
      960 bs    96 per sample loss 7.36e-02 loading 9.20e-04 training 1.01e-02
     1920 bs    96 per sample loss 7.35e-02 loading 9.87e-04 training 6.43e-03
     2880 bs    96 per sample loss 7.34e-02 loading 1.02e-03 training 5.21e-03
     3840 bs    96 per sample loss 7.30e-02 loading 1.14e-03 training 4.64e-03
     4800 bs    96 per sample loss 7.30e-02 loading 1.18e-03 training 4.34e-03
     5664 bs    96 per sample loss 7.27e-02 loading 1.11e-03 training 4.36e-03
     6528 bs    96 per sample loss 7.27e-02 loading 1.27e-03 training 4.48e-03
     7392 bs    96 per sample loss 7.26e-02 loading 9.36e-04 training 4.85e-03
     8256 bs    96 per sample loss 7.24e-02 loading 1.07e-03 training 4.63e-03
     9120 bs    96 per sample loss 7.23e-02 loading 1.30e-03 traini

In [53]:
kubectl get jobs

NAME       COMPLETIONS   DURATION   AGE
training   1/1           2m8s       2m30s


In [None]:
kubectl delete jobs --all
kubectl delete pods --all

# Kubernetes with Templating

Templating makes using Kubernetes about as simple as many job queuing systems:

- start service/server: `kubetpl pod -c ... | kubectl apply -f -`
- submit job: `kubetpl job -c ... | kubectl apply -f -`
- create service: `kubetpl service ... | kubectl apply -f -`
- share files: `kubefcm name files...`