# Kubernetes Service Discovery

In [1]:
# run a command but ignore failures; its output is saved in .last_maybe
maybe() {
    "$@" > .last_maybe 2>&1 || true
}

# Service Discovery

- for large-scale deep learning we need multiple processes that talk to each other
- this requires
    - service discovery
    - networking
    - name resolution

# K8s Service Discovery

Simple:

- every pod gets assigned a hostname and domain
- you can simply connect directly to these well-known names

Requirements:

- create a "headless service" so that the cluster DNS publishes a record for each matching pod
- add ports, a `hostname`, and a `subdomain` (matching the service name) to your pods
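
The resulting pod name is composed from these fields. A minimal sketch of that composition, assuming the `default` namespace and the `cluster.local` cluster domain (both show up in the `nslookup` output further down); `pod_fqdn` is a hypothetical helper for illustration, not a kubectl feature:

```shell
# pod_fqdn HOSTNAME SUBDOMAIN [NAMESPACE] [CLUSTER_DOMAIN]
# Hypothetical helper: prints the fully qualified DNS name of a pod
# that sets `hostname` and `subdomain` and sits behind a headless service.
pod_fqdn() {
    echo "$1.$2.${3:-default}.svc.${4:-cluster.local}"
}

pod_fqdn shards bigdata19
# prints: shards.bigdata19.default.svc.cluster.local
```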

# Headless Service

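Setting `clusterIP: None` makes the service headless. (A regular service load-balances connections across its pods behind a single virtual IP, which we don't want here: we want to reach a specific pod by name.)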

In [2]:
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: bigdata19
spec:
  clusterIP: None
  ports:
    - port: 7880
      targetPort: 7880
  selector:
    app: bigdata19
EOF

service/bigdata19 created
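
To double-check that the service really is headless, one option is to read back its `clusterIP` field. This is a sketch; `is_headless` is a hypothetical helper name, and it needs access to the cluster to run:

```shell
# is_headless SERVICE
# Hypothetical helper: succeeds if the service's clusterIP is the literal "None".
is_headless() {
    [ "$(kubectl get service "$1" -o jsonpath='{.spec.clusterIP}')" = "None" ]
}
```

Usage, against a live cluster: `is_headless bigdata19 && echo "bigdata19 is headless"`.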


# A Visible Pod

This pod will be assigned the DNS name `shards.bigdata19` (fully qualified: `shards.bigdata19.default.svc.cluster.local`).

In [3]:
# pods get assigned DNS names if they set hostname and subdomain, expose a port,
# and their app label matches the selector of the headless service

maybe kubectl delete pod/shards
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: shards
  labels:
    app: bigdata19
spec:
  containers:
  - name: shards
    image: gcr.io/research-191823/bigdata19
    command: ["serve-imagenet-shards", "-b", "96", "zpub://0.0.0.0:7880"]
    ports:
      - containerPort: 7880
  restartPolicy: Never
  hostname: shards
  subdomain: bigdata19
EOF

pod/shards created


In [4]:
sleep 15
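
A fixed `sleep` is fine for a demo, but it either waits too long or not long enough. A possible alternative is `kubectl wait`, which blocks until the pod reports the `Ready` condition. A sketch; `wait_for_pod` is a hypothetical wrapper and the 60s default timeout is an arbitrary choice:

```shell
# wait_for_pod POD [TIMEOUT]
# Hypothetical wrapper: block until the pod is Ready or the timeout expires.
wait_for_pod() {
    kubectl wait --for=condition=Ready "pod/$1" --timeout="${2:-60s}"
}
```

Usage, against a live cluster: `wait_for_pod shards` in place of `sleep 15`.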

# DNS Debugging

In [9]:
kubectl get pods

NAME     READY   STATUS    RESTARTS   AGE
shards   1/1     Running   0          4m26s


In [10]:
# make sure resolution is working
kubectl exec -ti shards -- nslookup shards.bigdata19

# check resolv.conf file
kubectl exec -ti shards -- cat /etc/resolv.conf

Server:		10.64.0.10
Address:	10.64.0.10#53

Name:	shards.bigdata19.default.svc.cluster.local
Address: 10.0.2.3

nameserver 10.64.0.10
search default.svc.cluster.local svc.cluster.local cluster.local c.research-191823.internal google.internal
options ndots:5
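
The `search` and `ndots:5` lines explain why the short name `shards.bigdata19` resolves: a name with fewer than 5 dots is first tried with each search domain appended. The sketch below only imitates that ordering to make it visible; `candidates` is a hypothetical helper, and real resolvers have more rules (for example, a trailing dot disables the search list):

```shell
# candidates NAME NDOTS SEARCH_DOMAIN...
# Hypothetical helper: print the names a resolver would try, in order.
candidates() {
    name=$1
    ndots=$2
    shift 2
    # count the dots in the name
    dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
    # names with enough dots are tried as-is first
    if [ "$dots" -ge "$ndots" ]; then echo "$name"; fi
    for domain in "$@"; do
        echo "$name.$domain"
    done
    # names with too few dots are tried as-is last
    if [ "$dots" -lt "$ndots" ]; then echo "$name"; fi
}

candidates shards.bigdata19 5 default.svc.cluster.local svc.cluster.local cluster.local
```

The first candidate printed, `shards.bigdata19.default.svc.cluster.local`, is the name that `nslookup` reports above.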


In [11]:
# check service running
kubectl get svc --namespace=kube-system

# check endpoints
kubectl get ep kube-dns --namespace=kube-system

NAME                   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)         AGE
default-http-backend   NodePort    10.64.6.85     <none>        80:32565/TCP    59m
heapster               ClusterIP   10.64.10.206   <none>        80/TCP          59m
kube-dns               ClusterIP   10.64.0.10     <none>        53/UDP,53/TCP   59m
metrics-server         ClusterIP   10.64.7.249    <none>        443/TCP         59m
NAME       ENDPOINTS                                             AGE
kube-dns   10.0.2.134:53,10.0.2.2:53,10.0.2.134:53 + 1 more...   59m


In [12]:
# when desperate, you can look through the kube-dns logs
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name | sed 1q |
while read -r pod; do
    # show the last few log lines of every container in the first kube-dns pod
    for container in kubedns dnsmasq sidecar prometheus-to-sd; do
        kubectl logs --tail=3 --namespace=kube-system "$pod" -c "$container"
    done
done

NAME                        READY   STATUS    RESTARTS   AGE
kube-dns-79868f54c5-46q7h   4/4     Running   0          59m
kube-dns-79868f54c5-tf275   4/4     Running   0          59m
E1212 17:12:08.610570       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.64.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.64.0.1:443: i/o timeout
E1212 17:12:08.611507       1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.64.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.64.0.1:443: i/o timeout
I1212 17:57:33.512228       1 dns.go:601] Could not find endpoints for service "bigdata19" in namespace "default". DNS records will be created once endpoints show up.
I1212 17:03:17.281524       1 nanny.go:146] dnsmasq[24]: read /etc/hosts - 7 addresses
I1212 17:03:17.281450       1 nanny.go:149] 
W1212 17:03:17.281647       1 nanny.go:150] Got EOF from stdout
I1212 17:03:17.803027       1 server.go:45]
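
The kubedns message about missing endpoints for `bigdata19` suggests a quicker check than reading logs: ask whether the headless service has any endpoints yet. A sketch; `check_endpoints` is a hypothetical helper and needs access to the cluster:

```shell
# check_endpoints SERVICE
# Hypothetical helper: list the pod IPs behind a service, or fail if there are none.
check_endpoints() {
    ips=$(kubectl get endpoints "$1" -o jsonpath='{.subsets[*].addresses[*].ip}')
    if [ -n "$ips" ]; then
        echo "$1 endpoints: $ips"
    else
        echo "$1 has no ready endpoints yet"
        return 1
    fi
}
```

Usage, against a live cluster: `check_endpoints bigdata19`. If it reports no endpoints, the pod labels probably do not match the service selector, or the pod is not ready yet.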

In [13]:
sleep 15

# Logs of the Running Server

The server is chugging along nicely, sending out training batches to anybody who will listen.

In [14]:
kubectl logs shards | sed 10q

serving zpub://0.0.0.0:7880
0 rate 0.000000 msg/s throughput 0.00e+00 bytes/s
10 rate 5.676849 msg/s throughput 8.20e+07 bytes/s
20 rate 5.370448 msg/s throughput 7.76e+07 bytes/s
30 rate 4.273988 msg/s throughput 6.18e+07 bytes/s
40 rate 4.440511 msg/s throughput 6.42e+07 bytes/s
50 rate 4.553441 msg/s throughput 6.58e+07 bytes/s
60 rate 4.636682 msg/s throughput 6.70e+07 bytes/s
70 rate 4.671525 msg/s throughput 6.75e+07 bytes/s
80 rate 4.725854 msg/s throughput 6.83e+07 bytes/s
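
As a rough sanity check on these numbers, dividing throughput by message rate gives the size of one message (presumably one batch of 96 samples, given the `-b 96` flag). A sketch; `mb_per_message` is a hypothetical helper:

```shell
# mb_per_message BYTES_PER_SECOND MESSAGES_PER_SECOND
# Hypothetical helper: average message size in megabytes (1e6 bytes).
mb_per_message() {
    awk -v bytes="$1" -v msgs="$2" 'BEGIN { printf "%.1f\n", bytes / msgs / 1e6 }'
}

mb_per_message 8.20e+07 5.676849
# prints: 14.4
```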


In [15]:
kubectl get pods

NAME     READY   STATUS    RESTARTS   AGE
shards   1/1     Running   0          4m55s


# Starting a Client

Here is a small network client that listens to training data and outputs statistics.

In [17]:
maybe kubectl delete pod/client
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: client
  labels:
    app: bigdata19
spec:
  containers:
  - name: client
    image: gcr.io/research-191823/bigdata19
    command: ["tensormon", "zsub://shards.bigdata19:7880"]
    stdin: true
    tty: true
  restartPolicy: Never
  hostname: client
  subdomain: bigdata19
EOF

pod/client created


In [18]:
sleep 15

# Client Output

In [19]:
kubectl logs client | sed 10q

input: ['zsub://shards.bigdata19:7880']
zsub://shards.bigdata19:7880
connected
                  10    5.431 batches/s  521.379 samples/s (batchsize: 96)
                  20    4.976 batches/s  477.705 samples/s (batchsize: 96)
                  30    4.789 batches/s  459.767 samples/s (batchsize: 96)
                  40    4.622 batches/s  443.675 samples/s (batchsize: 96)
                  50    3.835 batches/s  368.196 samples/s (batchsize: 96)
                  60    3.263 batches/s  313.282 samples/s (batchsize: 96)
                  70    4.048 batches/s  388.614 samples/s (batchsize: 96)
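
The two rate columns are related by the batch size: samples/s is batches/s times 96. A small sketch to verify this from the first row; `samples_per_second` is a hypothetical helper:

```shell
# samples_per_second BATCHES_PER_SECOND BATCH_SIZE
# Hypothetical helper: convert a batch rate into a sample rate.
samples_per_second() {
    awk -v rate="$1" -v bs="$2" 'BEGIN { printf "%.3f\n", rate * bs }'
}

samples_per_second 5.431 96
# prints: 521.376
```

The small difference from the logged 521.379 comes from the batch rate being rounded to three decimals in the log.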


# Starting a DL Client on a GPU Node

In [20]:
maybe kubectl delete job/myjob
kubectl apply -f - <<'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
  labels:
    app: bigdata19
spec:
  backoffLimit: 0
  template:
    spec:
      containers:
        - name: myjob
          image: gcr.io/research-191823/bigdata19
          command: 
            - "/bin/bash"
            - "-c"
            - |
              cp /files/*.py .
              python3 training.py --tensorcom zsub://shards.bigdata19:7880
          stdin: true
          tty: true
          resources:
            limits:
              nvidia.com/gpu: "1"
          volumeMounts:
            - mountPath: /files
              name: files
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-tesla-t4
      restartPolicy: Never
      volumes:
        - configMap:
            name: files
          name: files
EOF

job.batch/myjob created
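
The job mounts a ConfigMap named `files` at `/files` and copies its `*.py` files, so that ConfigMap has to exist before the job starts. Its creation is not shown in this notebook; one possible way, assuming the training scripts sit in a local directory, is sketched below (`make_files_configmap` is a hypothetical helper):

```shell
# make_files_configmap DIRECTORY
# Hypothetical helper: create the "files" ConfigMap from the files in a local directory.
make_files_configmap() {
    kubectl create configmap files --from-file="$1"
}
```

Usage, against a live cluster: `make_files_configmap .` from the directory that contains `training.py`.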


In [21]:
sleep 10

In [22]:
kubectl logs job/myjob

Thu Dec 12 18:21:30 UTC 2019; myjob-scs9r; root; /workspace; GPU 0: Tesla T4 (UUID: GPU-e3b63d8c-056b-140d-43e1-de274722818d); 
creating resnet50
        0 bs    96 per sample loss 7.38e-02 loading 5.98e-04 training 1.88e-02


In [23]:
sleep 60

# Training

- Note that with distributed preprocessing, loading is very fast: in the log below, the `loading` column stays well below the `training` column.
- We will talk about the Tensorcom package later.

In [24]:
kubectl logs job/myjob

Thu Dec 12 18:21:30 UTC 2019; myjob-scs9r; root; /workspace; GPU 0: Tesla T4 (UUID: GPU-e3b63d8c-056b-140d-43e1-de274722818d); 
creating resnet50
        0 bs    96 per sample loss 7.38e-02 loading 5.98e-04 training 1.88e-02
     1152 bs    96 per sample loss 7.35e-02 loading 5.76e-04 training 7.51e-03
     2304 bs    96 per sample loss 7.32e-02 loading 5.66e-04 training 4.33e-03
     3360 bs    96 per sample loss 7.31e-02 loading 5.63e-04 training 3.50e-03
     4416 bs    96 per sample loss 7.28e-02 loading 5.65e-04 training 3.29e-03
     5472 bs    96 per sample loss 7.27e-02 loading 5.62e-04 training 3.23e-03
     6528 bs    96 per sample loss 7.27e-02 loading 5.67e-04 training 3.26e-03


In [25]:
kubectl get jobs

NAME    COMPLETIONS   DURATION   AGE
myjob   0/1           75s        75s


In [None]:
kubectl delete jobs --all
kubectl delete pods --all

job.batch "myjob" deleted
pod "client" deleted
pod "myjob-scs9r" deleted
pod "shards" deleted


# Kubernetes Service Discovery

- it's like creating a new server out of thin air
- you can define your distributed application as a collection of pods
- K8s also provides load balancing and more complex namespaces