## Prerequisites

You will need a Kubernetes cluster in which to run the workflow.

A kubeconfig file for that cluster should be uploaded to this JupyterLab instance
and moved/renamed to `~/.kube/config` so that it is used automatically.
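As a quick sanity check before running any `kubectl` commands, the sketch below reports whether a kubeconfig file is present at the expected location. The helper name `kubeconfig_status` is hypothetical, and the check only looks at the file's existence and size; it does not validate the contents or contact the cluster.

```python
from pathlib import Path


def kubeconfig_status(path: Path) -> str:
    """Return 'missing', 'empty' or 'ok' for the file at `path`.

    Hypothetical helper: it only inspects the file on disk and does
    not verify that the kubeconfig is valid for any cluster.
    """
    if not path.is_file():
        return "missing"
    if path.stat().st_size == 0:
        return "empty"
    return "ok"


# Default location that kubectl reads when KUBECONFIG is not set.
print(kubeconfig_status(Path.home() / ".kube" / "config"))
```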

The kubeconfig file should be the cluster-admin config. Alternatively, a config for an RBAC
account restricted to a specific namespace can be used, provided argo and redis are already configured in the cluster.

## Cluster setup

`kubectl` is installed in this notebook, so as long as there is a valid kubeconfig file in the
right location the commands below can run without any modifications.

### Install argo in the cluster

Argo is a workflow engine that will be processing our workflow file and orchestrating the creation of pods in the cluster.

In [None]:
!kubectl create namespace argo
!kubectl apply -n argo -f https://raw.githubusercontent.com/argoproj/argo/v2.4.0/manifests/install.yaml

For argo to work, it also needs [extra permissions](https://github.com/argoproj/argo/blob/dc54919/docs/workflow-rbac.md) for the service account that the workflow pods will be using (to monitor their own status).

This will add the required permissions to the default service account in the default namespace:

In [None]:
!kubectl apply -f workflow-role.yaml

To run in another namespace, change the references to `namespace: default` in `workflow-role.yaml` to that namespace.
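The namespace change can also be scripted. The following is a minimal sketch, assuming the manifest refers to the namespace only through lines of the form `namespace: default`. The function name `retarget_namespace`, the sample manifest text and the namespace `analysis` are illustrative and are not taken from the actual `workflow-role.yaml`.

```python
import re


def retarget_namespace(manifest_text: str, namespace: str) -> str:
    """Replace every `namespace: default` line with the given namespace.

    Only whole `namespace:` entries are rewritten, so other fields that
    happen to have the value `default` (e.g. `name: default`) are kept.
    """
    pattern = re.compile(r"^([ \t]*namespace:[ \t]*)default[ \t]*$", re.MULTILINE)
    return pattern.sub(lambda m: m.group(1) + namespace, manifest_text)


# Illustrative fragment only; the real workflow-role.yaml may differ.
sample = (
    "subjects:\n"
    "- kind: ServiceAccount\n"
    "  name: default\n"
    "  namespace: default\n"
)
print(retarget_namespace(sample, "analysis"))
```

The rewritten text could then be saved to a new file and passed to `kubectl apply -f` in place of the original.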

The argo client is already installed in this notebook. To install it locally, check the [releases page](https://github.com/argoproj/argo/releases).

### Install redis

For the live visualisation used in this analysis notebook, data is pushed from the workflow pods to a redis database and pulled into the notebook.

The redis instance can run in the same cluster as the workflow, but because it also needs to be accessed from an external location (the notebook),
some extra steps are required.

The Kubernetes service for the redis instance will be of type `NodePort` so that it can be reached via the IP address of one of the cluster nodes.

In [None]:
!kubectl create ns redis
!kubectl apply -f redis/redis.yaml -f redis/redis-svc.yaml

REDIS_HOST = !kubectl -n redis get nodes -l 'node-role.kubernetes.io/master notin ()' -o jsonpath='{.items[0].status.addresses[0].address}'
REDIS_PORT = !kubectl -n redis get svc redis -o jsonpath='{.spec.ports[0].nodePort}'

!kubectl create secret generic redis-connection --from-literal=REDIS_HOST={REDIS_HOST[0]} --from-literal=REDIS_PORT={REDIS_PORT[0]}

The workflow pods will be able to access the redis instance at `redis.redis.svc:6379`. From the notebook, redis will be accessible using the node IP and the service node port, both of which are saved in the secret `redis-connection`.
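To read the connection details back later, note that Kubernetes returns secret values base64-encoded under the `data` field. The sketch below decodes the JSON printed by `kubectl get secret redis-connection -o json`. The function name `decode_secret` and the sample host and port are made up for illustration.

```python
import base64
import json


def decode_secret(secret_json: str) -> dict:
    """Decode the base64 values in the `data` field of a Secret's JSON."""
    data = json.loads(secret_json).get("data", {})
    return {key: base64.b64decode(value).decode("utf-8") for key, value in data.items()}


def _encode(value: str) -> str:
    """Base64-encode a string the way the API server presents secret data."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


# Stand-in for real kubectl output; the host and port are invented values.
sample = json.dumps(
    {
        "kind": "Secret",
        "data": {"REDIS_HOST": _encode("10.0.0.5"), "REDIS_PORT": _encode("30379")},
    }
)

connection = decode_secret(sample)
print(connection["REDIS_HOST"], int(connection["REDIS_PORT"]))
```

In the notebook, the output captured from `!kubectl get secret redis-connection -o json` is a list of lines, so it would need to be joined with `"\n".join(...)` before being passed to `decode_secret`.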

### Configure S3 storage

Argo uses S3 for intermediate storage between steps in the workflow, and for storing the final results.

These commands should be run outside of the notebook.

```bash
$ S3_HOST=$(openstack catalog show s3 -f value -c endpoints | grep public | cut -d '/' -f3)
$ ACCESS_KEY=$(openstack ec2 credentials create -f value -c access)
$ SECRET_KEY=$(openstack ec2 credentials show $ACCESS_KEY -f value -c secret)

$ kubectl create secret generic s3-cred --from-literal=accessKey=$ACCESS_KEY --from-literal=secretKey=$SECRET_KEY

$ echo $S3_HOST
s3.cern.ch
```

The `S3_HOST` value goes into the workflow yaml as a parameter.

A bucket for artifact storage should be created in the S3 storage and its name
should also be given as a parameter in the workflow. The bucket name has to be unique
so this **must** be changed from the default value.
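One way to get a bucket name that is unlikely to clash is to append a random suffix to a readable prefix. The sketch below assumes the storage follows the usual S3 naming rules (3 to 63 characters; lowercase letters, digits and hyphens; starting and ending with a letter or digit), which may not match the exact rules of every S3-compatible service. The function name `unique_bucket_name` and the prefix are illustrative.

```python
import re
import uuid


def unique_bucket_name(prefix: str) -> str:
    """Build a bucket name from a sanitised prefix plus a random suffix."""
    # Lowercase the prefix and collapse anything outside [a-z0-9-] to a hyphen.
    cleaned = re.sub(r"[^a-z0-9-]+", "-", prefix.lower()).strip("-")
    # Leave room for the hyphen and 12-character suffix within 63 characters.
    cleaned = cleaned[:50].strip("-")
    suffix = uuid.uuid4().hex[:12]
    return f"{cleaned}-{suffix}" if cleaned else suffix


print(unique_bucket_name("My_Workflow Artifacts"))
```

The generated name still has to be used to create the bucket in the S3 storage and passed as the bucket parameter of the workflow.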

## Start the workflow

Everything is now ready for the workflow to be run. Head over to `masters-notebook.ipynb`.