# Simple Demo of KubeVaspInteractive
This tutorial shows how to set up a pod deployment that mounts the same file system as the local pod, so that it can be used with `KubeVaspInteractive`.

First, check whether volume mounting is enabled in the current pod:

In [1]:
%%bash
kubectl get pod $HOSTNAME -o yaml | grep "mountPath" -2

    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /home/jovyan/shared-datasets/
      name: shared-datasets
    - mountPath: /home/jovyan/shared-scratch/
      name: shared-scratch
    - mountPath: /home/jovyan/data
      name: data
    - mountPath: /dev/shm
      name: dshm
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-editor-token-lqr8h
      readOnly: true


Let's use `/home/jovyan/data` as the shared volume mount between local and remote pods.
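
The same check can be scripted. The following is a minimal sketch, not part of `vasp_interactive`: the helper names `list_mount_paths` and `is_under_mount` are hypothetical, and the parsing is a plain line scan over the YAML text rather than a full YAML parser. It extracts the `mountPath` entries and tests whether a working directory lies inside one of them.

```python
from pathlib import PurePosixPath

# Excerpt of the `kubectl get pod ... -o yaml` output shown above
SAMPLE_POD_YAML = """
    volumeMounts:
    - mountPath: /home/jovyan/shared-datasets/
      name: shared-datasets
    - mountPath: /home/jovyan/shared-scratch/
      name: shared-scratch
    - mountPath: /home/jovyan/data
      name: data
    - mountPath: /dev/shm
      name: dshm
"""


def list_mount_paths(pod_yaml):
    """Return every value of a `mountPath:` key found in the YAML text.

    Hypothetical helper: a simple line scan, not a real YAML parser.
    """
    paths = []
    for line in pod_yaml.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):  # drop the YAML list marker
            stripped = stripped[2:].strip()
        if stripped.startswith("mountPath:"):
            paths.append(stripped.split(":", 1)[1].strip())
    return paths


def is_under_mount(directory, mount_paths):
    """True if `directory` equals, or is nested inside, one of the mounts."""
    target = PurePosixPath(directory)
    mounts = [PurePosixPath(p) for p in mount_paths]
    return any(target == m or m in target.parents for m in mounts)


mounts = list_mount_paths(SAMPLE_POD_YAML)
print(mounts)
print(is_under_mount("/home/jovyan/data/kube-vpi-test", mounts))
```

In a live pod, the YAML text could instead come from `subprocess.run(["kubectl", "get", "pod", os.environ["HOSTNAME"], "-o", "yaml"], capture_output=True, text=True).stdout`.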

### Create pods with similar specs as local pod
`vasp_interactive.kubernetes` provides several helper functions to deploy pods with specs similar to those of the local pod.

In [2]:
from vasp_interactive.kubernetes import KubeVaspInteractive, create_kube_pods

`create_kube_pods` parses the current pod specs and generates a KubeCluster for scheduling and scaling (the same can also be achieved with a native scalable deployment).

It takes the requested resources and waits until the pods are ready.

Let's first deploy 2 pods. Check the status of deployment at:

https://laikapack-controller.cheme.cmu.edu/p/c-qc7lr:p-cl5h6/workloads

In [3]:
cluster, worker_pods = create_kube_pods(scale=2, cpu=8, memory="4Gi")

Creating scheduler pod on cluster. This may take some time.


In [4]:
worker_pods

{0: {'name': 'dask-jovyan-9e842064-evwh69', 'namespace': 'alchem0x2a'},
 1: {'name': 'dask-jovyan-9e842064-elfs4l', 'namespace': 'alchem0x2a'}}

### Run isolated VASP process
`KubeVaspInteractive` only needs the name and namespace of the pod in order to inject `kubectl exec` commands.

In [5]:
%mkdir -p /home/jovyan/data/kube-vpi-test

In [6]:
%rm -rf /home/jovyan/data/kube-vpi-test/*

In [7]:
from ase.build import molecule
mol = molecule("CH4", vacuum=5, pbc=True)
vasp_params = dict(xc="pbe", kpts=1, encut=350, istart=0)
calc = KubeVaspInteractive(directory="/home/jovyan/data/kube-vpi-test", pod=worker_pods[0], **vasp_params)

`calc._args` holds the arguments the calculator uses for communication. It essentially runs `kubectl exec` into the pod, changes to the working directory and runs VASP there.

In [8]:
calc._args

['kubectl',
 'exec',
 '-i',
 'dask-jovyan-9e842064-evwh69',
 '--namespace=alchem0x2a',
 '--',
 'bash',
 '-c',
 'cd /home/jovyan/data/kube-vpi-test && $VASP_COMMAND']
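
To make the structure of this argument list explicit, here is a minimal sketch that rebuilds it from the pod dictionary and the working directory. The function `build_exec_args` is hypothetical and is not the library's implementation; quoting the directory with `shlex.quote` is an extra precaution added here, which leaves simple paths unchanged.

```python
import shlex


def build_exec_args(pod, directory, command="$VASP_COMMAND"):
    """Assemble a `kubectl exec` argument list like the one shown above.

    Hypothetical helper for illustration only. `pod` is a dict with
    'name' and 'namespace' keys, as returned in `worker_pods`.
    """
    # Quote the directory so paths with spaces survive `bash -c`
    remote_cmd = f"cd {shlex.quote(str(directory))} && {command}"
    return [
        "kubectl",
        "exec",
        "-i",  # keep stdin open so input can be streamed to the process
        pod["name"],
        f"--namespace={pod['namespace']}",
        "--",  # everything after this runs inside the pod
        "bash",
        "-c",
        remote_cmd,
    ]


pod = {"name": "dask-jovyan-9e842064-evwh69", "namespace": "alchem0x2a"}
args = build_exec_args(pod, "/home/jovyan/data/kube-vpi-test")
print(args)
```

Because `$VASP_COMMAND` is passed as a literal string to `bash -c`, it is expanded inside the remote pod, not on the local one.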

Let's use classic mode to see where VASP is running

In [9]:
mol.calc = calc
mol.get_potential_energy()

-23.97373868

Now run the `top` command both in a local terminal and in the pod `dask-jovyan-9e842064-evwh69` (`worker_pods[0]`) to compare where the VASP processes show up.

One advantage of process isolation is that killing the pods also releases any processes associated with them.

In [10]:
cluster.close()

In [11]:
# The communication is down
calc.process.poll()

Note that, currently, once a pod has been deleted or stopped, you need to create the calculator again before running further calculations.
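
The meaning of `poll()` can be illustrated without a cluster. The sketch below assumes that `calc.process` behaves like a standard `subprocess.Popen` object, which the `poll()` call suggests but which is not confirmed here. A short-lived Python child process stands in for the `kubectl exec` process, and `is_alive` is a hypothetical helper.

```python
import subprocess
import sys


def is_alive(process):
    """True while a Popen-like process is still running.

    Hypothetical helper: Popen.poll() returns None for a running process
    and the exit code once it has terminated.
    """
    return process is not None and process.poll() is None


# Stand-in for the `kubectl exec` process: a child that waits on stdin
child = subprocess.Popen(
    [sys.executable, "-c", "import sys; sys.stdin.read()"],
    stdin=subprocess.PIPE,
)
status_running = child.poll()  # None: the child is still waiting for input
alive_before = is_alive(child)

child.stdin.close()  # closing stdin lets the child reach EOF and exit
child.wait(timeout=30)
status_after = child.poll()  # exit code of the finished child
alive_after = is_alive(child)

print(status_running, alive_before, status_after, alive_after)
```

A check like `is_alive(calc.process)` could then decide whether a new calculator has to be created.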

### Simple pod synchronization
Another advantage of process isolation is that calculations can be run in parallel.
To do this, the processes need some form of concurrency on the local pod. One possibility is to use threading.

In [12]:
from pathlib import Path
import time
mol1 = molecule("CH4", vacuum=5, pbc=True)
mol2 = mol1.copy()
mol2.rattle(stdev=0.1)
vasp_params = dict(xc="pbe", kpts=1, encut=350, istart=0)

root = Path("/home/jovyan/data/scratch")

In [13]:
cluster, worker_pods = create_kube_pods(scale=2, cpu=8, memory="2Gi")
image = cluster.workers[0]._pod.spec.containers[0].image

# calculation part
calc1 = KubeVaspInteractive(
        directory=root / "kube-vpi-test1", pod=worker_pods[0], **vasp_params
        )
calc2 = KubeVaspInteractive(
        directory=root / "kube-vpi-test2", pod=worker_pods[1], **vasp_params
        )

Creating scheduler pod on cluster. This may take some time.


The following cell runs the two calculations serially:

In [14]:
# Serial
with calc1, calc2:
    mol1.calc = calc1
    mol2.calc = calc2
    t_ = time.perf_counter()
    e1 = mol1.get_potential_energy()
    e2 = mol2.get_potential_energy()
    print("Serial mode:")
    print(e1, e2)
    print(f"Walltime for 2 sp calculations: {time.perf_counter() - t_}")

Serial mode:
-23.97373868 -22.82732759
Walltime for 2 sp calculations: 29.342208907939494


Let's now use threading to run the two processes concurrently. Since a `Thread` target cannot return a value directly, the function passed to it must write its result into a mutable object.

In [15]:
from threading import Thread
def _thread_calculate(atoms, energy):
    """A threaded version of atoms.get_potential_energy. Energy is a one-member list
    ideas taken from https://wiki.fysik.dtu.dk/ase/_modules/ase/neb.html#NEB
    """
    energy[0] = atoms.get_potential_energy()
    return

# Pseudo-parallel
with calc1, calc2:
    mol1.calc = calc1
    mol2.calc = calc2

    # need to use mutable object to store energy
    e1 = [999]
    e2 = [999]
    threads = [
        Thread(target=_thread_calculate, args=(mol1, e1)),
        Thread(target=_thread_calculate, args=(mol2, e2)),
    ]
    t_ = time.perf_counter()
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    print("Threaded mode:")
    print(e1[0], e2[0])
    print(f"Walltime for 2 sp calculations: {time.perf_counter() - t_}")

Threaded mode:
-23.97373868 -22.82732759
Walltime for 2 sp calculations: 40.177793267183006


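Because the two calculations run on separate pods, the threaded wall time is expected to approach half that of the serial code. The output recorded above does not show this speed-up (about 40 s threaded versus about 29 s serial), so repeat the timing on your own cluster before drawing conclusions about the scalability of the Kubernetes isolation.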
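
As an alternative to storing results in one-member lists, `concurrent.futures.ThreadPoolExecutor` from the standard library returns each result through a future. The sketch below is an illustration under stated assumptions, not the tutorial's method: `fake_energy` is a stand-in that sleeps instead of running VASP, and `run_threaded` is a hypothetical helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_energy(delay, value):
    """Stand-in for atoms.get_potential_energy(): wait, then return a number."""
    time.sleep(delay)
    return value


def run_threaded(jobs):
    """Run (function, *args) jobs in one thread each; return results in order."""
    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        futures = [pool.submit(fn, *args) for fn, *args in jobs]
        return [future.result() for future in futures]


start = time.perf_counter()
energies = run_threaded([(fake_energy, 0.3, -1.0), (fake_energy, 0.3, -2.0)])
elapsed = time.perf_counter() - start

print(energies)
print(f"Walltime for 2 stand-in calculations: {elapsed:.2f} s")
```

With the real calculators attached, the jobs would be `[(mol1.get_potential_energy,), (mol2.get_potential_energy,)]`. The two sleeps overlap, so the elapsed time stays close to one delay rather than the sum of both.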

In [16]:
cluster.close()