
RT Resource Driver for Dynamic Resource Allocation (RT-DRA)

This repository contains the resource driver for deploying containers using the SCHED_DEADLINE scheduling policy on a real-time Linux kernel, for use with the Dynamic Resource Allocation (DRA) feature of Kubernetes.
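For background: under SCHED_DEADLINE the kernel reserves a runtime budget for a task out of every scheduling period. Outside of Kubernetes, you can experiment with such a reservation using chrt from util-linux (the numbers below are illustrative only; all values are in nanoseconds):

# Reserve 10ms of CPU time out of every 100ms period for `yes`
sudo chrt -d --sched-runtime 10000000 --sched-deadline 100000000 \
  --sched-period 100000000 0 yes > /dev/null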

Quickstart

Before diving into the details of how this example driver is constructed, it's useful to run through a quick demo of it in action.

Prerequisites

Install Kubernetes

To make sure that RT-DRA is recognized by Kubernetes and performs correctly, we must install RT-containerd and RT-runc as the container runtime and enable the DRA feature when initializing the Kubernetes cluster.

For installing Kubernetes, we follow the steps for a normal installation from the official Kubernetes documentation. However, we install a custom container runtime (RT-containerd and RT-runc).

To install RT-containerd, we must clone its repository, compile it, and install it:

git clone -b rt https://github.com/nasm-samimi/containerd.git
cd containerd
make
sudo make install
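At this point you can confirm that the binary on your PATH is the freshly built one:

containerd --version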

Then create the default containerd configuration file (tee is used so that the write to /etc also runs with root privileges):

sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
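If your nodes use the systemd cgroup driver (the kubeadm default on most modern distributions), you will typically also want to enable SystemdCgroup for the runc runtime in the generated file. The relevant excerpt (the table path below is for containerd 1.x; adjust for your version):

# /etc/containerd/config.toml (excerpt)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true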

containerd requires the CNI plugins, which can be installed as explained in containerd's getting-started documentation.

To install RT-runc, we must clone its repository, compile it, and install it:

sudo apt install libseccomp-dev
git clone -b rt https://github.com/nasm-samimi/runc.git
cd runc
make
sudo install -D -m0755 runc /usr/local/sbin/runc
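As with containerd, a quick version check confirms the installation:

runc --version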

We prepared a configuration file that enables the DRA feature at cluster initiation. To use the configuration file, we run:

sudo kubeadm init --config=kubeadm-config.yaml
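The exact configuration used here ships in this repository as kubeadm-config.yaml. As a rough sketch of what a DRA-enabled kubeadm configuration typically contains (illustrative only, not necessarily this repository's file), the key parts are the DynamicResourceAllocation feature gate and the resource.k8s.io API group:

# Illustrative sketch of a DRA-enabled kubeadm configuration
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: "DynamicResourceAllocation=true"
    runtime-config: "resource.k8s.io/v1alpha2=true"
controllerManager:
  extraArgs:
    feature-gates: "DynamicResourceAllocation=true"
scheduler:
  extraArgs:
    feature-gates: "DynamicResourceAllocation=true"
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  DynamicResourceAllocation: true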

After installing the CNI plugins, run the following commands:

sudo systemctl daemon-reload
sudo systemctl restart containerd
sudo systemctl restart kubelet
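If everything restarted cleanly, the node should report Ready shortly afterwards:

kubectl get nodes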

To join the worker nodes, first obtain the join token by running the following command on the master node:

kubeadm token create --print-join-command

After receiving the token and hash from the previous command, replace the token and hash fields in worker-config.yaml. Then, to join the worker node, run the following command on the worker node:

sudo kubeadm join --config=worker-config.yaml
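worker-config.yaml is provided in this repository; the fields to replace correspond to a standard kubeadm JoinConfiguration, roughly along these lines (fill the placeholders with the values printed by the previous command):

# Illustrative sketch; replace the endpoint, token, and hash
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "<control-plane-ip>:6443"
    token: "<token>"
    caCertHashes:
      - "sha256:<hash>"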

Demo

We start by cloning this repository and changing into its demo subdirectory. All of the scripts and example Pod specs used in this demo are contained here, so take a moment to browse through the various files and see what's available:

git clone https://github.com/nasm-samimi/dra-rt-driver.git
cd dra-rt-driver/demo

With the cluster up, check that all system pods are coming up as expected:

$ kubectl get pod -A

Then install the RT-DRA driver via Helm:

helm upgrade -i \
  --create-namespace \
  --namespace dra-rt-driver \
  dra-rt-driver \
  deployments/helm/dra-rt-driver

Double-check that the driver components have come up successfully:

$ kubectl get pod -n dra-rt-driver

And show the initial state of allocatable real-time CPUs on the worker node:

$ kubectl describe -n dra-rt-driver nas/dra-rt-driver-cluster-worker
...
Spec:
  Allocatable Cpuset:
    Rtcpu:
      Id:    2
      Util:  0
    Rtcpu:
      Id:    3
      Util:  0
    Rtcpu:
      Id:    4
      Util:  0
    Rtcpu:
      Id:    0
      Util:  0
    Rtcpu:
      Id:    1
      Util:  0
...

Next, deploy four example apps that demonstrate how ResourceClaims, ResourceClaimTemplates, and custom ClaimParameter objects can be used to request access to resources in various ways:

kubectl create -f rt-test{1,2,3,4}.yaml
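As a rough illustration of the pattern these specs follow under the alpha DRA API (resource.k8s.io/v1alpha2), a pod requests resources by referencing a claim built from a ResourceClaimTemplate. The class and object names below are placeholders, not this driver's actual kinds; consult the rt-test{1,2,3,4}.yaml files for the real classes and parameters:

# Placeholder names, for illustration only
apiVersion: resource.k8s.io/v1alpha2
kind: ResourceClaimTemplate
metadata:
  name: rt-cpu-template
spec:
  spec:
    resourceClassName: rt.example.com
---
apiVersion: v1
kind: Pod
metadata:
  name: pod0
spec:
  resourceClaims:
  - name: rt-cpus
    source:
      resourceClaimTemplateName: rt-cpu-template
  containers:
  - name: ctr
    image: ubuntu:22.04
    command: ["sh", "-c", "sleep 9999"]
    resources:
      claims:
      - name: rt-cpus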

And verify that they are coming up successfully:

$ kubectl get pod -A
...

Use your favorite editor to look through each of the rt-test{1,2,3,4}.yaml files and see what they are doing. The semantics of each match the figure below:

Demo Apps Figure

Then dump the logs of each app to verify that CPUs were allocated to them according to these semantics.
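Assuming, as in the upstream example driver, that each demo app runs in a namespace named after its spec file (adjust if the specs create different namespaces), a loop along these lines prints every app's logs:

for i in 1 2 3 4; do
  echo "rt-test${i}:"
  for pod in $(kubectl get pod -n rt-test${i} -o name); do
    kubectl logs -n rt-test${i} ${pod} --all-containers
  done
done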

Likewise, looking at the ClaimAllocations section of the NodeAllocationState object on the worker node will show which CPUs have been allocated to a given ResourceClaim by the resource driver:

$ kubectl describe -n dra-rt-driver nas/dra-rt-driver-cluster-worker
...
Spec:
  ...
  Prepared Claims:

Once you have verified everything is running correctly, delete all of the example apps:

kubectl delete --wait=false -f rt-test{1,2,3,4}.yaml

Wait for them to terminate:

$ kubectl get pod -A

...

And show that the ClaimAllocations section of the NodeAllocationState object on the worker node is now back to its initial state:

$ kubectl describe -n dra-rt-driver nas/dra-rt-driver-cluster-worker
...
Spec:

Anatomy of a DRA resource driver

TBD

References

For more information on the DRA Kubernetes feature and developing custom resource drivers, see the following resources:

- Dynamic Resource Allocation in the Kubernetes documentation: https://kubernetes.io/docs/concepts/scheduling-eviction/dynamic-resource-allocation/
- The upstream DRA example driver: https://github.com/kubernetes-sigs/dra-example-driver

Building the code

We start by cloning this repository and changing into its demo subdirectory:

git clone https://github.com/nasim-samimi/dra-rt-driver.git
cd dra-rt-driver/demo

We build the image for the example resource driver:

./build-driver.sh
