This example shows how to scale a GPU worker deployment to zero when it has no pending jobs and back up again when new jobs arrive.
- Kind fork with GPU support
- Make sure you have a working `kubectl` installation
- Make sure the host machine has a GPU and the NVIDIA Container Toolkit installed
- There might be some other things I've forgotten
- Create a Kind cluster with GPU support: `make create-cluster`
- Build the Docker images: `make build-images`
- Load the Docker images into the Kind cluster: `make load-images`
- Install the NVIDIA GPU Operator in the cluster: `make install-nvidia-operator`
- Deploy the api, worker, and autoscaler: `make deploy-all`
- Port-forward the api to localhost: `make port-forward-api`
- Submit a job to the api on localhost:9091 (see the example after this list), then watch the pods and logs to see the autoscaler in action
- Delete the cluster when you're done: `make delete-cluster`
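
To submit a job, you can POST to the api with a small script like the one below. This is only a sketch: the route (`/jobs`) and the payload fields (`device`, `length`) are assumptions here, so check the api source for the actual endpoint and schema.

```python
# Hypothetical client: the route and payload fields are assumptions,
# not necessarily the api's real schema.
import requests

API_URL = "http://localhost:9091"

# Request a GPU job multiplying two 4096 x 4096 matrices.
response = requests.post(
    f"{API_URL}/jobs",  # assumed route
    json={"device": "gpu", "length": 4096},
    timeout=10,
)
response.raise_for_status()
print(response.json())
```

While the job is queued, `kubectl get pods -w` shows the `gpu-worker` pods being created and torn down, and `kubectl logs -f deploy/gpu-worker` follows the worker as it processes the job.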
- The api is a simple FastAPI app that queues a matrix length into either the `cpu` or `gpu` Redis queue.
- `cpu-worker` listens to the `cpu` queue and multiplies two random matrices of dimension `length x length` together on the CPU.
- Similarly, `gpu-worker` listens to the `gpu` queue and multiplies two random matrices of dimension `length x length` together on the GPU.
- The `autoscaler` is a simple Python script that uses the Kubernetes API to scale the `gpu-worker` deployment to zero when there are no pending jobs in the `gpu` Redis queue and scale it back up when jobs appear in the `gpu` queue again. Rough sketches of the api, a worker, and the autoscaler follow below.
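
A minimal sketch of the api described above, assuming it exposes a single POST route and pushes the requested length onto the matching Redis list (the real route, payload model, and Redis host may differ):

```python
# Minimal sketch of the api: a FastAPI app that pushes a matrix length onto
# the `cpu` or `gpu` Redis list. The route and payload model are assumptions;
# the queue names come from the description above.
import os

import redis
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
r = redis.Redis(host=os.environ.get("REDIS_HOST", "redis"), port=6379)


class Job(BaseModel):
    device: str  # "cpu" or "gpu"
    length: int  # matrix dimension


@app.post("/jobs")
def submit_job(job: Job):
    # Each queue is just a Redis list keyed by the device name.
    r.rpush(job.device, job.length)
    return {"queued": job.device, "length": job.length}
```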
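
The workers can be a simple blocking-pop loop. Below is a sketch of the CPU variant using NumPy; the GPU variant would do the same on the device (for example with CuPy or PyTorch). Only the queue name comes from the description above, the rest is an assumption:

```python
# Sketch of cpu-worker: block on the `cpu` Redis list and multiply two random
# length x length matrices. gpu-worker is analogous but runs the
# multiplication on the GPU (e.g. with CuPy or PyTorch).
import os

import numpy as np
import redis

r = redis.Redis(host=os.environ.get("REDIS_HOST", "redis"), port=6379)

while True:
    # blpop blocks until a job (a matrix length) is available on the queue.
    _queue, raw_length = r.blpop("cpu")
    length = int(raw_length)

    a = np.random.rand(length, length)
    b = np.random.rand(length, length)
    c = a @ b  # the actual work: one matrix multiplication on the CPU

    print(f"multiplied two {length}x{length} matrices, checksum={c.sum():.3f}")
```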
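
And the core of the autoscaler might look roughly like the loop below, using the official Kubernetes Python client to patch the `gpu-worker` deployment's replica count based on the length of the `gpu` Redis list. The namespace, poll interval, and target replica count are assumptions:

```python
# Rough sketch of the autoscaler: scale gpu-worker to zero when the `gpu`
# Redis list is empty and back to one replica when jobs appear.
# Namespace, poll interval, and target replica count are assumptions.
import os
import time

import redis
from kubernetes import client, config

config.load_incluster_config()  # use config.load_kube_config() outside the cluster
apps = client.AppsV1Api()
r = redis.Redis(host=os.environ.get("REDIS_HOST", "redis"), port=6379)

NAMESPACE = "default"
DEPLOYMENT = "gpu-worker"

while True:
    pending = r.llen("gpu")
    current = apps.read_namespaced_deployment(DEPLOYMENT, NAMESPACE).spec.replicas

    desired = 1 if pending > 0 else 0
    if desired != current:
        apps.patch_namespaced_deployment_scale(
            DEPLOYMENT, NAMESPACE, {"spec": {"replicas": desired}}
        )
        print(f"scaled {DEPLOYMENT} to {desired} replicas ({pending} pending jobs)")

    time.sleep(5)
```

A small custom loop like this is the simplest way to get scale-to-zero, since the built-in HorizontalPodAutoscaler does not scale below one replica by default.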