This example shows how to scale a GPU worker deployment to zero when it has no pending jobs and back up again when new jobs arrive.
- Kind fork with GPU support
- Make sure you have a working `kubectl` installation
- Make sure the host machine has a GPU and the NVIDIA Container Toolkit installed
- There might be some other things I've forgotten
- Create a Kind cluster with GPU support: `make create-cluster`
- Build the Docker images: `make build-images`
- Load the Docker images into the Kind cluster: `make load-images`
- Install the NVIDIA GPU Operator in the cluster: `make install-nvidia-operator`
- Deploy the api, worker, and autoscaler: `make deploy-all`
- Port-forward the api to localhost: `make port-forward-api`
- Submit a job to the api on localhost:9091 (see the example after this list), then watch the pods and logs to see the autoscaler in action
- Delete the cluster when you're done: `make delete-cluster`
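
To submit a job, you can POST to the api with a small script like the one below. This is only a sketch: the route (`/jobs`) and the payload fields (`device`, `length`) are assumptions here, so check the api source for the actual endpoint and schema.

```python
# Hypothetical client: the route and payload fields are assumptions,
# not necessarily the api's real schema.
import requests

API_URL = "http://localhost:9091"

# Request a GPU job multiplying two 4096 x 4096 matrices.
response = requests.post(
    f"{API_URL}/jobs",  # assumed route
    json={"device": "gpu", "length": 4096},
    timeout=10,
)
response.raise_for_status()
print(response.json())
```

While the job is queued, `kubectl get pods -w` shows the `gpu-worker` pods being created and torn down, and `kubectl logs -f deploy/gpu-worker` follows the worker as it processes the job.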
- The api is a simple FastAPI app that queues a matrix length into either the `cpu` or `gpu` Redis queue.
- `cpu-worker` listens to the `cpu` queue and multiplies two random matrices of dimension `length x length` together on the CPU.
- Similarly, `gpu-worker` listens to the `gpu` queue and multiplies two random matrices of dimension `length x length` together on the GPU.
- The `autoscaler` is a simple Python script that uses the Kubernetes API to scale the `gpu-worker` deployment to zero when there are no pending jobs in the `gpu` Redis queue and scale it back up when jobs appear in the `gpu` queue again. Rough sketches of the api, a worker, and the autoscaler follow below.
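
A minimal sketch of the api described above, assuming it exposes a single POST route and pushes the requested length onto the matching Redis list (the real route, payload model, and Redis host may differ):

```python
# Minimal sketch of the api: a FastAPI app that pushes a matrix length onto
# the `cpu` or `gpu` Redis list. The route and payload model are assumptions;
# the queue names come from the description above.
import os

import redis
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
r = redis.Redis(host=os.environ.get("REDIS_HOST", "redis"), port=6379)


class Job(BaseModel):
    device: str  # "cpu" or "gpu"
    length: int  # matrix dimension


@app.post("/jobs")
def submit_job(job: Job):
    # Each queue is just a Redis list keyed by the device name.
    r.rpush(job.device, job.length)
    return {"queued": job.device, "length": job.length}
```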
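
The workers can be a simple blocking-pop loop. Below is a sketch of the CPU variant using NumPy; the GPU variant would do the same on the device (for example with CuPy or PyTorch). Only the queue name comes from the description above, the rest is an assumption:

```python
# Sketch of cpu-worker: block on the `cpu` Redis list and multiply two random
# length x length matrices. gpu-worker is analogous but runs the
# multiplication on the GPU (e.g. with CuPy or PyTorch).
import os

import numpy as np
import redis

r = redis.Redis(host=os.environ.get("REDIS_HOST", "redis"), port=6379)

while True:
    # blpop blocks until a job (a matrix length) is available on the queue.
    _queue, raw_length = r.blpop("cpu")
    length = int(raw_length)

    a = np.random.rand(length, length)
    b = np.random.rand(length, length)
    c = a @ b  # the actual work: one matrix multiplication on the CPU

    print(f"multiplied two {length}x{length} matrices, checksum={c.sum():.3f}")
```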
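
And the core of the autoscaler might look roughly like the loop below, using the official Kubernetes Python client to patch the `gpu-worker` deployment's replica count based on the length of the `gpu` Redis list. The namespace, poll interval, and target replica count are assumptions:

```python
# Rough sketch of the autoscaler: scale gpu-worker to zero when the `gpu`
# Redis list is empty and back to one replica when jobs appear.
# Namespace, poll interval, and target replica count are assumptions.
import os
import time

import redis
from kubernetes import client, config

config.load_incluster_config()  # use config.load_kube_config() outside the cluster
apps = client.AppsV1Api()
r = redis.Redis(host=os.environ.get("REDIS_HOST", "redis"), port=6379)

NAMESPACE = "default"
DEPLOYMENT = "gpu-worker"

while True:
    pending = r.llen("gpu")
    current = apps.read_namespaced_deployment(DEPLOYMENT, NAMESPACE).spec.replicas

    desired = 1 if pending > 0 else 0
    if desired != current:
        apps.patch_namespaced_deployment_scale(
            DEPLOYMENT, NAMESPACE, {"spec": {"replicas": desired}}
        )
        print(f"scaled {DEPLOYMENT} to {desired} replicas ({pending} pending jobs)")

    time.sleep(5)
```

A small custom loop like this is the simplest way to get scale-to-zero, since the built-in HorizontalPodAutoscaler does not scale below one replica by default.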