## Experiment 4: Serving Models on Kubernetes

As you may have noticed, different services may need different resources (e.g., a GPU for an LLM service) and different dependencies. In this experiment, you are required to deploy your applications on our cluster. You should write Dockerfiles that build the images that serve your applications.

Additional requirement: serve multiple LLMs simultaneously and support switching the backend model from the UI.
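
One possible way to approach model switching is to run each model behind its own Service and let the UI pick a backend by name. The sketch below only illustrates that routing step. The model names and Service hostnames (`llm-a-service`, `llm-b-service`) are hypothetical, and it assumes each backend exposes an `/inference` endpoint that takes a `data` query parameter, like the FastAPI example in the hints.

```python
from urllib.parse import urlencode

# Hypothetical mapping from the model name shown in the UI to the
# in-cluster base URL of the Service hosting it (names are assumptions).
MODEL_BACKENDS = {
    "llm-a": "http://llm-a-service:8000",
    "llm-b": "http://llm-b-service:8000",
}


def build_inference_url(model_name: str, prompt: str) -> str:
    """Return the URL the UI should call for the selected model."""
    if model_name not in MODEL_BACKENDS:
        raise ValueError(f"Unknown model: {model_name!r}")
    # urlencode escapes spaces, '&', and other reserved characters.
    query = urlencode({"data": prompt})
    return f"{MODEL_BACKENDS[model_name]}/inference?{query}"


if __name__ == "__main__":
    print(build_inference_url("llm-a", "hello world"))
```

In a Gradio UI, the model name could come from a dropdown input, so switching models only changes which Service receives the request.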

## Hints

Here we provide simple example Dockerfiles and services for your reference.

In [None]:
%%file Dockerfile.fastapi_simple

# Use an NVIDIA PyTorch image from NGC. The tag xx.xx is the container
# release in YY.MM form (e.g., 20.10 is the October 2020 release), not the
# Docker version. Pick a release compatible with the cluster's GPU driver.
# FROM nvcr.io/nvidia/pytorch:xx.xx-py3
FROM nvcr.io/nvidia/pytorch:20.10-py3

# Switch the pip index to the Tsinghua TUNA mirror and install the
# serving dependencies. Add your own dependencies (e.g., Hugging Face
# libraries) here.
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && pip install uvicorn fastapi --use-feature=2020-resolver

# Put the logic for running your service below.
WORKDIR /app
COPY ./fastapi_service_simple.py /app
CMD ["uvicorn", "fastapi_service_simple:app", "--host", "0.0.0.0", "--port", "8000"]


In [None]:
%%file fastapi_service_simple.py

import fastapi


app = fastapi.FastAPI()


@app.get('/inference')
def process_string(data: str):
    return f'Processed {data} by FastAPI!'


In [None]:
%%file Dockerfile.gradio_simple
# Use an official Python runtime as a parent image with Python 3.11
FROM python:3.11

# Switch the pip index to the Tsinghua TUNA mirror and install the UI
# dependencies. requests is listed explicitly because the service imports it.
# Add your own dependencies here.
RUN pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple && pip install gradio requests

WORKDIR /app
COPY ./gradio_service_simple.py /app
CMD ["python", "gradio_service_simple.py"]


In [None]:
%%file gradio_service_simple.py
import requests
import gradio as gr

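def greet(name):
    # Let requests URL-encode the query string, and fail fast if the backend hangs.
    response = requests.get(
        'http://fastapi-service:8000/inference',
        params={'data': name},
        timeout=30,
    )
    response.raise_for_status()
    # FastAPI returns the string as JSON, so decode it to drop the quotes.
    return response.json()

demo = gr.Interface(fn=greet, inputs="text", outputs="text")

if __name__ == "__main__":
    demo.launch(server_name='0.0.0.0')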

You can build images using the following commands.
```bash
# Build the images
docker build -t my-fastapi-app -f Dockerfile.fastapi_simple .
docker build -t my-gradio-app -f Dockerfile.gradio_simple .
```
Push the images to the cluster registry:
```bash
docker login [hub_addr]
docker tag my-fastapi-app [hub_addr]/[your_project]/my-fastapi-app
docker tag my-gradio-app [hub_addr]/[your_project]/my-gradio-app
docker push [hub_addr]/[your_project]/my-fastapi-app
docker push [hub_addr]/[your_project]/my-gradio-app
```
`[hub_addr]` is the address of the cluster's Harbor registry, and `[your_project]` is the name of your project on it.

You may use the config files below to deploy your applications (e.g., with `kubectl apply -f [file]`). Another hint: you can expose a pod on your local machine using `kubectl port-forward`, like this:

```bash
kubectl port-forward pod/[my-pod] [local_port]:[pod_port]
```
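
Once a port-forward to the FastAPI pod is running, a quick smoke test from your machine can confirm that the service responds. The following is a minimal sketch using only the standard library. The function name `query_inference` is hypothetical, and it assumes the `/inference` endpoint from the example service, which returns its result as a JSON string.

```python
import json
import urllib.parse
import urllib.request


def query_inference(base_url: str, data: str, timeout: float = 5.0) -> str:
    """Call GET {base_url}/inference?data=... and return the decoded JSON body."""
    query = urllib.parse.urlencode({"data": data})
    url = f"{base_url}/inference?{query}"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # The example service returns a JSON-encoded string.
        return json.loads(resp.read().decode("utf-8"))
```

For example, with `[local_port]` set to 8000, calling `query_inference("http://localhost:8000", "hello")` should return the processed string from the example service.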

The following are example Kubernetes (k8s) config files for the two services.

In [None]:
%%file fastapi-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: fastapi-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: fastapi
  template:
    metadata:
      labels:
        app: fastapi
    spec:
      containers:
      - name: fastapi
        # Replace with the image you pushed: [hub_addr]/[your_project]/my-fastapi-app
        image: my-fastapi-app
        ports:
        - containerPort: 8000
        resources:
          limits:
            nvidia.com/gpu: 1
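
When serving several models, writing one nearly identical Deployment per model by hand gets repetitive. As a rough sketch, the manifest above could be generated from Python and printed as JSON, which `kubectl` accepts alongside YAML. The helper name `make_deployment` and its parameters are hypothetical. The structure mirrors the example manifest, and the GPU limit is added only when requested.

```python
import json


def make_deployment(name: str, image: str, port: int, gpus: int = 0) -> dict:
    """Build an apps/v1 Deployment manifest as a plain dict."""
    container = {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
    }
    if gpus > 0:
        # Only request GPUs for services that need them (e.g., LLM inference).
        container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": f"{name}-deployment"},
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


if __name__ == "__main__":
    manifest = make_deployment("fastapi", "my-fastapi-app", 8000, gpus=1)
    print(json.dumps(manifest, indent=2))
```

If this were saved as a script, its output could be piped to `kubectl apply -f -` to create the Deployment.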

In [None]:
%%file fastapi-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: fastapi-service
spec:
  selector:
    app: fastapi
  ports:
    - protocol: TCP
      port: 8000
      targetPort: 8000
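
The Service name is what makes `http://fastapi-service:8000` resolvable from the Gradio pod: within the same namespace, the short Service name works as a hostname. From another namespace, the fully qualified name is needed. The helper below is a small illustration of that naming scheme. It assumes the `default` namespace and the common `cluster.local` cluster domain, both of which may differ on our cluster.

```python
def service_dns_name(service: str, namespace: str = "default",
                     cluster_domain: str = "cluster.local") -> str:
    """Return the fully qualified in-cluster DNS name of a Service."""
    # Pattern: <service>.<namespace>.svc.<cluster-domain>
    return f"{service}.{namespace}.svc.{cluster_domain}"


if __name__ == "__main__":
    print(service_dns_name("fastapi-service"))
```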

In [None]:
%%file gradio-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gradio-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gradio
  template:
    metadata:
      labels:
        app: gradio
    spec:
      containers:
      - name: gradio
        # Replace with the image you pushed: [hub_addr]/[your_project]/my-gradio-app
        image: my-gradio-app
        ports:
        - containerPort: 7860


In [None]:
%%file gradio-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: gradio-service
spec:
  type: LoadBalancer
  selector:
    app: gradio
  ports:
    - protocol: TCP
      port: 7860
      targetPort: 7860