# Deploy Machine Learning Service on Truefoundry
This notebook demonstrates how to deploy an image classification model, trained on the MNIST dataset, as a Gradio app on the Truefoundry platform.

## Prerequisites

Before we begin, make sure you have the following prerequisites in place:

1. **Install `servicefoundry`** (Note: `servicefoundry` is pre-installed in Truefoundry notebooks). You can install it using the following command:

In [None]:
!pip install -U "servicefoundry"

2. **Login to servicefoundry**

Enter your host in the `--host` argument, e.g. `https://your-domain.truefoundry.com`

In [None]:
!sfy login --host "<ENTER YOUR HOST HERE>"

3. **Select the `Workspace`** in which you want to deploy your application. <br>Once you run the cell below you will get a prompt to enter your workspace. <br>
    * **Step 1:** Navigate to the **Workspace** tab on the left panel of your User Interface.
    * **Step 2:** Identify the Workspace you want to deploy the application in.
    * **Step 3:** Copy the Workspace FQN <br>
    ![Copying Workspace FQN](https://files.readme.io/730fee2-Screenshot_2023-02-28_at_2.08.34_PM.png)
    * **Step 4:** Paste the Workspace FQN in the prompt and press Enter.

In [None]:
workspace_fqn = input("Enter your Workspace FQN: ")

4. **Setup Logging**

In [None]:
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

## Clone the Getting Started Repo

In this step, we will clone the Truefoundry Getting Started repository. This repository contains the service code that we are going to deploy.

In [None]:
!git clone https://github.com/truefoundry/getting-started-examples.git

Now let's `cd` into the directory containing our inference code, i.e. `getting-started-examples/mnist-service`:

In [None]:
%cd getting-started-examples/mnist-service

### Code Structure
Before we proceed, let's take a quick look at the structure of the code you'll be deploying:

```
.
|_ app.py: Contains the Gradio Service code used to serve your model.
|_ requirements.txt: Dependency file.
|_ gen_example_images.py: Code to generate example images that you can use to test your Gradio service.
|_ train.py: Contains the Training code used to train the model we are deploying.
```

The `app.py` file houses the code that enables you to deploy and interact with your trained model using Gradio. Here's a brief overview of its components:

* **Prediction Function:**
    * The `predict` function makes predictions using the trained model.
    * It preprocesses the input image and prepares it for model inference.
    * It loads the trained model and predicts the digit shown in the image.
* **Gradio Interface Setup:**
    * The script creates a Gradio interface that takes an input image and produces a predicted label.
    * It provides two example images (`0.jpg` and `1.jpg`) so users can easily test the interface.

```python
import gradio as gr
import tensorflow as tf
import numpy as np
from PIL import Image


def predict(img_arr):
    # Preprocess the image before passing it to the model
    img_arr = tf.expand_dims(img_arr, 0)  # Add a batch dimension
    img_arr = img_arr[:, :, :, 0]  # Keep only the first channel (grayscale)

    # Load the trained model
    loaded_model = tf.keras.models.load_model('mnist_model.h5')

    # Make predictions
    predictions = loaded_model.predict(img_arr)
    predicted_label = tf.argmax(predictions[0]).numpy()

    return str(predicted_label)


# Set up the Gradio interface
gr.Interface(
    fn=predict,
    inputs="image",
    outputs="label",
    examples=[["0.jpg"], ["1.jpg"]],
).launch(server_name="0.0.0.0", server_port=8080)
```
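
The two preprocessing steps in `predict` can be traced with NumPy alone. This is a minimal sketch, assuming Gradio hands the function an H x W x 3 `uint8` array; the 28 x 28 size and the score values are made up for illustration, and no model is loaded.

```python
import numpy as np

# Assumption: Gradio passes the uploaded image as an H x W x 3 uint8 array.
# The 28 x 28 size is illustrative (MNIST digits are 28 x 28).
img_arr = np.zeros((28, 28, 3), dtype=np.uint8)

# The same two steps as in predict(), written with NumPy instead of TensorFlow.
batched = np.expand_dims(img_arr, 0)  # (28, 28, 3) -> (1, 28, 28, 3)
grayscale = batched[:, :, :, 0]       # keep channel 0 -> (1, 28, 28)

# Made-up class scores standing in for loaded_model.predict(...)[0].
fake_scores = np.array([0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01])

# The index of the highest score is the predicted digit, returned as a string.
predicted_label = str(int(np.argmax(fake_scores)))

print(batched.shape, grayscale.shape, predicted_label)
```

The model therefore receives a batch of one single-channel image, and the label returned to Gradio is the index of the largest score.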




## Deploying Your Machine Learning Service

Now, let's move on to the deployment steps.

### Step 1: Set Up Deployment Configuration
In this step, you will define your deployment configuration for the ServiceFoundry CLI. To deploy via the CLI, you need a `servicefoundry.yaml` file containing the YAML configuration of your deployment. We explain each configuration section first, and then bring them all together in that file.<br>

#### Name
In `servicefoundry.yaml`, set a unique identifier for your service using the `name` field.

```yaml
name: mnist-service
```

#### Image

* **Choosing the right approach for specifying the image:**
    Depending on your scenario, you can either deploy a pre-built Docker image or build a Docker image from your source code.

* **Using pre-built images:**
    If you have already built a Docker image and pushed it to a container registry, use `type: image`.
    This simply references the pre-built image URL and uses it for the deployment.
* **Using build for source code:**
    If you don't have a pre-built image, use `type: build` to create an image from your source code.
    This applies when you want to package and deploy your application from scratch.
    * **Generating a Dockerfile with `tfy-python-buildpack`:**
        If you don't have a Dockerfile but your application is written in Python, use `type: tfy-python-buildpack`.
        A Dockerfile is then generated automatically based on your code's requirements.
    * **Using an existing Dockerfile:**
        If you already have a Dockerfile, use `type: dockerfile`.
        This lets you reference the Dockerfile present in your code repository directly.

In this case, since we have neither a pre-built image nor a Dockerfile in our source code, we use `tfy-python-buildpack`, which takes our build configuration and generates a Dockerfile from a template.

```yaml
image:
  type: build
  build_spec:
    type: tfy-python-buildpack
    command: python app.py
    python_version: '3.9'
    requirements_path: requirements.txt
    build_context_path: ./
  build_source:
    type: local
```
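
The decision described above can be sketched as a small Python helper. This is an illustration only: `choose_image_config` is a hypothetical function, it fills in just the `type` values named in this section, and it assumes that `dockerfile` sits under `build_spec` in the same way `tfy-python-buildpack` does. Other fields, such as the image URL or Dockerfile path, are intentionally omitted.

```python
def choose_image_config(has_prebuilt_image: bool, has_dockerfile: bool) -> dict:
    """Return a skeleton of the `image` section for a given situation.

    Hypothetical helper: only the `type` values mentioned in this guide are
    filled in. Real configurations need additional fields.
    """
    if has_prebuilt_image:
        # Reference an image that already exists in a container registry.
        return {"type": "image"}

    # Otherwise build from source, with or without an existing Dockerfile.
    # Assumption: `dockerfile` is a build_spec type, like the buildpack.
    build_type = "dockerfile" if has_dockerfile else "tfy-python-buildpack"
    return {
        "type": "build",
        "build_spec": {"type": build_type},
        "build_source": {"type": "local"},
    }


# The situation in this notebook: no pre-built image and no Dockerfile.
print(choose_image_config(has_prebuilt_image=False, has_dockerfile=False))
```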

#### Ports
* Specify the **Port** for routing customer traffic to your deployed application using the Port option.
* Specify the **Host** to define how the external world will access your application.


> 📘 Picking a value for `host`
> 
> The value you provide for `host` depends on the base domain URLs configured in the cluster settings. You can learn how to find the base domain URLs available to you [here](doc:checking-configured-domain).
> 
> For example, if your base domain URL is `*.truefoundry.your-org.com`, a valid value is `fastapi-your-workspace-8000.truefoundry.your-org.com`.
> 
> Alternatively, if you have a non-wildcard base domain URL such as `truefoundry.your-org.com`, a valid value is `truefoundry.your-org.com/fastapi-your-workspace-8000`.

```yaml
ports:
  - host: <ENTER YOUR HOST HERE>
    port: 8080
```
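
The two host patterns from the note above can be sketched in Python. `build_host` is a hypothetical helper, and the `<app>-<workspace>-<port>` label simply mirrors the examples in the note; it is an assumed naming convention, not a platform requirement.

```python
def build_host(base_domain: str, app_name: str, workspace: str, port: int) -> str:
    """Build a `host` value from a base domain URL (hypothetical helper)."""
    # Assumed naming convention, copied from the examples in the note.
    label = f"{app_name}-{workspace}-{port}"

    if base_domain.startswith("*."):
        # Wildcard domain: the label becomes a subdomain.
        return f"{label}.{base_domain[2:]}"

    # Non-wildcard domain: the label becomes a path prefix.
    return f"{base_domain}/{label}"


print(build_host("*.truefoundry.your-org.com", "fastapi", "your-workspace", 8000))
print(build_host("truefoundry.your-org.com", "fastapi", "your-workspace", 8000))
```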

#### Resources
Allocate computing resources (CPU, memory, storage) for your service using the Resources option.<br>
* **CPU** refers to the computing power available to your application
* **Memory** refers to how much space your application has to hold and work with data while it's running
* **Ephemeral storage** is where your application can temporarily store files and data

Requests and Limits:

* **Request** is like asking for a certain amount of a resource. It's what your application initially asks for to start working properly.
* **Limit** is like setting a maximum value. It restricts how much of a resource (like CPU or memory) your application can use.

For each resource type, you specify both a request and a limit.

```yaml
resources:
  cpu_limit: 0.3
  gpu_count: 0
  cpu_request: 0.3
  memory_limit: 500
  memory_request: 500
  ephemeral_storage_limit: 600
  ephemeral_storage_request: 600
```
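
Since a request should never exceed its limit, it can help to check the values before deploying. The following is a minimal sketch in plain Python; `find_resource_errors` is a hypothetical helper and is not part of the `servicefoundry` package.

```python
RESOURCE_KINDS = ("cpu", "memory", "ephemeral_storage")


def find_resource_errors(resources: dict) -> list:
    """Return a list of problems found in a `resources` mapping."""
    errors = []
    for kind in RESOURCE_KINDS:
        request = resources.get(f"{kind}_request")
        limit = resources.get(f"{kind}_limit")
        if request is None or limit is None:
            # This guide sets both values for every resource type.
            errors.append(f"{kind}: both request and limit should be set")
        elif request > limit:
            errors.append(f"{kind}: request {request} exceeds limit {limit}")
    return errors


# The values used in this notebook.
resources = {
    "cpu_request": 0.3,
    "cpu_limit": 0.3,
    "memory_request": 500,
    "memory_limit": 500,
    "ephemeral_storage_request": 600,
    "ephemeral_storage_limit": 600,
}
print(find_resource_errors(resources))
```

An empty list means every request is at or below its limit.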


### Step 2: Bring all of the configuration together in `servicefoundry.yaml` and deploy

To deploy your machine learning service, create a `servicefoundry.yaml` file that combines the configuration sections described above. This file holds all the configuration and parameters needed to deploy and manage your service.

In [None]:
%%writefile servicefoundry.yaml
name: mnist-service
type: service
image:
  type: build
  build_spec:
    type: tfy-python-buildpack
    command: python app.py
    python_version: '3.9'
    requirements_path: requirements.txt
    build_context_path: ./
  build_source:
    type: local
ports:
  - host: <YOUR HOST HERE>
    port: 8080
replicas: 1
resources:
  cpu_limit: 0.3
  gpu_count: 0
  cpu_request: 0.3
  memory_limit: 500
  memory_request: 500
  ephemeral_storage_limit: 600
  ephemeral_storage_request: 600

After configuring your deployment settings, you can deploy the service using the `servicefoundry deploy` command. IPython replaces `{workspace_fqn}` with the value of the `workspace_fqn` variable we stored earlier.

In [None]:
# Deploy the service
!servicefoundry deploy --workspace-fqn={workspace_fqn}

Once the build is complete, you will see a link to the dashboard after a message like `You can find the application on the dashboard:-`. <br>Click on the link to access the deployment dashboard.
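
Outside a notebook, the same deploy command can be assembled in Python. This is a minimal sketch: `build_deploy_command` is a hypothetical helper, the FQN shown is a made-up placeholder, and the snippet only prints the command rather than running it.

```python
import shlex


def build_deploy_command(workspace_fqn: str) -> list:
    """Assemble the deploy command used in this notebook as an argument list."""
    fqn = workspace_fqn.strip()
    if not fqn:
        raise ValueError("workspace FQN must not be empty")
    return ["servicefoundry", "deploy", f"--workspace-fqn={fqn}"]


# Hypothetical FQN, for illustration only.
command = build_deploy_command("my-cluster:my-workspace")
print(shlex.join(command))
```

To actually run it, pass the list to `subprocess.run(command, check=True)` from the directory that contains `servicefoundry.yaml`.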

# Interacting with the Application

Clicking the link will open up the dashboard dedicated to your service, where you'll have access to various details.

Here you will be able to see the Endpoint of your service at the top right corner. You can click on the Endpoint to open your application.

![](https://files.readme.io/142331b-8e96f01-Screenshot_2023-06-30_at_1.54.29_PM.png)

Now you can click one of the two example images and see what prediction your model gives:

![](https://files.readme.io/d2f8d05-bba9cc1-Screenshot_2023-06-30_at_1.57.15_PM.png)

Congratulations! You have successfully deployed the model using Truefoundry.

# Enabling Autoscaling for your Service

In this section, we'll explore enabling autoscaling for your service, a feature that allows your application to dynamically adjust its resources based on real-time demand and predefined metrics. Autoscaling optimizes performance, responsiveness, and resource efficiency.

## Scaling with Replicas and Pods

In Kubernetes and containerized environments, replicas and pods are essential concepts for managing application availability and scalability.

When deploying applications in a Kubernetes cluster, you specify the number of replicas for your service. Each replica is an identical instance of your application within a pod. A pod is the smallest deployable unit in Kubernetes, comprising one or more closely connected containers that share network and storage.

By setting the replica count, you control how many pod instances run concurrently, directly affecting your application's traffic handling capacity.

### Handling Demand

More replicas allocate more pods to manage incoming traffic, distributing the workload and improving responsiveness during spikes. Scaling by adjusting replicas aligns your application's capacity with varying traffic.
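
As a rough illustration of matching capacity to traffic, the replica count can be estimated from the expected request rate. This is back-of-the-envelope arithmetic, not a platform feature; `replicas_needed` and the numbers below are hypothetical, and the throughput of a single replica has to be measured for your own service.

```python
import math


def replicas_needed(expected_rps: float, rps_per_replica: float) -> int:
    """Estimate how many replicas are needed to serve the expected traffic."""
    if rps_per_replica <= 0:
        raise ValueError("rps_per_replica must be positive")
    # Round up so capacity is never below demand, and keep at least one replica.
    return max(1, math.ceil(expected_rps / rps_per_replica))


# Hypothetical numbers: 120 requests/s expected, 50 requests/s per replica.
print(replicas_needed(120, 50))
```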

### Setting the number of replicas

To configure the number of replicas your service should have via the ServiceFoundry CLI, you can add the following to your service deployment configuration:

```yaml
...
replicas: 1
...
```

Next, we'll explore how autoscaling improves performance by dynamically adjusting replicas based on real-time metrics and demand.

## Autoscaling Overview

Autoscaling involves dynamically adjusting computing resources based on real-time demand and predefined metrics. This optimization ensures that your service efficiently utilizes resources while responding to varying traffic loads.

### Autoscaling Configuration

Autoscaling configuration involves setting minimum and maximum replica counts as well as defining metrics that trigger autoscaling actions. Here are the available settings for autoscaling:

- **Minimum Replicas:** The minimum number of replicas to keep available.
- **Maximum Replicas:** The maximum number of replicas the service can scale up to.
- **Cooldown Period:** The period to wait after the last active trigger before scaling resources back to 0.

### Configuring Autoscaling via the ServiceFoundry CLI

To configure autoscaling parameters for your service via the ServiceFoundry CLI, you can add the following to your service deployment configuration:

```yaml
replicas:
  min_replicas: 1
  max_replicas: 3
  cooldown_period: 300
...
```
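
The effect of the minimum and maximum bounds can be sketched in a few lines of Python. Both functions are hypothetical helpers for illustration; treating `cooldown_period` as a number of seconds is an assumption.

```python
def validate_autoscaling(min_replicas: int, max_replicas: int, cooldown_period: int) -> None:
    """Raise ValueError if the autoscaling settings are inconsistent."""
    if min_replicas < 0:
        raise ValueError("min_replicas must not be negative")
    if max_replicas < min_replicas:
        raise ValueError("max_replicas must be at least min_replicas")
    if cooldown_period < 0:
        # Assumption: the cooldown period is expressed in seconds.
        raise ValueError("cooldown_period must not be negative")


def clamp_replicas(desired: int, min_replicas: int, max_replicas: int) -> int:
    """Keep a desired replica count within the configured bounds."""
    return max(min_replicas, min(desired, max_replicas))


# The settings from the YAML snippet above.
validate_autoscaling(min_replicas=1, max_replicas=3, cooldown_period=300)
for desired in (0, 2, 5):
    print(desired, "->", clamp_replicas(desired, 1, 3))
```

Whatever the metrics suggest, the replica count stays between the two bounds.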

### Autoscaling Metrics

Autoscaling metrics guide the system in dynamically adjusting resource allocation based on real-time conditions. They ensure your service can adapt to changing demands while maintaining optimal performance. We support the following three types of autoscaling metrics:

1. **RPSMetric (Requests Per Second Metric):** Monitors the rate of incoming requests measured in requests per second. Suitable for applications with varying request loads over time.

2. **CPUUtilizationMetric (CPU Utilization Metric):** Monitors the percentage of CPU resources in use. Ideal for applications where performance correlates with CPU usage.

3. **TimeRange:** Allows scheduling autoscaling actions based on specific time periods. Useful for applications with predictable traffic patterns.
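
To give a feel for how a target-based metric such as RPS or CPU utilization translates into a replica count, here is the scaling rule documented for the Kubernetes Horizontal Pod Autoscaler. Whether the platform applies exactly this rule is an assumption; `desired_replicas` and the numbers are for illustration only.

```python
import math


def desired_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Kubernetes HPA rule: ceil(current_replicas * current_value / target_value)."""
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    return math.ceil(current_replicas * current_value / target_value)


# Hypothetical readings: 2 replicas at 90% CPU with a 60% target.
print(desired_replicas(2, 90, 60))

# Hypothetical readings: 4 replicas at 30% CPU with a 60% target.
print(desired_replicas(4, 30, 60))
```

The result is then limited by the configured minimum and maximum replica counts.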

# Additional Capabilities of Services

Let's explore additional functionalities that Services provide, extending beyond deployment strategies:

- **Rollouts Management:** Maintain precise control over how new versions of your application are released to users, ensuring seamless transitions and minimal disruption.
- **Endpoint Authentication:** Bolster the security of your endpoints by integrating authentication mechanisms, effectively limiting access and safeguarding sensitive data.
- **Health Check Monitoring:** Monitor your services' health through comprehensive health checks, guaranteeing their operational readiness to handle incoming requests.
- **Efficient Communication with gRPC:** Leverage the power of gRPC, a high-performance communication protocol, to establish efficient and reliable connections between microservices.
- **TensorFlow Serving with gRPC:** Harness the capabilities of TensorFlow Serving in conjunction with gRPC to facilitate machine learning model deployment and communication.
- **Intercept Management:** Implement interceptors to exert fine-grained control over network communication, enhancing security measures and facilitating robust logging.
- **Scaling Deep Dive:** Gain in-depth insights into the nuances of scaling your services, optimizing resource allocation strategies to seamlessly adapt to varying demands.