# Deploy Machine Learning Job on Truefoundry
This notebook demonstrates how to deploy a job that trains a classification model on a customer churn dataset and logs the run metadata to the Truefoundry platform.

## Prerequisites

Before we begin, make sure you have the following prerequisites in place:

1. **Install `servicefoundry`** (Note: `servicefoundry` is pre-installed in Truefoundry notebooks). You can install it using the following command:

In [None]:
!pip install -U "servicefoundry"

2. **Login to servicefoundry**

Enter your host in the `--host` argument, e.g., "https://your-domain.truefoundry.com"

In [None]:
!sfy login --host "<ENTER YOUR HOST HERE>"

3. **Select the `Workspace`** in which you want to deploy your application. <br>Once you run the cell below you will get a prompt to enter your workspace. <br>
    * **Step 1:** Navigate to the **Workspace** tab on the left panel of your User Interface.
    * **Step 2:** Identify the Workspace you want to deploy the application in.
    * **Step 3:** Copy the Workspace FQN <br>
    ![Copying Workspace FQN](https://files.readme.io/730fee2-Screenshot_2023-02-28_at_2.08.34_PM.png)
    * **Step 4:** Paste the Workspace FQN in the prompt and press Enter.

In [None]:
workspace_fqn = input("Enter your Workspace FQN: ")

4. **Setup Logging**

In [None]:
import logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

## Clone the Getting Started Repo

In this step, we will clone the Truefoundry Getting Started repository. This repository contains the job code that we are going to deploy.

In [None]:
!git clone https://github.com/truefoundry/getting-started-examples.git

Now let's `cd` into the directory containing our job code, i.e., `getting-started-examples/customer-churn`

In [None]:
%cd getting-started-examples/customer-churn

## Code Structure

Before we proceed, let's take a quick look at the structure of the code you'll be deploying:

```text
.
|_ main.py : Contains the training code
|_ requirements.txt : Dependency file
```

Here are the key elements of the `main.py` code that you'll be deploying:

- **Hyperparameters and argparse:**  
  The `argparse` library handles hyperparameters as command-line arguments, so you can change them without modifying the code. The parsed hyperparameters are then passed to the `train_model` function.
- **`train_model` Function:**  
  The `train_model` function trains the K-Nearest Neighbors (KNN) classifier using the provided hyperparameters and computes the evaluation metrics (accuracy, F1 score, precision, and recall). It then passes the model, its parameters, and the metrics to the `experiment_track` function.
- **`experiment_track` Function:**  
  The `experiment_track` function logs experiment-related details to the ML Repo. Specifically, it:
  - Initializes the `mlfoundry` client.
  - Creates an ML Repo named `churn-pred`.
  - Creates a run within the ML Repo to track this experiment.
  - Logs hyperparameters and metrics.
  - Logs the trained model using the `log_model` method, enabling deployment via Model Deployment.

```python main.py
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier as Classification
import mlfoundry as mlf
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt

def experiment_track(model, params, metrics):
    # initialize the mlfoundry client.
    mlf_api = mlf.get_client()

    # create a ml repo
    mlf_api.create_ml_repo("churn-pred")
    # create a run
    mlf_run = mlf_api.create_run(
        ml_repo="churn-pred", run_name="churn-train-job"
    )
    # log the hyperparameters
    mlf_run.log_params(params)
    # log the metrics
    mlf_run.log_metrics(metrics)
    # log the model
    model_version = mlf_run.log_model(
        name="churn-model",
        model=model,
        # specify the framework used (in this case sklearn)
        framework=mlf.ModelFramework.SKLEARN,
        description="churn-prediction-model",
    )
    # return the model's fqn
    return model_version.fqn


def train_model(hyperparams):

    df = pd.read_csv("https://raw.githubusercontent.com/nikp1172/datasets-sample/main/Churn_Modelling.csv")
    X = df.iloc[:, 3:-1].drop(["Geography", "Gender"], axis=1)
    y = df.iloc[:, -1]
    # Create train test split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
    
    # Initialize the KNN Classifier
    classifier = Classification(
        n_neighbors=hyperparams['n_neighbors'],
        weights=hyperparams['weights'],
    )

    # Fit the classifier with the training data
    classifier.fit(X_train, y_train)
    y_pred = classifier.predict(X_test)

    # Get the metrics
    metrics = {
        "accuracy": accuracy_score(y_test, y_pred),
        "f1_score": f1_score(y_test, y_pred, average="weighted"),
        "precision": precision_score(y_test, y_pred, average="weighted"),
        "recall": recall_score(y_test, y_pred, average="weighted"),
    }

    # Log the experiment
    experiment_track(classifier, classifier.get_params(), metrics)


if __name__ == "__main__":
    import argparse

    # Setup the argument parser by instantiating `ArgumentParser` class
    parser = argparse.ArgumentParser()
    # Add the hyperparameters as arguments
    parser.add_argument(
        "--n_neighbors",
        type=int,
        required=True,
    )
    parser.add_argument(
        "--weights",
        type=str,
        required=True,
    )
    args = parser.parse_args()
    hyperparams = vars(args)

    # Train the model
    train_model(hyperparams)
```
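
To see what `hyperparams` contains without launching a training run, the argument parsing can be exercised on its own. The sketch below rebuilds the same two arguments and passes an explicit argument list to `parse_args`. The helper name `parse_hyperparams` is hypothetical and is not part of `main.py`.

```python
import argparse


def parse_hyperparams(argv):
    """Parse the two hyperparameters the same way main.py does.

    `parse_hyperparams` is a hypothetical helper for illustration only.
    """
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_neighbors", type=int, required=True)
    parser.add_argument("--weights", type=str, required=True)
    # vars() turns the argparse Namespace into a plain dict
    return vars(parser.parse_args(argv))


hyperparams = parse_hyperparams(["--n_neighbors", "5", "--weights", "uniform"])
print(hyperparams)  # {'n_neighbors': 5, 'weights': 'uniform'}
```

Because of `type=int`, `n_neighbors` arrives as an integer rather than a string, so it can be handed to the classifier directly.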



## Deploying Your Machine Learning Job

Now, let's move on to the deployment steps.

### Step 1: Set Up Deployment Configuration
In this step, you will define your deployment configuration using the Truefoundry Python SDK. We will provide explanations for each parameter and guide you through the process.

#### Name
Set a unique identifier for your job using the `name` field.

In [None]:
name = "churn-prediction-job"

#### Image

* Choosing the Right Approach for specifying image:
    Depending on your scenario, you can choose to deploy either a pre-built Docker image or build a Docker image from your source code.
    
* Using Pre-Built Images
    If you already have a Docker image that you've previously built and pushed to a container registry, you can use the `Image` class.
    The `Image` class simply references the pre-built image URL and uses it for deployment.
* Using Build for Source Code
    In cases where you don't have a pre-built image, you'll use the `Build` option to create an image from your source code.
    This scenario applies when you want to package and deploy your application from scratch.
    * Creating a Dockerfile with PythonBuild
        If you don't have a Dockerfile but your application is written in Python, you can use the `PythonBuild` class.
        The `PythonBuild` class will inspect your Python code and create a Dockerfile automatically based on the code's requirements.
    * Choosing DockerBuild for Dockerfile
        If you have a pre-existing Dockerfile, you can use the `DockerBuild` class.
        This allows you to directly reference the Dockerfile present in your code repository.

In this case, since we have neither a pre-built image nor a Dockerfile in our source code, we use `PythonBuild`, which takes the configuration we provide (the command and the requirements file) and generates a Dockerfile from a template.


In the `command` field, enter the command that executes your training job, with placeholders for the hyperparameters, such as `{{n_neighbors}}` and `{{weights}}`.  
These must be the same names we specify in the Params configuration below, so keep this in mind.

In [None]:
from servicefoundry import Build, PythonBuild, LocalSource

image = Build(
    build_spec=PythonBuild(
        command="python main.py --n_neighbors {{n_neighbors}} --weights {{weights}}",
        requirements_path="requirements.txt"
    ),
)
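
Conceptually, each `{{name}}` placeholder in the command is replaced with the value supplied for that param when a run is triggered. The sketch below imitates that substitution locally with a regular expression. `render_command` is a hypothetical helper, not a `servicefoundry` API, and the platform's actual templating may differ in its details.

```python
import re

# Matches {{name}}, allowing optional whitespace inside the braces
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_command(template, values):
    """Replace every {{name}} in `template` with str(values[name]).

    Hypothetical helper for illustration; not the platform's implementation.
    """
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for placeholder {name!r}")
        return str(values[name])

    return PLACEHOLDER.sub(substitute, template)


command = "python main.py --n_neighbors {{n_neighbors}} --weights {{weights}}"
rendered = render_command(command, {"n_neighbors": 5, "weights": "uniform"})
print(rendered)  # python main.py --n_neighbors 5 --weights uniform
```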

#### Params
The `Param` option lets you define hyperparameters whose values can be set for each run, so that every job run can use a different configuration.

For each parameter, provide the following details:

- **Name:** Enter a descriptive name for the parameter.
- **Default value:** Specify the default value for the parameter.
- **Description:** Include a brief description of the parameter's purpose.
- **Param type:** Can be either a string or an ML Repo.

Note that the `Param` names are the same as the placeholders we used in the command's `{{}}` template: `python main.py --n_neighbors {{n_neighbors}} --weights {{weights}}`

In [None]:
from servicefoundry import Param

params = [
    Param(
        name="n_neighbors",
        default=5,
        description="Number of neighbors to use by default"
    ),
    Param(
        name="weights",
        default="uniform",
        description="Weight function used in prediction.  Possible values: uniform, distance"
    ),
]
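
Since a mismatch between the placeholder names and the `Param` names is easy to introduce, a quick local check can help before deploying. This is a minimal sketch that works on plain strings: `param_names` simply repeats the names from the cell above, and `find_placeholders` is a hypothetical helper rather than part of `servicefoundry`.

```python
import re


def find_placeholders(template):
    """Return the set of names that appear as {{name}} in `template`."""
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", template))


command = "python main.py --n_neighbors {{n_neighbors}} --weights {{weights}}"
param_names = {"n_neighbors", "weights"}  # names declared via Param above

placeholders = find_placeholders(command)
missing_params = placeholders - param_names  # used in the command, never declared
unused_params = param_names - placeholders   # declared, never used in the command

print("missing:", sorted(missing_params))
print("unused:", sorted(unused_params))
```

Both lists are empty when the command and the params agree. A malformed placeholder such as `{{weights}` is not detected as a placeholder, so `weights` would show up under `unused`.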

#### Resources
Allocate computing resources (CPU, memory, storage) for your job using the `Resources` option.<br>
* **CPU** refers to the computing power available to your application
* **Memory** refers to how much space your application has to hold and work with data while it's running
* **Ephemeral storage** is where your application can temporarily store files and data

Requests and Limits:

* **Request** is like asking for a certain amount of a resource. It's what your application initially asks for to start working properly.
* **Limit** is like setting a maximum value. It restricts how much of a resource (like CPU or memory) your application can use.

So for each category of resource, you specify both a request and a limit:

In [None]:
from servicefoundry import Resources

resources = Resources(
    memory_limit=500,
    memory_request=500,
    ephemeral_storage_limit=600,
    ephemeral_storage_request=600,
    cpu_limit=0.3,
    cpu_request=0.3
)
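
A request should not exceed its corresponding limit. The sketch below checks this for the values used above, represented as a plain dictionary. `check_requests_within_limits` is a hypothetical helper for illustration and does not call `servicefoundry`.

```python
def check_requests_within_limits(spec):
    """Return the resource names whose request exceeds the limit.

    `spec` maps keys like "cpu_request" / "cpu_limit" to numbers.
    Hypothetical helper for illustration only.
    """
    problems = []
    for resource in ("cpu", "memory", "ephemeral_storage"):
        request = spec[f"{resource}_request"]
        limit = spec[f"{resource}_limit"]
        if request > limit:
            problems.append(resource)
    return problems


# Same values as the Resources configuration above
spec = {
    "cpu_request": 0.3, "cpu_limit": 0.3,
    "memory_request": 500, "memory_limit": 500,
    "ephemeral_storage_request": 600, "ephemeral_storage_limit": 600,
}
problems = check_requests_within_limits(spec)
print(problems)  # []
```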

### Step 2: Bring all of the configuration together via the Job Class and Deploy

To deploy your machine learning job, you need to create an instance of the `Job` class provided by the servicefoundry library. This instance will encapsulate all the necessary configurations and parameters for deploying and managing your job.

In [None]:
from servicefoundry import Job

job = Job(
    name=name,
    image=image,
    resources=resources,
    params=params
)

After configuring your deployment settings, deploy the job using the `deploy` method, passing the `workspace_fqn` we stored earlier.

In [None]:
# Deploy the job
job.deploy(workspace_fqn=workspace_fqn)

Once the build is complete, you will see a link to the dashboard after a message like `You can find the application on the dashboard:-`. <br>Click on the link to access the deployment dashboard.

# Effortless Hyperparameter Experimentation

Once your deployment is active, navigate to your specific job by clicking on it. This action will open a dedicated dashboard displaying various job details, including the **Run Job** button.

![](https://files.readme.io/cfff7cd-Screenshot_2023-08-23_at_1.48.02_PM.png)

Clicking this button will trigger a modal to appear:

![](https://files.readme.io/971a7fe-Screenshot_2023-08-23_at_1.51.38_PM.png)

Within this modal, you can effortlessly adjust hyperparameter values for rapid experimentation.

After configuring the modal, submit it using the Run Job button. This action will redirect you to the Job Runs tab. Within a few moments, your job status should switch to Finished.

Proceed by clicking on the logs button to access your job's results:

![](https://files.readme.io/1b79056-Screenshot_2023-08-28_at_7.17.03_AM.png)

Finally, click the purple **churn-train-job** badge to view the Key Metrics, Hyperparameters, Logged Model, and Associated Artifacts from the run.


![](https://files.readme.io/0113700-Screenshot_2023-08-28_at_7.14.36_AM.png)

# Additional Capabilities of Jobs

Jobs offer several capabilities beyond what this notebook covers:

- **Continuous Integration/Continuous Deployment (CI/CD) via Truefoundry:** Integrate Jobs with Truefoundry for streamlined CI/CD pipelines, ensuring efficient code integration, testing, and deployment.
- **Cron Jobs:** Schedule Jobs to run at specified intervals using cron-like expressions, automating recurring tasks and processes.
- **Job Parametrization:** Configure Jobs with parameters, allowing you to customize execution by providing dynamic input values.
- **Programmatic Job Triggers:** Trigger Jobs programmatically via APIs, enabling seamless automation and integration with external systems.
- **Additional Configurations:** Access a range of supplementary configurations to fine-tune job behavior and optimize performance.