[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/truefoundry/truefoundry-examples/blob/main/sample-notebooks/iris-tfy.ipynb)

# Iris Classification with TrueFoundry

More details about the dataset: https://archive.ics.uci.edu/ml/datasets/iris

## Install TrueFoundry libraries

1. MLFoundry - for tracking ML experiments
2. ServiceFoundry - to deploy applications from trained models

In [None]:
!pip install -U "mlfoundry>=0.3.33,<0.4.0"
!pip install -U servicefoundry

In [None]:
import pandas as pd
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

## Create an MLFoundry client and create a run

In [None]:
import mlfoundry as mlf
mlf.login()
client = mlf.get_client()

PROJECT_NAME = 'iris-classification-project-1900'
run = client.create_run(project_name=PROJECT_NAME)

## Split datasets into train and test

In [None]:
data = datasets.load_iris()

X = pd.DataFrame(data.data, columns=data.feature_names)
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, stratify=y, random_state=42)

In [None]:
print(X.head())
print(data.target_names)

## Initialise the model and log params to MLFoundry

In [None]:
clf = SVC(gamma='scale', kernel='rbf', probability=True, C=1.2)
run.set_tags({'framework': 'sklearn', 'task': 'classification'})
run.log_params(clf.get_params())

## Train the model

In [None]:
clf.fit(X_train, y_train)

## Make predictions and log metrics to MLFoundry

In [None]:
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)

metrics = {
    'train/accuracy_score': accuracy_score(y_train, y_pred_train),
    'train/f1_weighted': f1_score(y_train, y_pred_train, average='weighted'),
    'train/f1_micro': f1_score(y_train, y_pred_train, average='micro'),
    'train/f1_macro': f1_score(y_train, y_pred_train, average='macro'),
    'test/accuracy_score': accuracy_score(y_test, y_pred_test),
    'test/f1_weighted': f1_score(y_test, y_pred_test, average='weighted'),
    'test/f1_micro': f1_score(y_test, y_pred_test, average='micro'),
    'test/f1_macro': f1_score(y_test, y_pred_test, average='macro'),
}

run.log_metrics(metrics)
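
The three F1 variants logged above differ only in how per-class scores are combined. The sketch below recomputes them in plain Python on a small made-up label list (not output from this notebook) to show the difference; `f1_averages` is a hypothetical helper, not part of `sklearn` or MLFoundry.

```python
from collections import Counter

def f1_averages(y_true, y_pred):
    """Return micro, macro and weighted F1 for single-label multiclass data."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)
    return {
        # micro: pool all counts, then compute one F1
        "micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # macro: unweighted mean of per-class F1
        "macro": sum(per_class.values()) / len(labels),
        # weighted: per-class F1 weighted by the number of true samples
        "weighted": sum(per_class[c] * support[c] for c in labels) / len(y_true),
    }

# Illustrative labels only
y_true = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]
scores = f1_averages(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(scores, accuracy)
```

For single-label multiclass problems such as Iris, micro-averaged F1 equals accuracy, so the `f1_micro` and `accuracy_score` entries carry the same value.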

## Log model and end run

In [None]:
run.log_model(clf, framework=mlf.ModelFramework.SKLEARN)
run.end()

In [None]:
print(run.run_id)

## Log in to ServiceFoundry

In [None]:
import servicefoundry.core as sfy
sfy.login()

### Create a ServiceFoundry workspace

A ServiceFoundry workspace is a collection of related services that share the same set of permissions.

To create a workspace:

1. Go to <a href="https://app.truefoundry.com/workspace">ServiceFoundry dashboard</a>

2. Click on `Create Workspace` to create a new workspace. **Choose the largest tier available since we will be deploying two services.**

3. Once the workspace is created, copy the FQN of the workspace. We shall use this to deploy our webapp and service to the workspace.

In [None]:
WORKSPACE_FQN = input("Input workspace FQN copied from the dashboard ")

## Create a Python file to deploy as an API Service

In this Python file, we write a function that will return the species of iris flower using the model we just trained.

ServiceFoundry will automatically create an endpoint out of this function.

Note that we fetch the run using the run ID printed after training and load the model with `run.get_model()`, so the service needs no `sklearn`-specific loading code.


### **IMPORTANT**: When running the notebook, replace `RUN_ID` with your current run ID. The API key is read from the `TFY_API_KEY` environment variable.

In [None]:
%%writefile predict.py
import os
import pandas as pd
import mlfoundry as mlf
import json

RUN_ID = 'e619e9a3243e426aa3aa263c00dc13a4'

client = mlf.get_client(api_key=os.environ.get('TFY_API_KEY'))
run = client.get_run(RUN_ID)
model = run.get_model()

def species(features):
  features = json.loads(features)
  df = pd.DataFrame.from_dict([features])
  prediction = model.predict(df)[0]
  return ['setosa', 'versicolor', 'virginica'][prediction]
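
The `species` function above expects `features` to be a JSON string that decodes to a single record keyed by feature name. As a minimal sketch of what such an input could look like, the hypothetical helper `build_payload` below (not part of ServiceFoundry) serialises four measurements using the column names of the Iris dataset. How the deployed endpoint actually receives this string is not shown in this notebook.

```python
import json

# Column names of the Iris dataset as used for training above
FEATURE_NAMES = [
    "sepal length (cm)",
    "sepal width (cm)",
    "petal length (cm)",
    "petal width (cm)",
]

def build_payload(values):
    """Serialise four measurements into a JSON string keyed by feature name."""
    if len(values) != len(FEATURE_NAMES):
        raise ValueError(f"expected {len(FEATURE_NAMES)} values, got {len(values)}")
    return json.dumps(dict(zip(FEATURE_NAMES, values)))

payload = build_payload([5.1, 3.5, 1.4, 0.2])
decoded = json.loads(payload)  # the same decoding step that species() performs
print(payload)
```

Keying the record by feature name means the resulting `DataFrame` has the same columns the model saw during training.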

In [None]:
requirements = sfy.gather_requirements("predict.py")
service = sfy.Service("predict.py", requirements, sfy.Parameters(
    name="iris-service",
    workspace=WORKSPACE_FQN,
    cpu=sfy.CPU(required=1),
    memory=sfy.Memory(required=1024 * 1000 * 1000)
))

service.deploy()
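
The memory request above is written as `1024 * 1000 * 1000`. Assuming `sfy.Memory(required=...)` takes a byte count (the notebook does not state the unit), the short sketch below shows what that number amounts to in decimal and binary units. `describe_bytes` is a hypothetical helper for illustration only.

```python
def describe_bytes(n_bytes):
    """Express a byte count in decimal megabytes and binary mebibytes."""
    return {"MB": n_bytes / 1000**2, "MiB": n_bytes / 1024**2}

requested = 1024 * 1000 * 1000  # the value used in the deployment cell above
sizes = describe_bytes(requested)
print(sizes)
```

Under that assumption the request is 1024 MB, slightly less than 1 GiB.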

## Deploying a Gradio webapp

To deploy a Gradio app for the model, we simply assign the Gradio `Interface` object to a variable called `app`.

Once again, we use the run ID printed above (during training) to load the model from MLFoundry.

In [None]:
# install gradio
!pip install gradio==3.0.17


### **IMPORTANT**: When running the notebook, replace `RUN_ID` with your current run ID. The API key is read from the `TFY_API_KEY` environment variable.

In [None]:
%%writefile webapp.py
import os  # needed for os.environ below
import gradio as gr
import pandas as pd
import mlfoundry as mlf

RUN_ID = 'e619e9a3243e426aa3aa263c00dc13a4'

client = mlf.get_client(api_key=os.environ.get('TFY_API_KEY'))
run = client.get_run(RUN_ID)
model = run.get_model()

# Column names must match the feature names the model was trained on.
FEATURE_NAMES = ["sepal length (cm)", "sepal width (cm)", "petal length (cm)", "petal width (cm)"]

def predict_species(f1, f2, f3, f4):
    df = pd.DataFrame([[f1, f2, f3, f4]], columns=FEATURE_NAMES)
    prediction = model.predict(df)[0]
    return ['setosa', 'versicolor', 'virginica'][prediction]

examples = [[5.1, 3.5, 1.4, 0.2]]
app = gr.Interface(
    fn=predict_species,
    title="Iris Classification",
    inputs=[gr.Number(label=name) for name in FEATURE_NAMES],
    outputs=[gr.Textbox(label="Answer")],
    examples=examples,
)


In [None]:
requirements = sfy.gather_requirements("webapp.py")
webapp = sfy.Gradio("webapp.py", requirements, sfy.Parameters(
    name="gradio-app",
    workspace=WORKSPACE_FQN,
    cpu=sfy.CPU(required=1),
    memory=sfy.Memory(required=1024 * 1000 * 1000)
))

webapp.deploy()

## Logging Grid Search Results

Grid search can be used to identify the optimal hyper-parameters for your model. Using MLFoundry, we create a project to track each set of hyper-parameters and the corresponding model performance.

**Each run in this project corresponds to a unique set of hyper-parameters.**

In [None]:
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
parameters = {'kernel':('linear', 'rbf'), 'C':[1, 5, 10]}
clf = GridSearchCV(SVC(), parameters)
clf.fit(iris.data, iris.target)
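
To make "a unique set of hyper-parameters" concrete, the sketch below expands the same `parameters` dictionary into its individual combinations using only the standard library. `expand_grid` is a hypothetical helper written for illustration; it is not how `GridSearchCV` is implemented, although the number of candidates is the same.

```python
from itertools import product

parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 5, 10]}

def expand_grid(grid):
    """Return every combination of the grid's values as a list of dicts."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

candidates = expand_grid(parameters)
for i, params in enumerate(candidates, start=1):
    print(f'parameter-set-{i}', params)
```

Two kernels times three values of `C` give six candidates, so six runs are logged for this grid.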

In [None]:
def log_grid_search_results(project_name, classifier):
  results = classifier.cv_results_
  count = len(results['mean_test_score'])
  for i in range(count):
    # one run per hyper-parameter combination
    run = client.create_run(project_name, f'parameter-set-{i+1}')
    run.set_tags({'run_type': 'grid_search'})
    run.log_params(results['params'][i])
    run.log_metrics({
        'rank': results['rank_test_score'][i],
        'mean_test_score': results['mean_test_score'][i],
        'mean_fit_time': results['mean_fit_time'][i],
        'std_score_time': results['std_score_time'][i]
    })
    run.end()  # close each run once its results are logged

In [None]:
log_grid_search_results(PROJECT_NAME, clf)

## Logging K-fold cross validation scores

We use an MLFoundry project to log metrics during k-fold cross validation. 

**Each run corresponds to a single fold and logs the fold dataset and metrics.**
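
Stratified folds keep the class proportions of the full dataset in every fold. The sketch below is a simplified pure-Python illustration of that idea, not the algorithm `StratifiedKFold` uses internally; `stratified_folds` is a hypothetical helper, and the label list mirrors the Iris layout of 50 samples per species.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, n_splits):
    """Assign each sample index to a test fold so every fold keeps the class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # spread each class evenly over the folds, in order of appearance
        for position, idx in enumerate(indices):
            folds[position * n_splits // len(indices)].append(idx)
    return [sorted(fold) for fold in folds]

labels = ['setosa'] * 50 + ['versicolor'] * 50 + ['virginica'] * 50
folds = stratified_folds(labels, n_splits=5)
for i, test_index in enumerate(folds, start=1):
    print(f'fold-{i}', Counter(labels[j] for j in test_index))
```

With five splits, each test fold holds 30 samples, 10 per species, and the remaining 120 samples form the training set for that fold.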

In [None]:
from sklearn.model_selection import StratifiedKFold

iris_df = datasets.load_iris(as_frame=True)

features = iris_df.data
actuals = iris_df.target.apply(lambda class_index: iris_df.target_names[class_index])

kf = StratifiedKFold(n_splits=5)
for i, (train_index, test_index) in enumerate(kf.split(features, y=actuals)):
    # create a run named fold-n for the nth-fold
    run = client.create_run(PROJECT_NAME, f'fold-{i+1}')
    run.set_tags({'run_type': 'cross_validation'})

    X_train, X_test = (
          features.iloc[train_index],
          features.iloc[test_index],
      )
    y_train, y_test = (
          actuals.iloc[train_index],
          actuals.iloc[test_index],
      )

    # log train dataset
    run.log_dataset(
      features=X_train,
      actuals=y_train,
      dataset_name=f"fold-{i + 1}-train",
      only_stats=True,
    )

    # log test dataset
    run.log_dataset(
        features=X_test,
        actuals=y_test,
        dataset_name=f"fold-{i + 1}-test",
        only_stats=True,
    )

    # model training
    clf = SVC(gamma='scale', kernel='rbf', probability=True, C=1.2)
    clf.fit(X_train, y_train)

    y_pred_train = clf.predict(X_train)
    y_pred_test = clf.predict(X_test)

    metrics = {
        'train/accuracy_score': accuracy_score(y_train, y_pred_train),
        'train/f1_weighted': f1_score(y_train, y_pred_train, average='weighted'),
        'train/f1_micro': f1_score(y_train, y_pred_train, average='micro'),
        'train/f1_macro': f1_score(y_train, y_pred_train, average='macro'),
        'test/accuracy_score': accuracy_score(y_test, y_pred_test),
        'test/f1_weighted': f1_score(y_test, y_pred_test, average='weighted'),
        'test/f1_micro': f1_score(y_test, y_pred_test, average='micro'),
        'test/f1_macro': f1_score(y_test, y_pred_test, average='macro'),
    }

    run.log_metrics(metrics)
    run.end()  # close this fold's run before starting the next one

## Logging K-fold cross validation in a single run

In the cell above, we logged the cross-validation metrics for each fold in a separate run. We can also log them in a single run by passing the `step` argument to `log_metrics`.

In [None]:
run = client.create_run(PROJECT_NAME, 'cross-validation-run')

for i, (train_index, test_index) in enumerate(kf.split(features, y=actuals)):
    X_train, X_test = (
          features.iloc[train_index],
          features.iloc[test_index],
      )
    y_train, y_test = (
          actuals.iloc[train_index],
          actuals.iloc[test_index],
      )

    # model training
    clf = SVC(gamma='scale', kernel='rbf', probability=True, C=1.2)
    clf.fit(X_train, y_train)

    y_pred_train = clf.predict(X_train)
    y_pred_test = clf.predict(X_test)

    metrics = {
        'train/accuracy_score': accuracy_score(y_train, y_pred_train),
        'train/f1_weighted': f1_score(y_train, y_pred_train, average='weighted'),
        'train/f1_micro': f1_score(y_train, y_pred_train, average='micro'),
        'train/f1_macro': f1_score(y_train, y_pred_train, average='macro'),
        'test/accuracy_score': accuracy_score(y_test, y_pred_test),
        'test/f1_weighted': f1_score(y_test, y_pred_test, average='weighted'),
        'test/f1_micro': f1_score(y_test, y_pred_test, average='micro'),
        'test/f1_macro': f1_score(y_test, y_pred_test, average='macro'),
    }

    run.log_metrics(metrics, i) # pass i as second argument `step`

run.end()
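
Logging each fold against its own `step` turns every metric into a short series indexed by fold. The sketch below uses `StepLog`, a hypothetical in-memory stand-in (not the MLFoundry API), to show that shape and how a per-metric mean and spread could be derived from it. The scores are placeholder numbers, not results from this notebook.

```python
from collections import defaultdict
from statistics import mean, pstdev

class StepLog:
    """Hypothetical in-memory stand-in for a run that records metrics per step."""

    def __init__(self):
        self.history = defaultdict(dict)  # metric name -> {step: value}

    def log_metrics(self, metrics, step=0):
        for name, value in metrics.items():
            self.history[name][step] = value

    def summary(self, name):
        """Return the mean and population standard deviation across steps."""
        values = [self.history[name][s] for s in sorted(self.history[name])]
        return mean(values), pstdev(values)

log = StepLog()
placeholder_scores = [0.90, 1.00, 0.95, 0.95, 1.00]  # made-up values, one per fold
for fold, score in enumerate(placeholder_scores):
    log.log_metrics({'test/accuracy_score': score}, fold)

avg, spread = log.summary('test/accuracy_score')
print(avg, spread)
```

Keeping all folds in one run makes the fold-to-fold variation of a metric visible as a single series rather than as separate runs.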