# Combining Pachyderm and UbiOps for model training and deployment
This notebook shows how to create a complete data science application. 
A dataset of labelled faces is stored in Pachyderm. This set is used to train an algorithm using Pachyderm pipelines to predict the age of humans. The trained algorithm is then sent to UbiOps for serving. 


In [None]:
from IPython.display import Image
Image(filename="architecture.png")

## Download the training data
This data will be used to train the model we are building. The download takes a while, so it is a good idea to start it as soon as possible.

In [None]:
!./download_data.sh

## Setup Pachyderm 
First we need to set up Pachyderm. The easiest way is to use Pachyderm Hub: https://hub.pachyderm.com/
![pachyderm hub](pachy_hub.png)


## Connecting to your hub
After creating your Pachyderm hub, click the connect button and follow the steps to connect your machine to the Pachyderm environment.
![pachyderm connect](pachy_connect.png)

## Create Pachyderm repo
Now that we have created a Pachyderm hub environment, we can create a Pachyderm repo to store the face training data.

In [None]:
!pachctl create repo faces
!pachctl list repo

## Add UbiOps token
Because we want Pachyderm to send the trained model to UbiOps, we need to give Pachyderm credentials. We do that by storing a UbiOps token in a Pachyderm secret.

You can learn more about getting this token by creating a service user here:
https://ubiops.com/docs/organizations/service-users/#creating-a-service-user

![UbiOps token](../pictures/create-token.gif)

#### Important: add the role 'project-editor' so that the token has enough rights.

Add your token in the cell below, which writes it to `secret.yaml`.

In [None]:
%%writefile secret.yaml
{
    "apiVersion": "v1",
    "kind": "Secret",
    "metadata": {
       "name": "ubiops"
    },
    "type": "Opaque",
    "stringData": {
       "token": "Token <YOUR_TOKEN_HERE>"
    }
}


In [None]:
#!pachctl delete secret ubiops
!pachctl create secret -f secret.yaml
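
Hard-coding the token in a notebook cell makes it easy to leak by accident. As a minimal sketch, the same manifest could be built in Python from a token read elsewhere (for example an environment variable). The helper name `build_secret_manifest` is an assumption made for illustration and is not part of the original setup.

```python
def build_secret_manifest(token, name="ubiops"):
    """Return the secret manifest from this section as a plain dict."""
    token = token.strip()
    if not token:
        raise ValueError("empty token")
    # The manifest above stores the value with a 'Token ' prefix; add it if missing.
    if not token.startswith("Token "):
        token = "Token " + token
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "stringData": {"token": token},
    }
```

The returned dict can be written to `secret.yaml` with `json.dump` before running `pachctl create secret -f secret.yaml`.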

## Inference script

In [None]:
# Create directory to host the deployment you will be creating
!mkdir -p pachy_source/tensorflow_deployment_package

## Pachyderm pipeline script
This script runs in a Pachyderm pipeline and trains the model on the face training set. It also sends the trained model to UbiOps for serving.

In [None]:
%%writefile pachy_source/main.py

# Enter your project name below (as a quoted string)
PROJECT_NAME = '<YOUR_PROJECT_NAME>'
DEPLOYMENT_NAME = 'tensorflow-deployment'
DEPLOYMENT_VERSION = 'v1'

import os
import ubiops
from pathlib import Path
import multiprocessing
import pandas as pd
from sklearn.model_selection import train_test_split
import hydra
from hydra.utils import to_absolute_path
import tensorflow as tf
from tensorflow.keras.callbacks import LearningRateScheduler, ModelCheckpoint
from src.factory import get_model, get_optimizer, get_scheduler
from src.generator import ImageSequence
from hydra.experimental import compose, initialize
from omegaconf import OmegaConf


# shutil is used to zip the deployment package (os and ubiops are imported above)
import shutil



#@hydra.main(config_path="age_gender_estimation/src/config.yaml")
#we cannot use this in jupyter notebooks
def main(cfg):
    if cfg.wandb.project:
        import wandb
        from wandb.keras import WandbCallback
        wandb.init(project=cfg.wandb.project)
        callbacks = [WandbCallback()]
    else:
        callbacks = []
        
    data_path = Path("/pfs/faces/data/imdb_crop")
    #data_path = Path("/home/raoulfasel/Documents/pachyderm/age_gender_estimation/data/imdb_crop")
    
    csv_path = Path(to_absolute_path("./")).joinpath("meta", f"{cfg.data.db}.csv")
    #csv_path = Path(to_absolute_path("/pfs/faces")).joinpath("meta", f"{cfg.data.db}.csv")
    print(csv_path)
    df = pd.read_csv(str(csv_path))
    train, val = train_test_split(df, random_state=42, test_size=0.1)
    train_gen = ImageSequence(cfg, train, "train", data_path)
    val_gen = ImageSequence(cfg, val, "val", data_path)

    strategy = tf.distribute.MirroredStrategy()

    with strategy.scope():
        model = get_model(cfg)
        opt = get_optimizer(cfg)
        scheduler = get_scheduler(cfg)
        model.compile(optimizer=opt,
                      loss=["sparse_categorical_crossentropy", "sparse_categorical_crossentropy"],
                      metrics=['accuracy'])

    #checkpoint_dir = Path(to_absolute_path("age_gender_estimation")).joinpath("checkpoint")
    checkpoint_dir = Path(to_absolute_path("/pfs/build")).joinpath("checkpoint")

    print(checkpoint_dir)
    checkpoint_dir.mkdir(exist_ok=True)
    filename = "_".join([cfg.model.model_name,
                         str(cfg.model.img_size),
                         "weights.{epoch:02d}-{val_loss:.2f}.hdf5"])
    callbacks.extend([
        LearningRateScheduler(schedule=scheduler),
        ModelCheckpoint(str(checkpoint_dir) + "/" + filename,
                        monitor="val_loss",
                        verbose=1,
                        save_best_only=True,
                        mode="auto")
    ])

    model.fit(train_gen, epochs=cfg.train.epochs, callbacks=callbacks, validation_data=val_gen,
              workers=multiprocessing.cpu_count())
    
    model.save("tensorflow_deployment_package/tensorflow_model.h5")

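    # Connect to UbiOps with the token from the mounted Pachyderm secret
    with open('/opt/ubiops/token', 'r') as reader:
        # Strip surrounding whitespace such as a trailing newline
        API_TOKEN = reader.read().strip()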
    client = ubiops.ApiClient(ubiops.Configuration(api_key={'Authorization': API_TOKEN}, 
                                               host='https://api.ubiops.com/v2.1'))
    api = ubiops.CoreApi(client)
    
    # Create the deployment
    deployment_template = ubiops.DeploymentCreate(
        name=DEPLOYMENT_NAME,
        description='Tensorflow deployment',
        input_type='structured',
        output_type='structured',
        input_fields=[
            {'name':'input_image', 'data_type':'file'}
        ],
        output_fields=[
            {'name':'output_image', 'data_type':'file'}
        ],
        labels={"demo": "tensorflow"}
    )

    api.deployments_create(
        project_name=PROJECT_NAME,
        data=deployment_template
    )

    # Create the version
    version_template = ubiops.DeploymentVersionCreate(
        version=DEPLOYMENT_VERSION,
        environment='python3-8',
        instance_type='2048mb',
        minimum_instances=0,
        maximum_instances=1,
        maximum_idle_time=1800, # = 30 minutes
        request_retention_mode='none' # We don't need to store the requests for this deployment
    )

    api.deployment_versions_create(
        project_name=PROJECT_NAME,
        deployment_name=DEPLOYMENT_NAME,
        data=version_template
    )

    # Zip the deployment package
    shutil.make_archive('tensorflow_deployment_package', 'zip', '.', 'tensorflow_deployment_package')

    # Upload the zipped deployment package
    file_upload_result = api.revisions_file_upload(
        project_name=PROJECT_NAME,
        deployment_name=DEPLOYMENT_NAME,
        version=DEPLOYMENT_VERSION,
        file='tensorflow_deployment_package.zip'
    )

if __name__ == '__main__':
    #with initialize(config_path="age_gender_estimation/src/"):
    with initialize(config_path="src/"):
        cfg = compose(config_name="config")
        print(OmegaConf.to_yaml(cfg))
    main(cfg)
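
The `shutil.make_archive` call in the script zips the `tensorflow_deployment_package` directory itself, so every entry in the archive is prefixed with that directory name. The following sketch reproduces the call on a throwaway directory to show the resulting layout. The temporary paths, the dummy file contents and the function name `package_layout` are placeholders introduced here.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def package_layout():
    """Zip a dummy deployment package and return the archive's file entries."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        package = root / "tensorflow_deployment_package"
        package.mkdir()
        # Placeholder files standing in for the real code and trained model.
        (package / "deployment.py").write_text("# deployment code\n")
        (package / "tensorflow_model.h5").write_bytes(b"dummy weights")

        # Same format and base_dir as in main.py, but with explicit paths
        # instead of relying on the current working directory.
        archive = shutil.make_archive(
            str(root / "tensorflow_deployment_package"),  # name without '.zip'
            "zip",
            root_dir=str(root),
            base_dir="tensorflow_deployment_package",
        )
        with zipfile.ZipFile(archive) as zf:
            # Keep only files; directory entries end with '/'.
            return sorted(n for n in zf.namelist() if not n.endswith("/"))


print(package_layout())
```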


## UbiOps deployment
The following script is sent to UbiOps together with a trained weight file for serving.

In [None]:
%%writefile pachy_source/tensorflow_deployment_package/deployment.py

"""
The file containing the deployment code is required to be called 'deployment.py' and should contain the 'Deployment'
class and 'request' method.
"""

import os
import pandas as pd
from tensorflow.keras.models import load_model
import argparse
from contextlib import contextmanager
from pathlib import Path

import cv2
import dlib
import numpy as np
from omegaconf import OmegaConf
from tensorflow.keras.utils import get_file

from src.factory import get_model



class Deployment:

    def __init__(self, base_directory, context):
        """
        Initialisation method for the deployment. It can for example be used for loading modules that have to be kept in
        memory or setting up connections. Load your external model files (such as pickles or .h5 files) here.

        :param str base_directory: absolute path to the directory where the deployment.py file is located
        :param dict context: a dictionary containing details of the deployment that might be useful in your code.
            It contains the following keys:
                - deployment (str): name of the deployment
                - version (str): name of the version
                - input_type (str): deployment input type, either 'structured' or 'plain'
                - output_type (str): deployment output type, either 'structured' or 'plain'
                - environment (str): the environment in which the deployment is running
                - environment_variables (str): the custom environment variables configured for the deployment.
                    You can also access those as normal environment variables via os.environ
        """

        print("Initialising the model")

        model_file = os.path.join(base_directory, "tensorflow_model.h5")
        self.model = load_model(model_file)


    def request(self, data):
        """
        Method for deployment requests, called separately for each individual request.

        :param dict/str data: request input data. In case of deployments with structured data, a Python dictionary
            with as keys the input fields as defined upon deployment creation via the platform. In case of a deployment
            with plain input, it is a string.
        :return dict/str: request output. In case of deployments with structured output data, a Python dictionary
            with as keys the output fields as defined upon deployment creation via the platform. In case of a deployment
            with plain output, it is a string. In this example, a dictionary with the key: output_image.
        """
        print('Loading data')
        margin = 0.4
    

        # for face detection
        detector = dlib.get_frontal_face_detector()

        # load model and weights
        img_size = 224

        img = read_image(data.get('input_image'))
        input_img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img_h, img_w, _ = np.shape(input_img)

        # detect faces using dlib detector
        detected = detector(input_img, 1)
        faces = np.empty((len(detected), img_size, img_size, 3))

        if len(detected) > 0:
            for i, d in enumerate(detected):
                x1, y1, x2, y2, w, h = (
                    d.left(),
                    d.top(),
                    d.right() + 1,
                    d.bottom() + 1,
                    d.width(),
                    d.height(),
                )
                xw1 = max(int(x1 - margin * w), 0)
                yw1 = max(int(y1 - margin * h), 0)
                xw2 = min(int(x2 + margin * w), img_w - 1)
                yw2 = min(int(y2 + margin * h), img_h - 1)
                cv2.rectangle(img, (x1, y1), (x2, y2), (255, 0, 0), 2)
                # cv2.rectangle(img, (xw1, yw1), (xw2, yw2), (255, 0, 0), 2)
                faces[i] = cv2.resize(
                    img[yw1 : yw2 + 1, xw1 : xw2 + 1], (img_size, img_size)
                )

            # predict ages and genders of the detected faces
            results = self.model.predict(faces)
            predicted_genders = results[0]
            ages = np.arange(0, 101).reshape(101, 1)
            predicted_ages = results[1].dot(ages).flatten()

            # draw results
            for i, d in enumerate(detected):
                label = "{}, {}".format(
                    int(predicted_ages[i]), "M" if predicted_genders[i][0] < 0.5 else "F"
                )
                draw_label(img, (d.left(), d.top()), label)

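        # img is still in BGR order (as read by cv2.imread), which is what cv2.imwrite expects.
        # Write it even when no face was detected so the output file always exists.
        cv2.imwrite("prediction.jpg", img)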

        return {
            "output_image": "prediction.jpg"
        }

def draw_label(
    image, point, label, font=cv2.FONT_HERSHEY_SIMPLEX, font_scale=1.5, thickness=2
):
    size = cv2.getTextSize(label, font, font_scale, thickness)[0]
    x, y = point
    cv2.rectangle(image, (x, y - size[1]), (x + size[0], y), (255, 0, 0), cv2.FILLED)
    cv2.putText(
        image,
        label,
        point,
        font,
        font_scale,
        (255, 255, 255),
        thickness,
        lineType=cv2.LINE_AA,
    )

def read_image(image_path):
    img = cv2.imread(str(image_path), 1)

    if img is not None:
        h, w, _ = img.shape
        r = 640 / max(w, h)
        return cv2.resize(img, (int(w * r), int(h * r)))
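
Two small calculations in `request` are worth isolating. The crop box is the detected face rectangle widened by `margin` times its size on every side and clipped to the image, and the predicted age is the expectation of the 101-way age distribution rather than its most likely class. The sketch below restates both in plain Python, without NumPy or OpenCV, so they can be checked by hand. The function names `expand_box` and `expected_age` are illustrative and do not appear in the deployment.

```python
def expand_box(x1, y1, x2, y2, img_w, img_h, margin=0.4):
    """Widen a face box by `margin` times its size and clip it to the image."""
    w, h = x2 - x1, y2 - y1
    xw1 = max(int(x1 - margin * w), 0)
    yw1 = max(int(y1 - margin * h), 0)
    xw2 = min(int(x2 + margin * w), img_w - 1)
    yw2 = min(int(y2 + margin * h), img_h - 1)
    return xw1, yw1, xw2, yw2


def expected_age(probabilities):
    """Expected value of an age distribution over the classes 0..len-1."""
    return sum(age * p for age, p in enumerate(probabilities))


# A face near the image border: the widened box is clipped to the image.
print(expand_box(10, 10, 110, 110, img_w=120, img_h=120, margin=0.5))

# A distribution split evenly between ages 20 and 30.
probs = [0.0] * 101
probs[20] = 0.5
probs[30] = 0.5
print(expected_age(probs))
```

Using the expectation gives a smooth age estimate even when the distribution is spread over neighbouring classes.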


Next up is `ubiops.yaml`, which installs the Linux packages that some of the Python libraries depend on.

In [None]:
%%writefile pachy_source/tensorflow_deployment_package/ubiops.yaml

apt:
  packages:
    - cmake
    - protobuf-compiler
    - build-essential
    - python3.8-dev

## Misc
We use this example as the basis for our model:
https://github.com/UbiOps/tutorials/blob/master/tensorflow-example/tensorflow-ubiops-example/tensorflow_template.ipynb
Part of that repo is already included here. To make those files available to both the UbiOps and the Pachyderm packages, we copy them to the correct locations.

In [None]:
!cp age_gender_estimation/requirements.txt pachy_source/requirements.txt 
!cp age_gender_estimation/requirements.txt pachy_source/tensorflow_deployment_package/requirements.txt
!mkdir -p pachy_source/tensorflow_deployment_package/age_gender_estimation
!cp -r age_gender_estimation/src pachy_source/tensorflow_deployment_package/age_gender_estimation/src
!cp -r age_gender_estimation/src pachy_source/src
!cp -r age_gender_estimation/meta pachy_source/meta

!echo ubiops >> pachy_source/requirements.txt
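
Note that `echo ubiops >> ...` appends a line on every run, so re-running the cell leaves duplicate entries in `requirements.txt`. A minimal sketch of an idempotent alternative follows. `ensure_requirement` is a hypothetical helper, and the comparison is a simple case-insensitive match on the bare package name that ignores version specifiers.

```python
import re
from pathlib import Path


def ensure_requirement(path, package):
    """Append `package` to a requirements file unless it is already listed.

    Returns True if the file was modified, False otherwise.
    """
    path = Path(path)
    lines = path.read_text().splitlines() if path.exists() else []
    for line in lines:
        # Take the name before any version specifier, extras marker or comment.
        name = re.split(r"[\s<>=!~\[;#]", line.strip(), maxsplit=1)[0]
        if name.lower() == package.lower():
            return False
    lines.append(package)
    path.write_text("\n".join(lines) + "\n")
    return True
```

In this notebook it would be called as `ensure_requirement("pachy_source/requirements.txt", "ubiops")` after the files have been copied.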


## Create the pachyderm pipeline
Now that all the files are in the right place, we can create the Pachyderm pipeline. The `build` section uses a custom Pachyderm builder image.

In [None]:
%%writefile pachy_source/build_pipeline.json
{
  "pipeline": {
    "name": "faces_train_model"
  },
  "datum_tries": 1,
  "description": "A pipeline that trains our neural network",
  "transform": {
    "build": {
      "image": "raoulfaselubiops/pachyderm-builder:latest",
      "path": "./"
    },
    "secrets": [ {
        "name": "ubiops",
        "mount_path": "/opt/ubiops"
    }
    ]
  },
  "input": {
    "pfs": {
      "repo": "faces",
      "glob": "/*"
    }
  }
}
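
Because the training script reads the token from `/opt/ubiops/token`, the secret's `mount_path` and the key name inside the secret have to line up with that path. As a sketch, the spec can be built as a Python dict and serialised with `json.dumps`, which guarantees well-formed JSON and allows a quick consistency check. `TOKEN_PATH` and `build_pipeline_spec` are names introduced here for illustration.

```python
import json
import posixpath

TOKEN_PATH = "/opt/ubiops/token"  # path opened by main.py


def build_pipeline_spec(repo="faces", secret_name="ubiops", secret_key="token"):
    """Return the pipeline spec from this section as a plain dict."""
    mount_path = posixpath.dirname(TOKEN_PATH)
    # Each key of the secret shows up as a file inside the mount directory,
    # so mount path plus key must equal the path the script opens.
    assert posixpath.join(mount_path, secret_key) == TOKEN_PATH
    return {
        "pipeline": {"name": "faces_train_model"},
        "datum_tries": 1,
        "description": "A pipeline that trains our neural network",
        "transform": {
            "build": {
                "image": "raoulfaselubiops/pachyderm-builder:latest",
                "path": "./",
            },
            "secrets": [{"name": secret_name, "mount_path": mount_path}],
        },
        "input": {"pfs": {"repo": repo, "glob": "/*"}},
    }


spec_json = json.dumps(build_pipeline_spec(), indent=2)
print(spec_json)
```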

In [None]:
!pachctl update pipeline -f pachy_source/build_pipeline.json

## Upload the face dataset
Finally we can upload the face dataset. Only a subset of the data is uploaded here because the full upload takes too long, but you can edit the command to upload more. It takes roughly 10 minutes before the deployment shows up in your UbiOps environment. Keep in mind that every time you run this cell (that is, upload a file), the Pachyderm pipeline is triggered and updates the deployment on UbiOps.

In [None]:
#This can take a while, even though we only upload a small part of the dataset.
#It is also faster when run from a real terminal
!pachctl put file -p=30 --progress=false -r faces@master:data/imdb_crop/00 -f data/imdb_crop/00 #we are using the imdb db for this example
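
To upload more of the dataset, the same command can be repeated for further subdirectories. The sketch below only builds the argument lists and does not run them. It assumes the dataset is split into two-digit subdirectories (`00`, `01`, ...) under `data/imdb_crop`, as the path above suggests, and `put_file_commands` is a hypothetical helper.

```python
def put_file_commands(subdirs, repo="faces", branch="master", root="data/imdb_crop"):
    """Build one `pachctl put file` argument list per dataset subdirectory."""
    commands = []
    for sub in subdirs:
        path = f"{root}/{sub}"
        commands.append([
            "pachctl", "put", "file", "-p=30", "--progress=false", "-r",
            f"{repo}@{branch}:{path}", "-f", path,
        ])
    return commands


# Print the commands for the first three subdirectories (00, 01, 02).
for cmd in put_file_commands(f"{i:02d}" for i in range(3)):
    print(" ".join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)`. Since every upload triggers the pipeline, uploading many subdirectories this way also starts many training runs.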

In [None]:
!pachctl list job
!pachctl list repo

## Fingers crossed
If all went well, the upload should trigger the Pachyderm pipeline. That means that within a few minutes the trained model will be in UbiOps.

And the cool thing is that every new upload will trigger this pipeline.
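
Rather than refreshing the UbiOps UI, the wait can be automated with a small polling loop. This is a generic sketch: `wait_until` is a hypothetical helper, and the `check` callable is where a lookup of the deployment through the UbiOps client would go. That lookup is not shown here.

```python
import time


def wait_until(check, timeout=900.0, interval=30.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call `check()` until it returns True or `timeout` seconds have passed.

    Returns True if the check succeeded, False on timeout. `sleep` and
    `clock` can be replaced, which makes the helper easy to test.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The default timeout of 15 minutes leaves some slack over the roughly 10 minutes mentioned earlier.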

## Next Steps
Now that model training and serving are in place, the next step is of course to connect the model to a front end. That is not covered in this notebook, but there are examples on our tutorial page.