## Prediction rig experiments and random thoughts

(WIP)
The prediction rig is a somewhat overlooked component when designing production-scale ML platforms.
Other parts, such as the components supporting EDA or training, tend to get more engineering focus.
But at the end of the day, value is generated precisely at the prediction phase, which is also the external interface for investments in ML.
I think the rig should contain at least the following components:

- A prediction endpoint, of course, with capabilities for:
    - Running low latency predictions
    - Scalability, fault tolerance ..
    - Advanced logging capabilities, particularly important for ground truth checking
    - A/B and canary rollout capabilities
    - Explainability
- A feature transformer:
    - In online models there tends to be a gap between the raw data and the data used for running the prediction.
- A prediction transformer:
    - blah
- A model warmer:
    - blah
- A Feature Store for decoupling data producers and features
    - Online
    - Batch
    - blah blah 
- A model (de)promoter
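
As a rough illustration of how the feature transformer, the model and the prediction transformer listed above could fit together, here is a minimal sketch in plain Python. All names (`PredictionRig`, `scale_weight`, `toy_model`, `as_response`) are hypothetical and not part of KFServing or FEAST; the toy model is a stand-in, and the sketch only shows the order in which the pieces are called.

```python
from typing import Callable, Dict, List

# Hypothetical type aliases, for illustration only
RawRecord = Dict[str, float]
FeatureVector = List[float]


class PredictionRig:
    """Chains feature transformer -> model -> prediction transformer."""

    def __init__(self,
                 feature_transformer: Callable[[RawRecord], FeatureVector],
                 model: Callable[[FeatureVector], float],
                 prediction_transformer: Callable[[float], dict]):
        self.feature_transformer = feature_transformer
        self.model = model
        self.prediction_transformer = prediction_transformer

    def predict(self, raw: RawRecord) -> dict:
        features = self.feature_transformer(raw)   # raw data -> model input
        score = self.model(features)               # model input -> raw output
        return self.prediction_transformer(score)  # raw output -> response


def scale_weight(raw: RawRecord) -> FeatureVector:
    # Toy feature transformer: express weight in thousands of pounds
    return [raw["weight"] / 1000.0]


def toy_model(features: FeatureVector) -> float:
    # Stand-in for the real regressor: heavier cars get fewer MPG
    return 40.0 - 5.0 * features[0]


def as_response(score: float) -> dict:
    # Toy prediction transformer: wrap the raw score in a named field
    return {"mpg": round(score, 1)}


rig = PredictionRig(scale_weight, toy_model, as_response)
result = rig.predict({"weight": 3000.0})
print(result)
```

Keeping the three steps as separate callables means each one can be swapped or versioned independently of the model itself.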

..insert diagram of a full prediction rig

This NB currently shows:

- [X] Use of KFServing 
- [ ] Use of KFServing with data transformer
- [ ] Use of KFServing together with TF model warmer
- [X] Use of FEAST for online pred
- [ ] Use of FEAST for batch pred

Let's start by generating a simple regression model, using a Keras dataset.
The regressor predicts miles per gallon (MPG) from car attributes (cylinders, displacement, horsepower, weight and so on).
Taken from https://www.tensorflow.org/tutorials/keras/regression
FEAST needs to be deployed on GKE, as explained at https://docs.feast.dev/installation/gke

In [None]:
!pip install --user -q seaborn
!pip install --user feast
!pip install --user kfserving
!pip install --user google-cloud-storage
!pip install --user wget

In [None]:
import pathlib
import wget
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import os
import pandas as pd
import datetime
from google.cloud import storage

In [None]:
import tensorflow.compat.v1 as tf
from tensorflow.keras.backend import get_session
from tensorflow.keras import backend as K
from tensorflow import keras
from tensorflow.keras import layers, Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import RMSprop
from tensorflow.keras.utils import get_file

In [None]:
dataset_path = wget.download("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data", "auto-mpg.data")
column_names = ['MPG','Cylinders','Displacement','Horsepower','Weight',
                'Acceleration', 'Model Year', 'Origin']
raw_dataset = pd.read_csv(dataset_path, names=column_names,
                      na_values = "?", comment='\t',
                      sep=" ", skipinitialspace=True)

dataset = raw_dataset.copy()
dataset = dataset.dropna()
dataset['Origin'] = dataset['Origin'].map({1: 'USA', 2: 'Europe', 3: 'Japan'})
dataset = pd.get_dummies(dataset, prefix='', prefix_sep='')
sns.pairplot(dataset[["MPG", "Cylinders", "Displacement", "Weight"]], diag_kind="kde")

In [None]:
train_dataset = dataset.sample(frac=0.8,random_state=0)
test_dataset = dataset.drop(train_dataset.index)
train_labels = train_dataset.pop('MPG')
test_labels = test_dataset.pop('MPG')

In [None]:
train_stats = train_dataset.describe()
train_stats = train_stats.transpose()
train_stats
def norm(x):
  return (x - train_stats['mean']) / train_stats['std']
normed_train_data = norm(train_dataset)
normed_test_data = norm(test_dataset)

In [None]:
sess = tf.Session()
K.set_session(sess)
K.set_learning_phase(0)

In [None]:
model = Sequential([
    Dense(64, activation=tf.nn.relu, input_shape=[len(train_dataset.keys())]),
    Dense(64, activation=tf.nn.relu),
    Dense(1)
  ])

optimizer = tf.keras.optimizers.RMSprop(0.001)

model.compile(loss='mse',
                optimizer=optimizer,
                metrics=['mae', 'mse'])
EPOCHS = 1000

history = model.fit(
  normed_train_data, train_labels,
  epochs=EPOCHS, validation_split = 0.2)

In [None]:
test_predictions = model.predict(normed_test_data).flatten()
print(normed_test_data)
plt.scatter(test_labels, test_predictions)
plt.xlabel('True Values [MPG]')
plt.ylabel('Predictions [MPG]')
lims = [0, 50]
plt.xlim(lims)
plt.ylim(lims)
_ = plt.plot(lims, lims)

In [None]:
model_path = "model"
version = "1"
export_path = model_path+"/"+version
builder = tf.saved_model.builder.SavedModelBuilder(export_path)
x = model.input
y = model.output

tensor_info_x = tf.saved_model.utils.build_tensor_info(x)
tensor_info_y = tf.saved_model.utils.build_tensor_info(y)

prediction_signature = (
          tf.saved_model.signature_def_utils.build_signature_def(
              inputs={'car_values': tensor_info_x},
              outputs={'mpg': tensor_info_y},
          method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))


builder.add_meta_graph_and_variables(
      sess, [tf.saved_model.tag_constants.SERVING],
      signature_def_map={
          'predict_car_values':
              prediction_signature
      },
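      # main_op runs after the variables are restored at load time, so it must not re-initialize the trained weights
      main_op=tf.tables_initializer(),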
      strip_default_attrs=True)

builder.save()

In [None]:
#Do not use this: we need to get the backend tf session and generate the SignatureDef ourselves, otherwise it fails! This is the only way I found to get Keras support
#model.save("model/1", save_format='tf')

In [None]:
!saved_model_cli show --dir "model/1" --tag_set serve --signature_def predict_car_values

In [None]:
storage_client = storage.Client()
bucket = storage_client.get_bucket("velascoluis-test")
if bucket:
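    for root, _, files in os.walk("model/1"):
        for file in files:
            path = os.path.join(root, file)
            # Keep the path relative to the model dir so that variables/ is not flattened into the version folder
            rel_path = os.path.relpath(path, "model/1").replace(os.sep, "/")
            blob = bucket.blob("mpg/model/1/" + rel_path)
            print('Uploading ..' + path)
            blob.upload_from_filename(path)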

In [None]:
from kubernetes import client
from kfserving import KFServingClient
from kfserving import constants
from kfserving import V1alpha2EndpointSpec
from kfserving import V1alpha2PredictorSpec
from kfserving import V1alpha2TensorflowSpec
from kfserving import V1alpha2InferenceServiceSpec
from kfserving import V1alpha2InferenceService
from kubernetes.client import V1ResourceRequirements

In [None]:
api_version = constants.KFSERVING_GROUP + '/' + constants.KFSERVING_VERSION
now = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
inference_service_name = 'pred111'
default_endpoint_spec = V1alpha2EndpointSpec(
    predictor=V1alpha2PredictorSpec(
    tensorflow=V1alpha2TensorflowSpec(
    storage_uri="gs://velascoluis-test/mpg/model",
    resources=V1ResourceRequirements(
    requests={'cpu': '100m', 'memory': '1Gi'},
    limits={'cpu': '100m', 'memory': '1Gi'}))))

isvc = V1alpha2InferenceService(api_version=api_version,
                                    kind=constants.KFSERVING_KIND,
                                    metadata=
                                        client.V1ObjectMeta(
                                            name=inference_service_name,
                                            annotations=
                                            {
                                                'sidecar.istio.io/inject': 'false',
                                                'autoscaling.knative.dev/target': '1'
                                            },
                                            namespace="kubeflow-velascoluis"
                                                            ),
                                    spec=
                                        V1alpha2InferenceServiceSpec(default=default_endpoint_spec))

# Idea is to insert the transformer here, or should it be at Kafka level in FEAST????
#velascoluis: sidecar is disabled because of https://github.com/knative/serving/issues/6829


KFServing = KFServingClient()
#KFServing.set_credentials(storage_type='GCS',
#                          namespace='kubeflow-velascoluis',
#                          credentials_file=os.environ['GOOGLE_APPLICATION_CREDENTIALS'],
#                          service_account='default-editor')

KFServing.create(isvc)
KFServing.get(inference_service_name, namespace="kubeflow-velascoluis", watch=True, timeout_seconds=120)

In [None]:
!curl -v -H "Host: pred111.kubeflow-velascoluis.example.com" http://34.76.151.35/v1/models/pred111:predict -d "{ \"signature_name\": \"predict_car_values\",  \"instances\":[[1.483887,      1.865988,    2.234620,  1.018782,     -2.530891,   -1.604642, -0.465148, -0.495225,  0.774676]]}"
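
The same request can also be prepared from Python instead of `curl`. The sketch below uses only the standard library; `build_predict_request` is a hypothetical helper name, and the host header, IP, model name and signature name are simply copied from the `curl` call above. The request is only built here, not sent.

```python
import json
import urllib.request


def build_predict_request(ingress_ip, host, model_name, signature_name, instances):
    """Build a POST request for the TensorFlow Serving REST predict endpoint."""
    url = "http://{}/v1/models/{}:predict".format(ingress_ip, model_name)
    body = json.dumps({"signature_name": signature_name, "instances": instances})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        # The Host header routes the call to the right inference service
        headers={"Host": host, "Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request(
    ingress_ip="34.76.151.35",
    host="pred111.kubeflow-velascoluis.example.com",
    model_name="pred111",
    signature_name="predict_car_values",
    instances=[[1.483887, 1.865988, 2.234620, 1.018782, -2.530891,
                -1.604642, -0.465148, -0.495225, 0.774676]],
)
print(request.full_url)

# To actually send it (needs network access to the cluster):
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Building the body with `json.dumps` avoids the hand-escaped quotes needed in the shell version.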

In [None]:
from pytz import timezone, utc
from feast import Client, FeatureSet, Entity, ValueType, Feature
from feast.serving.ServingService_pb2 import GetOnlineFeaturesRequest
from feast.types.Value_pb2 import Value as Value
from google.protobuf.duration_pb2 import Duration
from datetime import datetime, timedelta
from random import randrange
import random

In [None]:
FEAST_IP="35.241.140.170"
FEAST_CORE_URL=FEAST_IP+":32090"
FEAST_ONLINE_SERVING_URL=FEAST_IP+":32091"
FEAST_BATCH_SERVING_URL=FEAST_IP+":32092"

In [None]:
client = Client(core_url=FEAST_CORE_URL, serving_url=FEAST_ONLINE_SERVING_URL)
client.create_project('feast_kfserving')
client.set_project('feast_kfserving')

In [None]:
# Work on a copy so that train_dataset itself is not modified
features_ingest = train_dataset.copy()
features_ingest['datetime'] = np.random.choice(pd.date_range('2020-01-01', '2020-04-15'), len(features_ingest))
features_ingest['car_id'] = np.arange(len(features_ingest))
# Lower-case the column names and drop spaces ('Model Year' -> 'modelyear')
features_ingest.columns = features_ingest.columns.str.lower().str.replace(' ', '')
print(features_ingest)

In [None]:
cars_f = FeatureSet(
    "car_features",
    entities=[Entity(name='car_id', dtype=ValueType.INT64)],
    max_age=Duration(seconds=432000)    
)

In [None]:
cars_f.infer_fields_from_df(features_ingest, replace_existing_features=True)

In [None]:
client.apply(cars_f)

In [None]:
car_features = client.get_feature_set("car_features",version=1)
print(car_features)

In [None]:
client.ingest("car_features", features_ingest)

In [None]:
online_features = client.get_online_features(
    # Same 9 columns, in the same order, as the model input
    feature_refs=[
        "cylinders",
        "displacement",
        "horsepower",
        "weight",
        "acceleration",
        "modelyear",
        "europe",
        "japan",
        "usa",
    ],
    entity_rows=[
        GetOnlineFeaturesRequest.EntityRow(
            fields={
                "car_id": Value(
                    int64_val=10)
            }
        )
    ],
)

In [None]:
#Need to call random here ..
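
Before the online features can be sent to the model, they have to be ordered like the training columns and normalized with the training statistics, because the regressor was trained on normalized data. Below is a minimal sketch of that step, assuming the feature values have already been extracted from the FEAST response into a plain dict (the extraction depends on the FEAST version and is not shown). `FEATURE_ORDER`, `to_instance` and the numbers in `stats` and `features` are made up for illustration; in this notebook the statistics would come from `train_stats`.

```python
import json

# Column order the model was trained with (lower-cased, spaces removed)
FEATURE_ORDER = ["cylinders", "displacement", "horsepower", "weight",
                 "acceleration", "modelyear", "europe", "japan", "usa"]


def to_instance(features, stats):
    """Order the features and apply (x - mean) / std to each of them."""
    return [(features[name] - stats[name]["mean"]) / stats[name]["std"]
            for name in FEATURE_ORDER]


# Illustrative values only, not taken from the real dataset statistics
stats = {name: {"mean": 0.0, "std": 1.0} for name in FEATURE_ORDER}
stats["cylinders"] = {"mean": 5.0, "std": 2.0}
features = {"cylinders": 8.0, "displacement": 1.0, "horsepower": 1.0,
            "weight": 1.0, "acceleration": 1.0, "modelyear": 1.0,
            "europe": 0.0, "japan": 0.0, "usa": 1.0}

instance = to_instance(features, stats)
payload = json.dumps({"signature_name": "predict_car_values",
                      "instances": [instance]})
print(payload)
```

The resulting `payload` string has the same shape as the JSON body used in the earlier `curl` call.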

In [None]:
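# A protobuf response cannot be concatenated into a shell string: this assumes `payload` holds the JSON request body (signature_name plus one 9-value instance) built in Python from online_features; IPython expands {payload}
!curl -v -H "Host: pred111.kubeflow-velascoluis.example.com" http://34.76.151.35/v1/models/pred111:predict -d '{payload}'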