# Custom Prediction Routine (CPR)

> Build a custom container for deploying a trained policy to a Vertex AI Prediction online endpoint

### references

* [src code](https://github.com/googleapis/python-aiplatform/tree/main/google/cloud/aiplatform/prediction)
* [docs](https://cloud.google.com/vertex-ai/docs/predictions/custom-prediction-routines#run_the_container_locally_optional)
* code examples
  * [SDK_Custom_Predict_and_Handler_SDK_Integration](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/prediction/custom_prediction_routines/SDK_Custom_Predict_and_Handler_SDK_Integration.ipynb)
  * [SDK_Custom_Preprocess](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/ef8b70db32813b8a2f128ab5ef1d170aea739e7f/notebooks/community/prediction/custom_prediction_routines/SDK_Custom_Preprocess.ipynb)
  
**In the built image, user provided files will be copied as follows:**

```
    container_workdir/
    |-- predictor.py
    |-- requirements.txt
    |-- user_code/
    |   |-- utils.py
    |   |-- custom_package.tar.gz
    |   |-- ...
    |-- ...
```

## Notebook config

**In this notebook** we need to be conscious of our current working directory (path)

In [1]:
import os 

path="/home/jupyter/tf_vertex_agents/src"
os.chdir(path)
print(os.getcwd())

/home/jupyter/tf_vertex_agents/src


In [2]:
!pwd

/home/jupyter/tf_vertex_agents/src


In [3]:
VERSION        = "v2"                       # TODO
PREFIX         = f'rec-bandits-{VERSION}'   # TODO

print(f"PREFIX: {PREFIX}")

PREFIX: rec-bandits-v2


In [4]:
# staging GCS
GCP_PROJECTS             = !gcloud config get-value project
PROJECT_ID               = GCP_PROJECTS[0]

# GCS bucket and paths
BUCKET_NAME              = f'{PREFIX}-{PROJECT_ID}-bucket'
BUCKET_URI               = f'gs://{BUCKET_NAME}'

config = !gsutil cat {BUCKET_URI}/config/notebook_env.py
print(config.n)
exec(config.n)


PROJECT_ID               = "hybrid-vertex"
PROJECT_NUM              = "934903580331"
LOCATION                 = "us-central1"

REGION                   = "us-central1"
BQ_LOCATION              = "US"
VPC_NETWORK_NAME         = "ucaip-haystack-vpc-network"
VERTEX_SA                = "934903580331-compute@developer.gserviceaccount.com"

PREFIX                   = "rec-bandits-v2"
VERSION                  = "v2"

BUCKET_NAME              = "rec-bandits-v2-hybrid-vertex-bucket"
BUCKET_URI               = "gs://rec-bandits-v2-hybrid-vertex-bucket"
DATA_GCS_PREFIX          = "data"
DATA_PATH                = "gs://rec-bandits-v2-hybrid-vertex-bucket/data"
VOCAB_SUBDIR             = "vocabs"
VOCAB_FILENAME           = "vocab_dict.pkl"

VPC_NETWORK_FULL         = "projects/934903580331/global/networks/ucaip-haystack-vpc-network"

BIGQUERY_DATASET_NAME    = "mvlens_rec_bandits_v2"
BIGQUERY_TABLE_NAME      = "training_dataset"

REPOSITORY               = "rl-movielens-rec-bandits-v2"

DOCKERNAM
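
The `exec(config.n)` call above injects every assignment from `notebook_env.py` directly into the notebook's globals. As a minimal alternative sketch, assuming the config file contains only simple `NAME = literal` assignments (as the printed output suggests), the text could be parsed into a dictionary instead. The helper name `parse_config` and the sample text are hypothetical and not part of this repo.

```python
import ast


def parse_config(text: str) -> dict:
    """Collect top-level `NAME = <literal>` assignments into a dict.

    Hypothetical helper: assumes every value is a plain literal
    (string, number, list, ...). f-strings or expressions raise ValueError.
    """
    config = {}
    for node in ast.parse(text).body:
        is_simple_assign = (
            isinstance(node, ast.Assign)
            and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)
        )
        if is_simple_assign:
            config[node.targets[0].id] = ast.literal_eval(node.value)
    return config


# Illustrative text only; the real file is read from {BUCKET_URI}/config/notebook_env.py
sample = 'REGION = "us-central1"\nBQ_LOCATION = "US"\n'
cfg = parse_config(sample)
print(cfg["REGION"])
```

Keeping the values in a dictionary makes it explicit which names came from the config file, at the cost of writing `cfg["REGION"]` instead of `REGION`.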

### Set vars

In [5]:
import sys
sys.path.append("..")

# this repo
from src.data import data_utils, data_config

#### Dataset

In [6]:
EXAMPLE_GEN_GCS_PATH = data_config.EXAMPLE_GEN_GCS_PATH
GCS_DATA_PATH = f"{BUCKET_URI}/{EXAMPLE_GEN_GCS_PATH}"

print(f"GCS_DATA_PATH: {GCS_DATA_PATH}")

!gsutil ls $GCS_DATA_PATH

GCS_DATA_PATH: gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/mv_b128_g12_a16/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/mv_b128_g12_a16_v4/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/mv_b128_g12_a16_v5/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/mv_b128_g12_a16_v6/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/train/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/val/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/vocabs/


#### Custom prediction container

In [7]:
IMAGE_NAME_02_PRED_CPR = "cpr-perarm-bandit-02e"
IMAGE_URI_02_PRED_CPR  = f"gcr.io/hybrid-vertex/{IMAGE_NAME_02_PRED_CPR}"
REMOTE_IMAGE_NAME_CPR  = f"{REGION}-docker.pkg.dev/{PROJECT_ID}/{REPOSITORY}/{IMAGE_NAME_02_PRED_CPR}"

print(f"REPOSITORY             = {REPOSITORY}")
print(f"IMAGE_NAME_02_PRED_CPR = {IMAGE_NAME_02_PRED_CPR}")
print(f"IMAGE_URI_02_PRED_CPR  = {IMAGE_URI_02_PRED_CPR}")
print(f"REMOTE_IMAGE_NAME_CPR  = {REMOTE_IMAGE_NAME_CPR}")

REPOSITORY             = rl-movielens-rec-bandits-v2
IMAGE_NAME_02_PRED_CPR = cpr-perarm-bandit-02e
IMAGE_URI_02_PRED_CPR  = gcr.io/hybrid-vertex/cpr-perarm-bandit-02e
REMOTE_IMAGE_NAME_CPR  = us-central1-docker.pkg.dev/hybrid-vertex/rl-movielens-rec-bandits-v2/cpr-perarm-bandit-02e


#### Set `ARTIFACTS_DIR` from previous experiment

In [8]:
# EXPERIMENT_NAME      = "02-online-1m-v6"     # TODO - replace with an experiment that has saved policy
# RUN_NAME             = "run-20240220-025133" # TODO - replace with a run that has saved policy
EXPERIMENT_NAME      = "02-supervised-bandits-v1"
RUN_NAME             = "run-20240313-192115"

BASE_OUTPUT_URI      = f"{BUCKET_URI}/{EXPERIMENT_NAME}/{RUN_NAME}"
ARTIFACTS_DIR        = f"{BASE_OUTPUT_URI}/artifacts"
EXISTING_VOCAB_FILE  = f'gs://{BUCKET_NAME}/{EXAMPLE_GEN_GCS_PATH}/{VOCAB_SUBDIR}/{VOCAB_FILENAME}'

print(f"BASE_OUTPUT_URI      : {BASE_OUTPUT_URI}")
print(f"ARTIFACTS_DIR        : {ARTIFACTS_DIR}")
print(f"EXISTING_VOCAB_FILE  : {EXISTING_VOCAB_FILE}")

BASE_OUTPUT_URI      : gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115
ARTIFACTS_DIR        : gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts
EXISTING_VOCAB_FILE  : gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/vocabs/vocab_dict.pkl


Run this in a terminal from the repo root to clear `__pycache__` files:

In [9]:
# find . | grep -E "(/__pycache__$|\.pyc$|\.pyo$)" | xargs rm -rf

### (Optional) Setup credentials

Setting up credentials is only required to run the custom serving container locally with GCS paths. The credentials are needed to execute the `Predictor`'s `load` function, which downloads the model artifacts from Google Cloud Storage.

To access Google Cloud Storage in your project, you'll need to set up credentials by using one of the following:

1. User account
2. Service account

You can learn more about each of the above [here](https://cloud.google.com/docs/authentication#principals)

Option 1: Use Google user credentials

In [10]:
# !gcloud auth application-default login
# !gcloud auth login

# USER_ACCOUNT = "TODO_USER_GCP_LOGIN"  # TODO - 00-env-setup

# !gcloud projects add-iam-policy-binding $PROJECT_ID \
#     --member=user:$USER_ACCOUNT \
#     --role=roles/storage.admin

Option 2: Use Google Service Account credentials

In [11]:
# !gcloud services enable iam.googleapis.com
# !gcloud auth login

# !gcloud projects add-iam-policy-binding $PROJECT_ID \
#     --member=serviceAccount:$VERTEX_SA \
#     --role=roles/storage.admin

Create credentials file

In [12]:
# path="/home/jupyter/tf_vertex_agents/src"
# os.chdir(path)

In [13]:
CREDENTIALS_FILE = "./credentials.json"

# !gcloud iam service-accounts keys create $CREDENTIALS_FILE \
#     --iam-account=$VERTEX_SA

### (Optional) Create Artifact Repository
If you don't have an existing artifact repository, create one using the gcloud command below

In [14]:
# ! gcloud artifacts repositories create $REPOSITORY --repository-format=docker --location=$LOCATION

## Imports

In [15]:
import os
import sys
import numpy as np
import pickle as pkl
from pprint import pprint

import logging
logging.disable(logging.WARNING)

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
os.environ["PROJECT_ID"]=PROJECT_ID

# tensorflow
import tensorflow as tf
from tf_agents.policies import py_tf_eager_policy

# google cloud
from google.cloud import storage
from google.cloud import aiplatform as vertex_ai
from google.cloud.aiplatform.utils import prediction_utils

storage_client = storage.Client(project=PROJECT_ID)

# GPU
from numba import cuda 
import gc

# this repo
sys.path.append("..")
from src.utils import reward_factory as reward_factory
from src.networks import encoding_network as emb_features

In [16]:
print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU')))

Num GPUs Available:  1


In [18]:
device = cuda.get_current_device()
device.reset()
gc.collect()

14

## Load trained model

In [15]:
! gsutil ls $ARTIFACTS_DIR

gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/fingerprint.pb
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/policy_specs.pbtxt
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/saved_model.pb
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/assets/
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/variables/


In [16]:
deployment_policy = py_tf_eager_policy.SavedModelPyTFEagerPolicy(
    ARTIFACTS_DIR, 
    load_specs_from_pbtxt=True
)

deployment_policy

<tf_agents.policies.py_tf_eager_policy.SavedModelPyTFEagerPolicy at 0x7f0e1c143400>

# Create CPR directory

## Structure code for CPR

The CPR directory's structure determines the layout of the prediction serving container.

Because we are going to use the `build_cpr_model()` method of `LocalModel`, it needs to resemble:

```
            container_workdir/
            |-- predictor.py
            |-- requirements.txt
            |-- user_code/
            |   |-- utils.py
            |   |-- custom_package.tar.gz
            |   |-- ...
            |-- ...
```

see `build_cpr_model()` [src](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/prediction/local_model.py#L147)

### Saving deployment policy

**Ultimately we'll need to call `py_tf_eager_policy.SavedModelPyTFEagerPolicy()` in our CPR...**

We can't just pass the `ARTIFACTS_DIR`, because that would make the CPR container's `model_dir` look like this:

```
cpr_model_dir/
├── fingerprint.pb
├── policy_specs.pbtxt
├── saved_model.pb
└── variables
    ├── variables.data-00000-of-00001
    └── variables.index
```

Instead, we need the CPR container's `model_dir` to have a subdirectory holding these files like:

```
cpr_model_dir/
└── artifacts
    ├── fingerprint.pb
    ├── policy_specs.pbtxt
    ├── saved_model.pb
    └── variables
        ├── variables.data-00000-of-00001
        └── variables.index
```
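This nested layout is compatible with `py_tf_eager_policy.SavedModelPyTFEagerPolicy()`, because the predictor's `load()` downloads the contents of `model_dir` into the working directory and then loads the policy from the `artifacts` subdirectory.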
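
Before building the container, it can help to confirm that a local copy of the model directory really has the nested `artifacts/` layout. The sketch below is a hypothetical pre-flight check; the helper name `missing_policy_files` and the list of required files are assumptions based on the directory listings in this notebook, not an API of TF-Agents or the Vertex SDK.

```python
import tempfile
from pathlib import Path

# Assumed minimum set of files, taken from the `artifacts/` listing above.
REQUIRED_FILES = ("saved_model.pb", "policy_specs.pbtxt")


def missing_policy_files(model_dir: str, subdir: str = "artifacts") -> list:
    """Return required files that are absent from `<model_dir>/<subdir>/`."""
    root = Path(model_dir) / subdir
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Demo on a throwaway directory that mimics the nested layout.
with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "artifacts"
    nested.mkdir()
    for name in REQUIRED_FILES:
        (nested / name).touch()
    layout_ok = missing_policy_files(tmp) == []
    flat_missing = missing_policy_files(tmp, subdir="does_not_exist")

print(layout_ok, flat_missing)
```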

In [17]:
POLICY_SERVE_DIR_URI = f"{BASE_OUTPUT_URI}/policy-server"

! gsutil -q cp -r $ARTIFACTS_DIR $POLICY_SERVE_DIR_URI/

! gsutil ls $BASE_OUTPUT_URI

gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/logs/
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/policy-server/


## Create local CPR directory

In [18]:
!pwd

/home/jupyter/tf_vertex_agents/src


In [19]:
LOCAL_CPR_DIR = "cpr_dir"
CPR_SUBDIR = "user_code"

In [20]:
! rm -rf ./$LOCAL_CPR_DIR
! mkdir ./$LOCAL_CPR_DIR
! mkdir ./$LOCAL_CPR_DIR/$CPR_SUBDIR

In [21]:
!ls $LOCAL_CPR_DIR

user_code
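
The shell commands above can also be expressed in Python, which is convenient when this step moves out of a notebook and into a script. This is a minimal sketch; the function name `reset_cpr_dir` is hypothetical, and the demo runs in a temporary directory rather than in the repo.

```python
import shutil
import tempfile
from pathlib import Path


def reset_cpr_dir(base: Path, cpr_dir: str = "cpr_dir", subdir: str = "user_code") -> Path:
    """Delete `<base>/<cpr_dir>` if present, then recreate it with `<subdir>` inside."""
    root = base / cpr_dir
    shutil.rmtree(root, ignore_errors=True)   # same effect as `rm -rf`
    (root / subdir).mkdir(parents=True)       # creates both directory levels
    return root


# Demo in a temporary directory so nothing in the repo is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = reset_cpr_dir(Path(tmp))
    (root / "stale.txt").touch()
    root = reset_cpr_dir(Path(tmp))           # second call wipes the stale file
    contents = sorted(p.name for p in root.iterdir())

print(contents)
```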


## Predictor

* Implement a custom `Predictor` that loads the preprocessor. The preprocessor will then be used at `preprocess` time
* Note: the `PredictionHandler` will be used for prediction request handling, and the following will be executed:

> `self._predictor.postprocess(self._predictor.predict(self._predictor.preprocess(prediction_input)))`

**references**
* predictor_utils - [src](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/utils/prediction_utils.py)
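
The call chain quoted above can be illustrated without the SDK. The sketch below uses a hypothetical `ToyPredictor` with the same three hooks; it is not the Vertex `Predictor` base class, and its "model" is a placeholder that simply picks the largest input.

```python
class ToyPredictor:
    """Stand-in with the same three hooks; does not import the Vertex SDK."""

    def preprocess(self, prediction_input):
        # Turn raw request values into model-ready features.
        return [float(x) for x in prediction_input]

    def predict(self, instances):
        # Placeholder "model": pick the index of the largest feature.
        return max(range(len(instances)), key=instances.__getitem__)

    def postprocess(self, prediction_results):
        # Wrap the raw result in a JSON-serializable dict.
        return {"action": int(prediction_results)}


predictor = ToyPredictor()
result = predictor.postprocess(
    predictor.predict(predictor.preprocess(["0.1", "0.7", "0.3"]))
)
print(result)
```

Each hook only sees the previous hook's return value, which is why `preprocess` in the real predictor below returns a complete `TimeStep` rather than raw features.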

In [26]:
!pwd

/home/jupyter/tf_vertex_agents/src


In [23]:
%%writefile $LOCAL_CPR_DIR/predictor.py
import os
import sys
import logging
import numpy as np
import pickle as pkl
from typing import Dict, Any, Tuple

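# NOTE: this disables every log record at WARNING level and below,
# so the `logging.info(...)` debugging calls in this file emit nothing
logging.disable(logging.WARNING)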

# google cloud
from google.cloud.aiplatform.prediction.predictor import Predictor
from google.cloud.aiplatform.utils import prediction_utils
from google.cloud import storage

# tensorflow
import tensorflow as tf
import tf_agents
from tf_agents.policies import py_tf_eager_policy
from tf_agents.trajectories import time_step as ts

# this repo
sys.path.extend([f'./{name}' for name in os.listdir(".") if os.path.isdir(name)])

from user_code import pred_config as pred_config
from user_code import emb_features_pred as emb_features
from user_code import reward_factory as reward_factory

os.environ["PROJECT_ID"] = pred_config.PROJECT_ID

# ==================================
# get trajectory step for prediction
# ==================================
def _get_pred_step(feature, reward_np):
    
    infer_step = ts.TimeStep(
        tf.constant(ts.StepType.FIRST, dtype=tf.int32, shape=[],name='step_type'),
        tf.constant(reward_np, dtype=tf.float32, shape=[], name='reward'),
        tf.constant(1.0, dtype=tf.float32, shape=[], name='discount'),
        feature
    )
    
    return infer_step

# ==================================
# prediction logic
# ==================================
class BanditPolicyPredictor(Predictor):
    
    """
    Interface of the Predictor class for Custom Prediction Routines.
    
    The Predictor is responsible for the ML logic for processing a prediction request.
    
    Specifically, the Predictor must define:
        (1) How to load all model artifacts used during prediction into memory.
        (2) The logic that should be executed at predict time.
    
    When using the default PredictionHandler, the Predictor will be invoked as follows:
    
      predictor.postprocess(predictor.predict(predictor.preprocess(prediction_input)))
    
    """
    
    def __init__(self):
        
        self._local_vocab_filename = "./vocab_dict.pkl"
        self._num_oov_buckets = pred_config.NUM_OOV_BUCKETS
        self._global_embedding_size = pred_config.GLOBAL_EMBEDDING_SIZE
        self._mv_embedding_size = pred_config.MV_EMBEDDING_SIZE
        self.max_genre_length = pred_config.MAX_GENRE_LENGTH
        return
        
    def load(self, artifacts_uri: str):
        """
        Loads trained policy dir & vocabulary
        Args:
            artifacts_uri (str):
                Required. The value of the environment variable AIP_STORAGE_URI.
                has `artifacts/` as a sub directory 
        
        """
        prediction_utils.download_model_artifacts(artifacts_uri)
        
        # init deploy policy
        self._deployment_policy = py_tf_eager_policy.SavedModelPyTFEagerPolicy(
            'artifacts', load_specs_from_pbtxt=True
        )
        
        # load vocab dict
        with open(self._local_vocab_filename, 'rb') as filehandler:
            self._vocab_dict = pkl.load(filehandler)
        
        # only if no custom preprocessor is defined
        # self._preprocessor = preprocessor
        
    def preprocess(self, prediction_input: Dict): # -> Tuple[Dict, float]:
        """
        Args:
            prediction_input (Any):
                Required. The prediction input that needs to be preprocessed.
        Returns:
            The preprocessed prediction input.        
        """
        # inputs = super().preprocess(prediction_input)
        
        dummy_arm = tf.zeros([1, pred_config.PER_ARM_DIM], dtype=tf.float32)
        
        batch_size = len(prediction_input) #["instances"])
        assert batch_size == 1, 'prediction batch_size must be == 1'
        
        self._embs = emb_features.EmbeddingModel(
            vocab_dict = self._vocab_dict,
            num_oov_buckets = self._num_oov_buckets,
            global_emb_size = self._global_embedding_size,
            mv_emb_size = self._mv_embedding_size,
            max_genre_length = self.max_genre_length
        )
        
        # preprocess example
        rebuild_ex = {}

        for x in prediction_input: #["instances"]:
            rebuild_ex['target_movie_id'] = tf.constant([x["target_movie_id"]], dtype=tf.string)
            rebuild_ex['target_movie_rating'] = tf.constant([x["target_movie_rating"]], dtype=tf.float32)
            rebuild_ex['target_rating_timestamp'] = tf.constant([x["target_rating_timestamp"]], dtype=tf.int64)
            rebuild_ex['target_movie_genres'] = tf.constant([x["target_movie_genres"]], dtype=tf.string)
            rebuild_ex['target_movie_year'] = tf.constant([x["target_movie_year"]], dtype=tf.int64)
            rebuild_ex['target_movie_title'] = tf.constant([x["target_movie_title"]], dtype=tf.string)
            rebuild_ex['user_id'] = tf.constant([x["user_id"]], dtype=tf.string)
            rebuild_ex['user_gender'] = tf.constant([x["user_gender"]], dtype=tf.string)
            rebuild_ex['user_age'] = tf.constant([x["user_age"]], dtype=tf.int64)
            rebuild_ex['user_occupation_text'] = tf.constant([x["user_occupation_text"]], dtype=tf.string)
            rebuild_ex['user_zip_code'] = tf.constant([x["user_zip_code"]], dtype=tf.string)
        
        global_feat_infer = self._embs._get_global_context_features(rebuild_ex)
        logging.info(f'global_feat_infer: {global_feat_infer}')          # tmp - debugging
        
        arm_feat_infer = self._embs._get_per_arm_features(rebuild_ex)    # tmp - debugging
        logging.info(f'arm_feat_infer: {arm_feat_infer}')
    
        rewards = reward_factory._get_rewards(rebuild_ex)
        logging.info(f'rewards: {rewards}')                              # tmp - debugging
        
        actual_reward = rewards.numpy()[0]
        logging.info(f'actual_reward: {actual_reward}')                  # tmp - debugging
        
        arm_feat_infer = tf.reshape(arm_feat_infer, [1, pred_config.PER_ARM_DIM])
        concat_arm = tf.concat([arm_feat_infer, dummy_arm], axis=0)      # tmp - debugging
        
        # flatten global
        flat_global_infer = tf.reshape(global_feat_infer, [pred_config.GLOBAL_DIM])
        feature = {'global': flat_global_infer, 'per_arm': concat_arm}
        logging.info(f'feature: {feature}')                              # tmp - debugging
        
        trajectory_step = _get_pred_step(feature, actual_reward)
        logging.info(f'trajectory_step: {trajectory_step}')
        
        # prediction = self._deployment_policy.action(trajectory_step)
        
        return trajectory_step
    
    def predict(self, instances) -> Dict:
        """
        Performs prediction i.e., policy takes action
        """
        # prediction = self._deployment_policy.action(instances) # trajectory_step
        # return {"predictions": prediction}
        return self._deployment_policy.action(instances)
        

    def postprocess(self, prediction_results: Any) -> Any:
        """ 
        Postprocesses the prediction results
        
        TODO:
             Convert predictions to item IDs
             
        """
        processed_pred_dict = {
            "bandit_policy_type" : int(prediction_results.info.bandit_policy_type[0]),
            "chosen_arm_features" : prediction_results.info.chosen_arm_features.tolist(),
            "predicted_rewards_mean" : prediction_results.info.predicted_rewards_mean.tolist(),
            "action" : int(prediction_results.action.tolist()),
        }
        
        return processed_pred_dict

Writing cpr_dir/predictor.py
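
The `preprocess` method indexes each instance by eleven field names and asserts a batch size of one, so a malformed request only fails once it is inside the container. A minimal client-side check, assuming the field list stays as written in `predictor.py`, might look like the sketch below. The helper name `validate_instances` is hypothetical.

```python
REQUIRED_FIELDS = (
    "target_movie_id", "target_movie_rating", "target_rating_timestamp",
    "target_movie_genres", "target_movie_year", "target_movie_title",
    "user_id", "user_gender", "user_age", "user_occupation_text", "user_zip_code",
)


def validate_instances(instances: list) -> None:
    """Raise ValueError if the request would fail inside `preprocess`."""
    if len(instances) != 1:
        raise ValueError(f"expected exactly 1 instance, got {len(instances)}")
    missing = [name for name in REQUIRED_FIELDS if name not in instances[0]]
    if missing:
        raise ValueError(f"instance is missing fields: {missing}")


# Illustrative call with an incomplete instance.
try:
    validate_instances([{"user_id": "2173"}])
    error_message = ""
except ValueError as err:
    error_message = str(err)

print(error_message)
```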


## Entrypoint / Handler

Custom containers require an image **entrypoint** that starts the model server
* With Custom Prediction Routines (CPR), you **don't need to write the entrypoint** anymore. The Vertex AI SDK will populate the entrypoint with the custom predictor you provide
* However, we *can* implement a custom handler with its own `handle()` method for the CPR model server, instead of using the pre-built HTTP request handler.
  * The `handle()` method extracts the prediction request from the HTTP request message
  * It also calls the predictor's `preprocess()`, `predict()`, and `postprocess()` methods, passing along the extracted instances
  
For implementing our own Docker build process, see "Scenario 4" in [getting started with cpr](https://github.com/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/ml_ops/stage6/get_started_with_cpr.ipynb) notebook tutorial

In [24]:
!pwd

/home/jupyter/tf_vertex_agents/src


In [25]:
%%writefile $LOCAL_CPR_DIR/handler.py

import json
import logging
from fastapi import Response
from google.cloud.aiplatform.prediction.handler import PredictionHandler

class CprHandler(PredictionHandler):
    """
    Custom prediction handler for the pred requests sent to the application
    """

    async def handle(self, request):
        """Handles a prediction request."""
        
        request_body = await request.body()
        logging.info(f'request_body: {request_body}')
        
        request_body_dict = json.loads(request_body)
        logging.info(f'request_body_dict: {request_body_dict}')
        
        instances=request_body_dict["instances"]
        logging.info(f'instances: {instances}')
        
        prediction_results = self._predictor.postprocess(
            self._predictor.predict(
                self._predictor.preprocess(instances)
            )
        )
                                                         
        logging.info(f'prediction: {prediction_results}')

        return Response(content=json.dumps(prediction_results))

Writing cpr_dir/handler.py
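
The request-parsing logic in `handle` can be exercised offline, before any container is built. The sketch below assumes the only thing the handler needs from the request object is an awaitable `body()`; `FakeRequest`, `EchoPredictor`, and the free function `handle` are hypothetical stand-ins, and the FastAPI `Response` is replaced by a plain JSON string.

```python
import asyncio
import json


class FakeRequest:
    """Hypothetical stand-in exposing only the awaited `body()` call used above."""

    def __init__(self, payload: dict):
        self._raw = json.dumps(payload).encode("utf-8")

    async def body(self) -> bytes:
        return self._raw


class EchoPredictor:
    """Stub predictor: counts instances instead of running a policy."""

    def preprocess(self, instances):
        return instances

    def predict(self, instances):
        return len(instances)

    def postprocess(self, prediction_results):
        return {"num_instances": prediction_results}


async def handle(predictor, request) -> str:
    # Same steps as CprHandler.handle, minus logging and the FastAPI Response.
    request_body_dict = json.loads(await request.body())
    instances = request_body_dict["instances"]
    results = predictor.postprocess(
        predictor.predict(predictor.preprocess(instances))
    )
    return json.dumps(results)


fake_request = FakeRequest({"instances": [{"user_id": "2173"}]})
response_text = asyncio.run(handle(EchoPredictor(), fake_request))
print(response_text)
```

Inside a running notebook kernel an event loop is already active, so `await handle(...)` in a cell would be used in place of `asyncio.run(...)`.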


## CPR package

### data config

> TODO - edit these as needed

In [27]:
PER_ARM_DIM           = 64
GLOBAL_DIM            = 72
NUM_OOV_BUCKETS       = 1
GLOBAL_EMBEDDING_SIZE = 12
MV_EMBEDDING_SIZE     = 16

In [28]:
pred_config = f"""
PROJECT_ID            = "{PROJECT_ID}"
REGION                = "{REGION}"
PREFIX                = "{PREFIX}"
BUCKET_NAME           = "{BUCKET_NAME}"
EXAMPLE_GEN_GCS_PATH  = "{EXAMPLE_GEN_GCS_PATH}"
PER_ARM_DIM           = {PER_ARM_DIM}
GLOBAL_DIM            = {GLOBAL_DIM}
NUM_OOV_BUCKETS       = {NUM_OOV_BUCKETS}
GLOBAL_EMBEDDING_SIZE = {GLOBAL_EMBEDDING_SIZE}
MV_EMBEDDING_SIZE     = {MV_EMBEDDING_SIZE}
MAX_GENRE_LENGTH      = {data_config.MAX_GENRE_LENGTH}
"""
print(pred_config)


PROJECT_ID            = "hybrid-vertex"
REGION                = "us-central1"
PREFIX                = "rec-bandits-v2"
BUCKET_NAME           = "rec-bandits-v2-hybrid-vertex-bucket"
EXAMPLE_GEN_GCS_PATH  = "data/movielens/m1m"
PER_ARM_DIM           = 64
GLOBAL_DIM            = 72
NUM_OOV_BUCKETS       = 1
GLOBAL_EMBEDDING_SIZE = 12
MV_EMBEDDING_SIZE     = 16
MAX_GENRE_LENGTH      = 10



In [29]:
LOCAL_PRED_CONFIG_FILE = f"{LOCAL_CPR_DIR}/{CPR_SUBDIR}/pred_config.py"

with open(LOCAL_PRED_CONFIG_FILE, 'w') as f:
    f.write(pred_config)
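
Because `pred_config.py` is generated from an f-string, a quick round trip can confirm that the written file is valid Python and carries the expected values. This sketch uses the standard-library `runpy` module on a temporary file; the sample text and the `demo-project` value are placeholders, not the notebook's real config.

```python
import runpy
import tempfile
from pathlib import Path

# Illustrative config text; the notebook's real string is `pred_config`.
sample_config = 'PROJECT_ID = "demo-project"\nPER_ARM_DIM = 64\nGLOBAL_DIM = 72\n'

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "pred_config.py"
    config_path.write_text(sample_config)
    # run_path executes the file and returns its module-level names as a dict.
    loaded = runpy.run_path(str(config_path))

print(loaded["PER_ARM_DIM"], loaded["GLOBAL_DIM"])
```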

### requirements.txt

In [31]:
import tf_agents

print(f"tensorflow version    : {tf.__version__}")
print(f"tf_agents version     : {tf_agents.__version__}")
print(f"vertex_ai SDK version : {vertex_ai.__version__}")

tensorflow version    : 2.13.0
tf_agents version     : 0.17.0
vertex_ai SDK version : 1.33.1


In [32]:
%%writefile $LOCAL_CPR_DIR/requirements.txt
google-cloud-aiplatform[prediction]==1.46.0
google-cloud-storage
numpy
six
typing-extensions
tensorflow==2.13.0
tf-agents==0.17.0
urllib3
pillow
tensorflow-io
tensorflow-datasets
tensorflow-probability
fastapi

Writing cpr_dir/requirements.txt
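
The version printout above reports SDK `1.33.1` in the notebook environment, while the requirements file pins `1.46.0`. If the intent is for the container to match the notebook environment, the pins could be generated from the installed packages instead of typed by hand. This is a sketch under that assumption; `pinned_requirement` is a hypothetical helper built on the standard-library `importlib.metadata`.

```python
from importlib import metadata


def pinned_requirement(package: str, extras: str = "") -> str:
    """Return `package[extras]==<installed version>`, or the unpinned spec if not installed."""
    spec = f"{package}[{extras}]" if extras else package
    try:
        return f"{spec}=={metadata.version(package)}"
    except metadata.PackageNotFoundError:
        return spec


# Package names mirror the requirements file above; versions come from this environment.
lines = [
    pinned_requirement("google-cloud-aiplatform", extras="prediction"),
    pinned_requirement("tensorflow"),
    pinned_requirement("tf-agents"),
]
print("\n".join(lines))
```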


In [33]:
!pwd

/home/jupyter/tf_vertex_agents/src


### copy remaining files to CPR dir

In [42]:
## TODO - fix import issue for two lines below

! cp ./utils/reward_factory.py ./$LOCAL_CPR_DIR/$CPR_SUBDIR/reward_factory.py
! cp ./networks/encoding_network.py ./$LOCAL_CPR_DIR/$CPR_SUBDIR/emb_features_pred.py
! gsutil cp $EXISTING_VOCAB_FILE ./$LOCAL_CPR_DIR/vocab_dict.pkl

Copying gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/vocabs/vocab_dict.pkl...
/ [1 files][205.2 KiB/205.2 KiB]                                                
Operation completed over 1 objects/205.2 KiB.                                    


In [44]:
!tree $LOCAL_CPR_DIR

cpr_dir
├── handler.py
├── predictor.py
├── requirements.txt
├── user_code
│   ├── emb_features_pred.py
│   ├── pred_config.py
│   └── reward_factory.py
└── vocab_dict.pkl

1 directory, 7 files


# Build and push CPR container to Vertex

* `LocalModel` [src](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/prediction/local_model.py)

**Build container**
* Building a custom container normally requires writing an image entrypoint that starts the model server.
* With the Custom Prediction Routine feature, you don't need to write the entrypoint.
* The Vertex AI SDK populates the entrypoint with the custom predictor you provide.

### References

**build_cpr_model**
```
    local_model = LocalModel.build_cpr_model(
        "./user_src_dir",
        "us-docker.pkg.dev/$PROJECT/$REPOSITORY/$IMAGE_NAME$",
        predictor=$CUSTOM_PREDICTOR_CLASS,
        requirements_path="./user_src_dir/requirements.txt",
        extra_packages=["./user_src_dir/user_code/custom_package.tar.gz"],
    )
```

```
Args:
    src_dir (str):
        Required. The path to the local directory including all needed files such as
        predictor. The whole directory will be copied to the image.
    output_image_uri (str):
        Required. The image uri of the built image.
    predictor (Type[Predictor]):
        Optional. The custom predictor class consumed by handler to do prediction.
    handler (Type[Handler]):
        Required. The handler class to handle requests in the model server.
    base_image (str):
        Required. The base image used to build the custom images. The base image must
        have python and pip installed where the two commands ``python`` and ``pip`` must be
        available.
    requirements_path (str):
        Optional. The path to the local requirements.txt file. This file will be copied
        to the image and the needed packages listed in it will be installed.
    extra_packages (List[str]):
        Optional. The list of user custom dependency packages to install.
    no_cache (bool):
        Required. Do not use cache when building the image. Using build cache usually
        reduces the image building time. See
        https://docs.docker.com/develop/develop-images/dockerfile_best-practices/#leverage-build-cache
        for more details.
        
Returns:
    local model: Instantiated representation of the local model.
```

## Create example prediction instance

Create two formats:
* json file
* serialized dictionary

In [45]:
import json
import requests

In [46]:
TEST_INSTANCE = {
    "instances": [
        {
            'target_movie_genres': ['Drama', 'UNK', 'UNK', 'UNK', 'UNK', 'UNK', 'UNK', 'UNK', 'UNK', 'UNK'],
            'target_movie_id': '1775',
            'target_movie_rating': 4.0,
            'target_movie_title': 'Live Flesh (1997)',
            'target_movie_year': 1997,
            'target_rating_timestamp': 974612615,
            'user_age': 50,
            'user_gender': 'M',
            'user_id': '2173',
            'user_occupation_text': 'programmer',
            'user_zip_code': '87505',
        }
    ]
}

json_instance = json.dumps({"instances": TEST_INSTANCE['instances']})

print(json.dumps({"instances": TEST_INSTANCE['instances']}, indent=4))

{
    "instances": [
        {
            "target_movie_genres": [
                "Drama",
                "UNK",
                "UNK",
                "UNK",
                "UNK",
                "UNK",
                "UNK",
                "UNK",
                "UNK",
                "UNK"
            ],
            "target_movie_id": "1775",
            "target_movie_rating": 4.0,
            "target_movie_title": "Live Flesh (1997)",
            "target_movie_year": 1997,
            "target_rating_timestamp": 974612615,
            "user_age": 50,
            "user_gender": "M",
            "user_id": "2173",
            "user_occupation_text": "programmer",
            "user_zip_code": "87505"
        }
    ]
}


In [47]:
INPUT_FILE = "instances.json"

with open(INPUT_FILE, "w") as f:
    json_dumps_str = json.dumps(TEST_INSTANCE)
    f.write(json_dumps_str)

## Local build

In [48]:
!pwd

/home/jupyter/tf_vertex_agents/src


In [49]:
!ls $LOCAL_CPR_DIR

handler.py  predictor.py  requirements.txt  user_code  vocab_dict.pkl


In [50]:
from google.cloud.aiplatform.prediction import LocalModel
from cpr_dir.predictor import BanditPolicyPredictor
from cpr_dir.handler import CprHandler

# POLICY_SERVE_DIR_URI = f"{BASE_OUTPUT_URI}/policy-server"

print(f"POLICY_SERVE_DIR_URI   = {POLICY_SERVE_DIR_URI}")
print(f"REPOSITORY             = {REPOSITORY}")
print(f"IMAGE_NAME_02_PRED_CPR = {IMAGE_NAME_02_PRED_CPR}")
print(f"IMAGE_URI_02_PRED_CPR  = {IMAGE_URI_02_PRED_CPR}")
print(f"REMOTE_IMAGE_NAME_CPR  = {REMOTE_IMAGE_NAME_CPR}")

POLICY_SERVE_DIR_URI   = gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/policy-server
REPOSITORY             = rl-movielens-rec-bandits-v2
IMAGE_NAME_02_PRED_CPR = cpr-perarm-bandit-02e
IMAGE_URI_02_PRED_CPR  = gcr.io/hybrid-vertex/cpr-perarm-bandit-02e
REMOTE_IMAGE_NAME_CPR  = us-central1-docker.pkg.dev/hybrid-vertex/rl-movielens-rec-bandits-v2/cpr-perarm-bandit-02e


In [51]:
! gsutil ls $ARTIFACTS_DIR

gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/fingerprint.pb
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/policy_specs.pbtxt
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/saved_model.pb
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/assets/
gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/artifacts/variables/


In [52]:
local_model = LocalModel.build_cpr_model(
    src_dir= f"./{LOCAL_CPR_DIR}",
    output_image_uri = REMOTE_IMAGE_NAME_CPR,
    predictor= BanditPolicyPredictor,
    handler= CprHandler,
    base_image = 'tensorflow/tensorflow:2.13.0',
    requirements_path=f"./{LOCAL_CPR_DIR}/requirements.txt",
    no_cache=True,
)

You can check out the serving container spec of the built image.

In [53]:
local_model.get_serving_container_spec()

image_uri: "us-central1-docker.pkg.dev/hybrid-vertex/rl-movielens-rec-bandits-v2/cpr-perarm-bandit-02e"
predict_route: "/predict"
health_route: "/health"

Once the CPR model is built, either (1) test it locally or (2) push the image to a registry and upload the model to Vertex AI

### (Optional) deploy to local endpoint

> **Deploy `LocalModel` to `LocalEndpoint`**

This significantly shortens each dev-cycle iteration!

In [54]:
!pwd

/home/jupyter/tf_vertex_agents/src


In [55]:
local_endpoint = local_model.deploy_to_local_endpoint(
    artifact_uri=f"{POLICY_SERVE_DIR_URI}",
    credential_path=CREDENTIALS_FILE,
    container_ready_timeout=300,
    container_ready_check_interval=10
)
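
The `container_ready_timeout` and `container_ready_check_interval` arguments suggest a polling loop: check the container every `interval` seconds and give up after `timeout` seconds. The sketch below shows that general pattern only; it is an assumption about how the two values interact, and the SDK's actual implementation may differ. `wait_until_ready` and `fake_check` are hypothetical names.

```python
import time


def wait_until_ready(check, timeout: float, interval: float, sleep=time.sleep) -> bool:
    """Call `check()` every `interval` seconds until it returns True or `timeout` elapses."""
    waited = 0.0
    while True:
        if check():
            return True
        if waited >= timeout:
            return False
        sleep(interval)
        waited += interval


# Fake health check that succeeds on the third attempt; sleeping is stubbed out.
attempts = []


def fake_check() -> bool:
    attempts.append(1)
    return len(attempts) >= 3


ready = wait_until_ready(fake_check, timeout=300, interval=10, sleep=lambda s: None)
print(ready, len(attempts))
```

With the values used above (300 and 10), a loop of this shape would allow roughly thirty checks before giving up.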

**Call `serve()` to start the container for local traffic**

In [56]:
local_endpoint.serve()

health_check_response = local_endpoint.run_health_check()

print(f"health_check     : {health_check_response.content}")
print(f"container_status : {local_endpoint.get_container_status()}")
print(f"container_port   : {local_endpoint.container_port}")
print(f"env_vars         : {local_endpoint.serving_container_environment_variables}")
print(f"ready_interval   : {local_endpoint.container_ready_check_interval}")

health_check     : b'{}'
container_status : running
container_port   : 8080
env_vars         : {}
ready_interval   : 10


In [57]:
# TODO: still don't understand how to use this
local_endpoint.print_container_logs()

#### Test locally deployed policy endpoint

In [58]:
predict_response = local_endpoint.predict(
    request_file=INPUT_FILE,
    headers={"Content-Type": "application/json"},
)
print(f"predict_response: {predict_response.content}")

predict_response: b'{"bandit_policy_type": 1, "chosen_arm_features": [-0.0397552028298378, -0.03813551738858223, 0.044915106147527695, 0.02706361934542656, 0.009737588465213776, -0.019834626466035843, 0.04776227846741676, 0.013513337820768356, 0.049553122371435165, 0.017252493649721146, -0.01818246766924858, -0.004529118537902832, 0.021304253488779068, 0.017105866223573685, -0.012950398027896881, -0.020180154591798782, 0.03597677871584892, -0.022177668288350105, 0.02848290465772152, -0.018231380730867386, -0.021156037226319313, 0.0037916938308626413, 0.02093610353767872, -0.039079517126083374, 0.017280887812376022, 0.04565785080194473, 0.035186875611543655, 0.009450845420360565, 0.02285950817167759, 0.001532208058051765, -0.0380002036690712, -0.04490354657173157, 0.008885107934474945, -0.012029051780700684, -0.02177565172314644, 0.038999903947114944, -0.00711219385266304, 0.016311775892972946, -0.04275301843881607, -0.008268356323242188, 0.021482016891241074, -0.032657526433467865, -0.

To get the prediction response as a usable Python object, call `.json()`:

In [59]:
preds = predict_response.json()

print(preds['chosen_arm_features'])

[-0.0397552028298378, -0.03813551738858223, 0.044915106147527695, 0.02706361934542656, 0.009737588465213776, -0.019834626466035843, 0.04776227846741676, 0.013513337820768356, 0.049553122371435165, 0.017252493649721146, -0.01818246766924858, -0.004529118537902832, 0.021304253488779068, 0.017105866223573685, -0.012950398027896881, -0.020180154591798782, 0.03597677871584892, -0.022177668288350105, 0.02848290465772152, -0.018231380730867386, -0.021156037226319313, 0.0037916938308626413, 0.02093610353767872, -0.039079517126083374, 0.017280887812376022, 0.04565785080194473, 0.035186875611543655, 0.009450845420360565, 0.02285950817167759, 0.001532208058051765, -0.0380002036690712, -0.04490354657173157, 0.008885107934474945, -0.012029051780700684, -0.02177565172314644, 0.038999903947114944, -0.00711219385266304, 0.016311775892972946, -0.04275301843881607, -0.008268356323242188, 0.021482016891241074, -0.032657526433467865, -0.02329789474606514, -0.009759318083524704, 0.027740132063627243, 0.029
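Calling `.json()` on the response amounts to decoding the JSON bytes in `.content`. As a minimal sketch, assuming the payload keeps the two keys shown above, the response can be validated before downstream code indexes into it. The helper name `parse_policy_response` and the shortened sample payload are hypothetical.

```python
import json

# Hypothetical, shortened payload standing in for `predict_response.content`.
raw_content = b'{"bandit_policy_type": 1, "chosen_arm_features": [-0.0398, -0.0381, 0.0449]}'


def parse_policy_response(content: bytes) -> dict:
    """Decode a JSON response body and check the keys this notebook reads."""
    payload = json.loads(content)
    missing = {"bandit_policy_type", "chosen_arm_features"} - payload.keys()
    if missing:
        raise KeyError(f"response is missing expected keys: {sorted(missing)}")
    return payload


preds = parse_policy_response(raw_content)
print(preds["bandit_policy_type"])
print(len(preds["chosen_arm_features"]))
```

Failing early on a missing key makes a handler regression easier to spot than a `KeyError` deep inside later cells.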

Stop the local endpoint container:

In [60]:
local_endpoint.stop()

## Deploy to Vertex AI

**Push image to registry**

In [61]:
local_model.push_image()

**Upload to Vertex Model Registry**

In [1]:
VERSION = "v3-cpu"

In [63]:
!gsutil ls $POLICY_SERVE_DIR_URI

gs://rec-bandits-v2-hybrid-vertex-bucket/02-supervised-bandits-v1/run-20240313-192115/policy-server/artifacts/


In [64]:
uploaded_policy = vertex_ai.Model.upload(
    local_model=local_model,
    display_name=f'cpr-bandit-{VERSION}',
    artifact_uri=POLICY_SERVE_DIR_URI,
    sync=True,
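    # NOTE: per the SDK docstring, when `local_model` is passed its serving
    # container spec overrides the other serving_container_* arguments, so
    # verify these variables are actually set on the deployed container.
    serving_container_environment_variables={
        "VERTEX_CPR_MAX_WORKERS": "4",
        "VERTEX_CPR_WEB_CONCURRENCY": "4",
    }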
)

print(f"display_name    : {uploaded_policy.display_name}")
print(f"uploaded_policy : {uploaded_policy}")

display_name    : cpr-bandit-v3-cpu
uploaded_policy : <google.cloud.aiplatform.models.Model object at 0x7f0cec1b41f0> 
resource name: projects/934903580331/locations/us-central1/models/3886713131048632320


In [65]:
endpoint = vertex_ai.Endpoint.create(
    display_name=f'endpoint-cpr-bandit-{VERSION}',
    project=PROJECT_ID,
    location=LOCATION,
    sync=True,
)

print(f"display_name : {endpoint.display_name}")
print(f"endpoint     : {endpoint}")

display_name : endpoint-cpr-bandit-v3-cpu
endpoint     : <google.cloud.aiplatform.models.Endpoint object at 0x7f0cec190940> 
resource name: projects/934903580331/locations/us-central1/endpoints/7382884129957740544


In [66]:
deployed_policy = uploaded_policy.deploy(
    endpoint=endpoint,
    deployed_model_display_name=f'deployed-cpr-bandit-{VERSION}',
    machine_type="n1-standard-32",
    min_replica_count=1,
    max_replica_count=2,
    accelerator_type=None,
    accelerator_count=0,
    sync=True,
    enable_access_logging=True,
)

print(f"display_name    : {deployed_policy.display_name}\n")
print(f"deployed_policy : {deployed_policy}")

display_name    : endpoint-cpr-bandit-v3-cpu

deployed_policy : <google.cloud.aiplatform.models.Endpoint object at 0x7f0cec190940> 
resource name: projects/934903580331/locations/us-central1/endpoints/7382884129957740544


### Test deployed policy endpoint

*Note*: for predictions to be displayed in the output of the gcloud command, the handler should return a response dictionary like:

> `{"predictions": post_processed_preds}`

See [Send an online prediction request](https://cloud.google.com/vertex-ai/docs/predictions/get-online-predictions#predict-request) in the docs for more details.
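As a rough sketch of that response shape, the helper below wraps post-processed outputs under a top-level `"predictions"` key. The function name `build_response` and the shortened sample output are hypothetical, and wrapping a single dict in a list is an assumption of this sketch rather than something the notebook's handler is known to do.

```python
import json


def build_response(post_processed_preds):
    """Wrap post-processed outputs under the top-level "predictions" key.

    A single dict is treated as one prediction and wrapped in a list
    (an assumption of this sketch).
    """
    if isinstance(post_processed_preds, dict):
        post_processed_preds = [post_processed_preds]
    return {"predictions": list(post_processed_preds)}


# Hypothetical, shortened policy output for illustration.
policy_output = {"bandit_policy_type": 1, "chosen_arm_features": [-0.04, 0.01]}
response_body = json.dumps(build_response(policy_output))
print(response_body)
```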

#### gcloud

In [73]:
ENDPOINT_ID = endpoint.resource_name

!gcloud ai endpoints predict $ENDPOINT_ID --region=$REGION --json-request=instances.json

Using endpoint [https://us-central1-prediction-aiplatform.googleapis.com/]
deployedModelId: '6544044718894874624'
model: projects/934903580331/locations/us-central1/models/3886713131048632320
modelDisplayName: cpr-bandit-v3-cpu
modelVersionId: '1'


In [74]:
# json_instance
ENCODED_TEST_INSTANCE = json_instance.encode('utf-8')
ENCODED_TEST_INSTANCE

b'{"instances": [{"target_movie_genres": ["Drama", "UNK", "UNK", "UNK", "UNK", "UNK", "UNK", "UNK", "UNK", "UNK"], "target_movie_id": "1775", "target_movie_rating": 4.0, "target_movie_title": "Live Flesh (1997)", "target_movie_year": 1997, "target_rating_timestamp": 974612615, "user_age": 50, "user_gender": "M", "user_id": "2173", "user_occupation_text": "programmer", "user_zip_code": "87505"}]}'
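For reference, a request body of this shape can be built from a plain dictionary. The sketch below reuses the field values from the cell output above and produces both the bytes passed to `raw_predict` and a JSON file of the kind passed to `gcloud ... --json-request`. Writing to a temporary directory is a choice made here to keep the sketch free of side effects; the variable names are illustrative.

```python
import json
import tempfile
from pathlib import Path

# One instance, using the field values shown in the cell output above.
instance = {
    "target_movie_genres": ["Drama"] + ["UNK"] * 9,
    "target_movie_id": "1775",
    "target_movie_rating": 4.0,
    "target_movie_title": "Live Flesh (1997)",
    "target_movie_year": 1997,
    "target_rating_timestamp": 974612615,
    "user_age": 50,
    "user_gender": "M",
    "user_id": "2173",
    "user_occupation_text": "programmer",
    "user_zip_code": "87505",
}

# The request wraps instances in a top-level "instances" list, even for one example.
json_instance = json.dumps({"instances": [instance]})

# Bytes for `raw_predict(body=...)`.
encoded_instance = json_instance.encode("utf-8")

# File for `gcloud ai endpoints predict --json-request=...`.
# Written to a temporary directory so this sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    request_path = Path(tmp_dir) / "instances.json"
    request_path.write_text(json_instance, encoding="utf-8")
    round_trip = json.loads(request_path.read_text(encoding="utf-8"))

print(len(round_trip["instances"]))
```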

#### Vertex SDK's raw predict

In [75]:
response = deployed_policy.raw_predict(
    body = ENCODED_TEST_INSTANCE,
    headers = {'Content-Type':'application/json'}
).json()

# print(response['chosen_arm_features'])
print(response)

{'bandit_policy_type': 1, 'chosen_arm_features': [-0.03418135643005371, 0.014238927513360977, -0.035376884043216705, -0.030898524448275566, 0.029633570462465286, 0.04977237060666084, 0.002097915858030319, 0.049689147621393204, 0.04013046994805336, -0.03785369545221329, -0.023264408111572266, -0.028406275436282158, 0.016198862344026566, 0.031409528106451035, -0.0464482307434082, -0.019686413928866386, 0.027092022821307182, 0.032301608473062515, -0.0012740996899083257, 0.02169101871550083, -0.039971426129341125, -0.009036685340106487, 0.011448188684880733, -0.03226463869214058, 0.03542361408472061, 0.03470868989825249, -0.029757792130112648, -0.03481779247522354, -0.013962672092020512, -0.015002253465354443, 0.01682540401816368, -0.02901960350573063, 0.033026840537786484, 0.019611980766057968, -0.013973880559206009, 0.008736848831176758, -0.007301889359951019, 0.04800843074917793, -0.03088761679828167, -0.009026013314723969, 0.00881868600845337, -0.008199773728847504, 0.01789453253149986

In [76]:
deployed_policy.gca_resource

name: "projects/934903580331/locations/us-central1/endpoints/7382884129957740544"
display_name: "endpoint-cpr-bandit-v3-cpu"
deployed_models {
  id: "6544044718894874624"
  model: "projects/934903580331/locations/us-central1/models/3886713131048632320"
  display_name: "deployed-cpr-bandit-v3-cpu"
  create_time {
    seconds: 1712343516
    nanos: 348343000
  }
  dedicated_resources {
    machine_spec {
      machine_type: "n1-standard-32"
    }
    min_replica_count: 1
    max_replica_count: 2
  }
  enable_access_logging: true
  model_version_id: "1"
}
traffic_split {
  key: "6544044718894874624"
  value: 100
}
etag: "AMEw9yPul3Qy_PxnNOAgH550btafWqyXRrF_j0iuHmse0fSKYDvwVZ8QFbcNUp4ee7ID"
create_time {
  seconds: 1712343509
  nanos: 298718000
}
update_time {
  seconds: 1712343896
  nanos: 632132000
}

In [77]:
deployed_policy.to_dict()

{'name': 'projects/934903580331/locations/us-central1/endpoints/7382884129957740544',
 'displayName': 'endpoint-cpr-bandit-v3-cpu',
 'deployedModels': [{'id': '6544044718894874624',
   'model': 'projects/934903580331/locations/us-central1/models/3886713131048632320',
   'displayName': 'deployed-cpr-bandit-v3-cpu',
   'createTime': '2024-04-05T18:58:36.348343Z',
   'dedicatedResources': {'machineSpec': {'machineType': 'n1-standard-32'},
    'minReplicaCount': 1,
    'maxReplicaCount': 2},
   'enableAccessLogging': True,
   'modelVersionId': '1'}],
 'trafficSplit': {'6544044718894874624': 100},
 'etag': 'AMEw9yPul3Qy_PxnNOAgH550btafWqyXRrF_j0iuHmse0fSKYDvwVZ8QFbcNUp4ee7ID',
 'createTime': '2024-04-05T18:58:29.298718Z',
 'updateTime': '2024-04-05T19:04:56.632132Z'}
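The dictionary form is convenient for pulling out deployment details programmatically. The sketch below runs on an abbreviated copy of the output above; `summarize_deployments` is a hypothetical helper, and it assumes the keys keep the camelCase names shown.

```python
# Abbreviated copy of the `to_dict()` output above.
endpoint_dict = {
    "name": "projects/934903580331/locations/us-central1/endpoints/7382884129957740544",
    "displayName": "endpoint-cpr-bandit-v3-cpu",
    "deployedModels": [
        {
            "id": "6544044718894874624",
            "displayName": "deployed-cpr-bandit-v3-cpu",
            "dedicatedResources": {
                "machineSpec": {"machineType": "n1-standard-32"},
                "minReplicaCount": 1,
                "maxReplicaCount": 2,
            },
        }
    ],
    "trafficSplit": {"6544044718894874624": 100},
}


def summarize_deployments(endpoint_dict):
    """Return one summary row per deployed model, joined with its traffic share."""
    split = endpoint_dict.get("trafficSplit", {})
    rows = []
    for deployed in endpoint_dict.get("deployedModels", []):
        resources = deployed.get("dedicatedResources", {})
        rows.append(
            {
                "id": deployed["id"],
                "display_name": deployed.get("displayName"),
                "machine_type": resources.get("machineSpec", {}).get("machineType"),
                "traffic_percent": split.get(deployed["id"], 0),
            }
        )
    return rows


summary = summarize_deployments(endpoint_dict)
for row in summary:
    print(row)
```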

## Clean up

**In a terminal shell, run:**

* `docker image ls` to list local images
* `docker system df` to show Docker disk usage
* `docker system prune` to remove stopped containers, unused networks, dangling images, and build cache

To delete a specific image, use `docker image rm`. If you use the `-f` flag and specify the image's short or long ID, then this command untags and removes all images that match the specified ID.

These aliases are equivalent:
* `docker image rm`
* `docker image remove`
* `docker rmi`

For example:

> `docker rmi gcr.io/hybrid-vertex/pred-perarm-feats-02e:latest`

Undeploy the model and delete the endpoint (`force=True` undeploys all models from the endpoint first):

In [107]:
# endpoint.delete(force=True)

Delete the policy uploaded to the Vertex AI Model Registry:

In [108]:
# uploaded_policy.delete()

**Finished**