# Scale Matrix-factorization-based simulation for Contextual Bandits with Vertex AI Training

## Overview

### Notebook Objectives:
* Create hyperparameter tuning and training custom container
* Submit hyperparameter tuning job (optional)
* Create custom prediction container
* Submit custom container training job
* Deploy trained model to Endpoint
* Predict on the Endpoint

### Create hyperparameter tuning and training custom container

**TODO**: fix vars

Create a custom container that can be used for both hyperparameter tuning and training. The associated source code is in `src/per_arm_rl/`. This serves as the inner script of the custom container.
As before, the training function is the same as [trainer.train](https://github.com/tensorflow/agents/blob/r0.8.0/tf_agents/bandits/agents/examples/v2/trainer.py#L104), but it keeps track of intermediate metric values, supports hyperparameter tuning, and (for training) saves artifacts to different locations. The training logic for hyperparameter tuning and training is the same.

**Execute hyperparameter tuning:**
* The code does not save model artifacts. It takes command-line arguments as hyperparameter values from the Vertex AI Hyperparameter Tuning service and reports the training result metric to Vertex AI at each trial using `cloudml-hypertune`.
* Note that if you decide to save model artifacts, saving them to the same directory may cause overwriting errors if you use parallel trials in the hyperparameter tuning job. The recommended approach is to save each trial's artifacts to a different sub-directory. This would also allow you to recover all the artifacts from different trials and can potentially save you from re-training.
* Read more about hyperparameter tuning for custom containers [here](https://cloud.google.com/vertex-ai/docs/training/containers-overview#hyperparameter_tuning_with_custom_containers); read about hyperparameter tuning support [here](https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview).
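
The per-trial sub-directory advice above could be sketched roughly as follows. This is a minimal illustration, not the code in `src/per_arm_rl/`: the helper name `trial_artifacts_dir` is hypothetical, and it assumes the trial ID is exposed through the `CLOUD_ML_TRIAL_ID` environment variable (the variable `cloudml-hypertune` reads), falling back to `"0"` for local runs.

```python
import os


def trial_artifacts_dir(base_dir, environ=None):
    """Return a per-trial sub-directory under base_dir (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    # Assumption: the tuning service sets CLOUD_ML_TRIAL_ID for each trial;
    # use "0" when the variable is missing (e.g., a local run).
    trial_id = environ.get("CLOUD_ML_TRIAL_ID", "0")
    return f"{base_dir.rstrip('/')}/trial-{trial_id}"


# Example with an explicit, made-up environment mapping.
print(trial_artifacts_dir("gs://my-bucket/artifacts", {"CLOUD_ML_TRIAL_ID": "3"}))
```

Because each parallel trial writes under its own `trial-<id>` prefix, trials should not overwrite one another's artifacts.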

**Execute training:**
* The code saves model artifacts to `os.environ["AIP_MODEL_DIR"]` in addition to `ARTIFACTS_DIR`, as required [here](https://github.com/googleapis/python-aiplatform/blob/v0.8.0/google/cloud/aiplatform/training_jobs.py#L2202).
* If you change the function, make sure it still saves the trained policy as a SavedModel to clean directories, and avoid saving checkpoints and other artifacts there, so that deploying the model to endpoints works.
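
As a rough sketch of the "save to both locations" behavior described above, the training code might collect its output directories like this. The helper `model_output_dirs` is hypothetical and only resolves the paths; the actual save call (for example via `policy_saver`) is left out.

```python
import os


def model_output_dirs(artifacts_dir, environ=None):
    """Return the directories the SavedModel should be written to (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    dirs = [artifacts_dir]
    # Vertex AI sets AIP_MODEL_DIR for custom training jobs; it is absent locally.
    aip_model_dir = environ.get("AIP_MODEL_DIR")
    if aip_model_dir and aip_model_dir not in dirs:
        dirs.append(aip_model_dir)
    return dirs


# Example with made-up paths.
print(model_output_dirs("gs://my-bucket/run/artifacts",
                        {"AIP_MODEL_DIR": "gs://my-bucket/run/model"}))
```

Keeping these directories free of checkpoints and other files is what lets the deployment step treat them as clean SavedModel locations.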

## Notebook setup

### set vars

In [1]:
# PREFIX = 'mabv1'
VERSION        = "v2"                       # TODO
PREFIX         = f'rec-bandits-{VERSION}'   # TODO

print(f"PREFIX: {PREFIX}")

PREFIX: rec-bandits-v2


**run the next cell to populate env vars**

In [2]:
# staging GCS
GCP_PROJECTS             = !gcloud config get-value project
PROJECT_ID               = GCP_PROJECTS[0]

# GCS bucket and paths
BUCKET_NAME              = f'{PREFIX}-{PROJECT_ID}-bucket'
BUCKET_URI               = f'gs://{BUCKET_NAME}'

config = !gsutil cat {BUCKET_URI}/config/notebook_env.py
print(config.n)
exec(config.n)


PROJECT_ID               = "hybrid-vertex"
PROJECT_NUM              = "934903580331"
LOCATION                 = "us-central1"

REGION                   = "us-central1"
BQ_LOCATION              = "US"
VPC_NETWORK_NAME         = "ucaip-haystack-vpc-network"

VERTEX_SA                = "934903580331-compute@developer.gserviceaccount.com"

PREFIX                   = "rec-bandits-v2"
VERSION                  = "v2"

BUCKET_NAME              = "rec-bandits-v2-hybrid-vertex-bucket"
BUCKET_URI               = "gs://rec-bandits-v2-hybrid-vertex-bucket"
DATA_GCS_PREFIX          = "data"
DATA_PATH                = "gs://rec-bandits-v2-hybrid-vertex-bucket/data"
VOCAB_SUBDIR             = "vocabs"
VOCAB_FILENAME           = "vocab_dict.pkl"
DATA_PATH_KFP_DEMO       = "gs://rec-bandits-v2-hybrid-vertex-bucket/data/kfp_demo_data/u.data"

VPC_NETWORK_FULL         = "projects/934903580331/global/networks/ucaip-haystack-vpc-network"

BIGQUERY_DATASET_NAME    = "mvlens_rec_bandits_v2"
BIGQUERY_TABLE_NA

### imports

In [3]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

In [4]:
import functools
import json
from collections import defaultdict
from typing import Callable, Dict, List, Optional, TypeVar
from datetime import datetime
import time

import logging
logging.disable(logging.WARNING)

import matplotlib.pyplot as plt
import numpy as np

# google cloud
from google.cloud import aiplatform, storage
from google.cloud.aiplatform import hyperparameter_tuning as hpt

# tensorflow
import tensorflow as tf
from tf_agents.agents import TFAgent
from tf_agents.bandits.agents import lin_ucb_agent
from tf_agents.bandits.agents.examples.v2 import trainer
from tf_agents.bandits.environments import (environment_utilities,
                                            movielens_py_environment)
from tf_agents.bandits.metrics import tf_metrics as tf_bandit_metrics
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import TFEnvironment, tf_py_environment
from tf_agents.eval import metric_utils
from tf_agents.metrics import tf_metrics
from tf_agents.metrics.tf_metric import TFStepMetric
from tf_agents.policies import policy_saver

# GPU
from numba import cuda 
import gc

if tf.__version__[0] != "2":
    raise Exception("The trainer only runs with TensorFlow version 2.")

T = TypeVar("T")

In [6]:
device = cuda.get_current_device()
device.reset()
gc.collect()

14

In [7]:
import sys
sys.path.append("..")

# my project
from src.per_arm_rl import data_utils
from src.per_arm_rl import data_config

In [8]:
# cloud storage client
storage_client = storage.Client(project=PROJECT_ID)

# Vertex client
aiplatform.init(project=PROJECT_ID, location=LOCATION)

In [9]:
! gsutil ls $DATA_PATH

gs://rec-bandits-v2-hybrid-vertex-bucket/data/ml-ratings-100k-full.tfrecord
gs://rec-bandits-v2-hybrid-vertex-bucket/data/kfp_demo_data/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/listwise-3n-train/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/listwise-3n-val/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/listwise-5n-train/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/listwise-5n-val/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/train/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/val/


## Build train application

### Vertex Experiments

In [10]:
EXPERIMENT_NAME   = f'scale-my-mf-env-hpt-{PREFIX}'

invoke_time       = time.strftime("%Y%m%d-%H%M%S")
RUN_NAME          = f'run-{invoke_time}'

BASE_OUTPUT_DIR   = f'gs://{BUCKET_NAME}/{EXPERIMENT_NAME}/{RUN_NAME}'
LOG_DIR           = f"{BASE_OUTPUT_DIR}/logs"
ROOT_DIR          = f"{BASE_OUTPUT_DIR}/root"                               # Root directory for writing logs/summaries/checkpoints.
ARTIFACTS_DIR     = f"{BASE_OUTPUT_DIR}/artifacts"                          # Where the trained model will be saved and restored.

print(f"EXPERIMENT_NAME   : {EXPERIMENT_NAME}")
print(f"RUN_NAME          : {RUN_NAME}")
print(f"BASE_OUTPUT_DIR   : {BASE_OUTPUT_DIR}")
print(f"LOG_DIR           : {LOG_DIR}")
print(f"ROOT_DIR          : {ROOT_DIR}")
print(f"ARTIFACTS_DIR     : {ARTIFACTS_DIR}")

EXPERIMENT_NAME   : scale-my-mf-env-hpt-rec-bandits-v2
RUN_NAME          : run-20231114-144856
BASE_OUTPUT_DIR   : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-hpt-rec-bandits-v2/run-20231114-144856
LOG_DIR           : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-hpt-rec-bandits-v2/run-20231114-144856/logs
ROOT_DIR          : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-hpt-rec-bandits-v2/run-20231114-144856/root
ARTIFACTS_DIR     : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-hpt-rec-bandits-v2/run-20231114-144856/artifacts


## Prepare (hpt) training job for Vertex AI
* Submit a hyperparameter tuning job with the custom container. For details on using Python packages as an alternative to custom containers, see the example shown [here](https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning#create).
* Define the hyperparameter(s), max trial count, parallel trial count, parameter search algorithm, machine spec, accelerators, worker pool, etc.

In [11]:
# Execute hyperparameter tuning instead of regular training
RUN_HYPERPARAMETER_TUNING          = True
TRAIN_WITH_BEST_HYPERPARAMETERS    = False  # Do not train.

# Directory to store the best hyperparameter(s) in `BUCKET_NAME` and locally (temporarily)
HPTUNING_RESULT_DIR                = "hptuning"
HPTUNING_RESULT_FILE               = "result.json"
HPTUNING_RESULT_PATH               = f"{EXPERIMENT_NAME}/{RUN_NAME}/{HPTUNING_RESULT_DIR}/{HPTUNING_RESULT_FILE}"
HPTUNING_RESULT_URI                = f"{BUCKET_URI}/{HPTUNING_RESULT_PATH}"

print(f"HPTUNING_RESULT_DIR  : {HPTUNING_RESULT_DIR}")
print(f"HPTUNING_RESULT_FILE : {HPTUNING_RESULT_FILE}")
print(f"HPTUNING_RESULT_PATH : {HPTUNING_RESULT_PATH}")
print(f"HPTUNING_RESULT_URI  : {HPTUNING_RESULT_URI}")

HPTUNING_RESULT_DIR  : hptuning
HPTUNING_RESULT_FILE : result.json
HPTUNING_RESULT_PATH : scale-my-mf-env-hpt-rec-bandits-v2/run-20231114-144856/hptuning/result.json
HPTUNING_RESULT_URI  : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-hpt-rec-bandits-v2/run-20231114-144856/hptuning/result.json


### Accelerators

In [12]:
ACCELERATOR = "t4" # str: "a100" | "t4" | "l4" | "tpu" | "False" (CPU only)
ACCELERATOR = str(ACCELERATOR)
print(f"ACCELERATOR: {ACCELERATOR}")

ACCELERATOR: t4


In [13]:
if ACCELERATOR == "a100":
    WORKER_MACHINE_TYPE = 'a2-highgpu-1g'
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'NVIDIA_TESLA_A100'
    PER_MACHINE_ACCELERATOR_COUNT = 1
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
    DISTRIBUTE_STRATEGY = 'single'
elif ACCELERATOR == 't4':
    WORKER_MACHINE_TYPE = 'n1-highcpu-16'
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'NVIDIA_TESLA_T4'
    PER_MACHINE_ACCELERATOR_COUNT = 1
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
elif ACCELERATOR == 'l4':
    WORKER_MACHINE_TYPE = "g2-standard-16"
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'NVIDIA_L4'
    PER_MACHINE_ACCELERATOR_COUNT = 1
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
elif ACCELERATOR == 'tpu':
    WORKER_MACHINE_TYPE = "cloud-tpu"
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'TPU_v3'
    PER_MACHINE_ACCELERATOR_COUNT = 8 # 8 | +32+ for TPU Pods
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = None
elif ACCELERATOR in ("None", "False"):  # CPU only; note that str(None) == "None"
    WORKER_MACHINE_TYPE = 'n2-highmem-32' # 'n1-highmem-96' | 'n2-highmem-92'
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = None
    PER_MACHINE_ACCELERATOR_COUNT = 0
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
    
TF_GPU_THREAD_COUNT   = '4'      # '1' | '4' | '8'

print(f"WORKER_MACHINE_TYPE            : {WORKER_MACHINE_TYPE}")
print(f"REPLICA_COUNT                  : {REPLICA_COUNT}")
print(f"ACCELERATOR_TYPE               : {ACCELERATOR_TYPE}")
print(f"PER_MACHINE_ACCELERATOR_COUNT  : {PER_MACHINE_ACCELERATOR_COUNT}")
print(f"DISTRIBUTE_STRATEGY            : {DISTRIBUTE_STRATEGY}")
print(f"REDUCTION_SERVER_COUNT         : {REDUCTION_SERVER_COUNT}")
print(f"REDUCTION_SERVER_MACHINE_TYPE  : {REDUCTION_SERVER_MACHINE_TYPE}")
print(f"TF_GPU_THREAD_COUNT            : {TF_GPU_THREAD_COUNT}")

WORKER_MACHINE_TYPE            : n1-highcpu-16
REPLICA_COUNT                  : 1
ACCELERATOR_TYPE               : NVIDIA_TESLA_T4
PER_MACHINE_ACCELERATOR_COUNT  : 1
DISTRIBUTE_STRATEGY            : single
REDUCTION_SERVER_COUNT         : 0
REDUCTION_SERVER_MACHINE_TYPE  : n1-highcpu-16
TF_GPU_THREAD_COUNT            : 4
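
The `if`/`elif` chain above can also be expressed as a lookup table, which makes an unsupported accelerator name fail loudly instead of leaving variables undefined. The sketch below is one possible shape, not part of the notebook: `AcceleratorConfig` and `get_accelerator_config` are hypothetical names, the TPU entry is omitted for brevity, and the machine and accelerator values are copied from the cell above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AcceleratorConfig:
    machine_type: str
    accelerator_type: Optional[str]
    accelerator_count: int


# Values mirror the branches of the cell above.
ACCELERATOR_CONFIGS = {
    "a100": AcceleratorConfig("a2-highgpu-1g", "NVIDIA_TESLA_A100", 1),
    "t4": AcceleratorConfig("n1-highcpu-16", "NVIDIA_TESLA_T4", 1),
    "l4": AcceleratorConfig("g2-standard-16", "NVIDIA_L4", 1),
    "none": AcceleratorConfig("n2-highmem-32", None, 0),
}


def get_accelerator_config(name):
    """Look up a config; None, False, "None", and "False" all mean CPU only."""
    key = "none" if name in (None, False, "None", "False") else str(name).lower()
    try:
        return ACCELERATOR_CONFIGS[key]
    except KeyError:
        supported = ", ".join(sorted(ACCELERATOR_CONFIGS))
        raise ValueError(f"Unknown accelerator {name!r}; expected one of: {supported}") from None


print(get_accelerator_config("t4"))
```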


### Create Tensorboard

In [14]:
# # create new TB instance
TENSORBOARD_DISPLAY_NAME=f"{EXPERIMENT_NAME}-{RUN_NAME}"

tensorboard = aiplatform.Tensorboard.create(
    display_name=TENSORBOARD_DISPLAY_NAME
    , project=PROJECT_ID
    , location=REGION
)

TB_RESOURCE_NAME = tensorboard.resource_name

print(f"TB_RESOURCE_NAME: {TB_RESOURCE_NAME}")
print(f"TB display name: {tensorboard.display_name}")

TB_RESOURCE_NAME: projects/934903580331/locations/us-central1/tensorboards/4213966274381742080
TB display name: scale-my-mf-env-hpt-rec-bandits-v2-run-20231114-144856


### Set training args

In [15]:
# Set hyperparameters.
BATCH_SIZE       = 128       # Training and prediction batch size.
TRAINING_LOOPS   = 200      # Number of training iterations.
STEPS_PER_LOOP   = 2       # Number of driver steps per training iteration.

# Set MovieLens simulation environment parameters.
RANK_K           = 20      # Rank for matrix factorization in the MovieLens environment; also the observation dimension.
NUM_ACTIONS      = 20      # Number of actions (movie items) to choose from.
PER_ARM          = True    # Use the per-arm version of the MovieLens environment.

# Set agent parameters.
TIKHONOV_WEIGHT  = 0.001   # LinUCB Tikhonov regularization weight.
AGENT_ALPHA      = 10.0    # LinUCB exploration parameter that multiplies the confidence intervals.

CHKPT_INTERVAL       = TRAINING_LOOPS - 1

print(f"BATCH_SIZE       : {BATCH_SIZE}")
print(f"TRAINING_LOOPS   : {TRAINING_LOOPS}")
print(f"STEPS_PER_LOOP   : {STEPS_PER_LOOP}")
print(f"RANK_K           : {RANK_K}")
print(f"NUM_ACTIONS      : {NUM_ACTIONS}")
print(f"PER_ARM          : {PER_ARM}")
print(f"TIKHONOV_WEIGHT  : {TIKHONOV_WEIGHT}")
print(f"AGENT_ALPHA      : {AGENT_ALPHA}")
print(f"CHKPT_INTERVAL   : {CHKPT_INTERVAL}")

BATCH_SIZE       : 128
TRAINING_LOOPS   : 200
STEPS_PER_LOOP   : 2
RANK_K           : 20
NUM_ACTIONS      : 20
PER_ARM          : True
TIKHONOV_WEIGHT  : 0.001
AGENT_ALPHA      : 10.0
CHKPT_INTERVAL   : 199


In [16]:
WORKER_ARGS = [
    f"--data-path={DATA_PATH}"               # TODO - remove duplicate arg
    , f"--bucket_name={BUCKET_NAME}"
    , f"--data_gcs_prefix={DATA_GCS_PREFIX}"
    , f"--data_path={DATA_PATH}"
    , f"--project_number={PROJECT_NUM}"
    , f"--batch-size={BATCH_SIZE}"
    , f"--rank-k={RANK_K}"
    , f"--num-actions={NUM_ACTIONS}"
    , f"--tikhonov-weight={TIKHONOV_WEIGHT}"
    , f"--agent-alpha={AGENT_ALPHA}"
    , f"--training_loops={TRAINING_LOOPS}"
    , f"--steps-per-loop={STEPS_PER_LOOP}"
    , f"--distribute={DISTRIBUTE_STRATEGY}"
    , f"--artifacts_dir={ARTIFACTS_DIR}"
    , f"--root_dir={ROOT_DIR}"
    , f"--log_dir={LOG_DIR}"
    , f"--experiment_name={EXPERIMENT_NAME}"
    , f"--experiment_run={RUN_NAME}"
    , f"--tf_gpu_thread_count={TF_GPU_THREAD_COUNT}"
    , f"--chkpt_interval={CHKPT_INTERVAL}"
    # , f"--profiler"
    , f"--sum_grads_vars"
    , f"--debug_summaries"
    , f"--use_gpu"
    # , f"--use_tpu"
]

if RUN_HYPERPARAMETER_TUNING:
    WORKER_ARGS.append("--run-hyperparameter-tuning")
    
elif TRAIN_WITH_BEST_HYPERPARAMETERS:
    WORKER_ARGS.append("--train-with-best-hyperparameters")
    
from src.per_arm_rl import train_utils

WORKER_POOL_SPECS = train_utils.prepare_worker_pool_specs(
    image_uri=f"{IMAGE_URI_01}:latest",
    args=WORKER_ARGS,
    replica_count=REPLICA_COUNT,
    machine_type=WORKER_MACHINE_TYPE,
    accelerator_count=PER_MACHINE_ACCELERATOR_COUNT,
    accelerator_type=ACCELERATOR_TYPE,
    reduction_server_count=REDUCTION_SERVER_COUNT,
    reduction_server_machine_type=REDUCTION_SERVER_MACHINE_TYPE,
)

from pprint import pprint
pprint(WORKER_POOL_SPECS)

[{'container_spec': {'args': ['--data-path=gs://rec-bandits-v2-hybrid-vertex-bucket/data',
                              '--bucket_name=rec-bandits-v2-hybrid-vertex-bucket',
                              '--data_gcs_prefix=data',
                              '--data_path=gs://rec-bandits-v2-hybrid-vertex-bucket/data',
                              '--project_number=934903580331',
                              '--batch-size=128',
                              '--rank-k=20',
                              '--num-actions=20',
                              '--tikhonov-weight=0.001',
                              '--agent-alpha=10.0',
                              '--training_loops=200',
                              '--steps-per-loop=2',
                              '--distribute=single',
                              '--artifacts_dir=gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-hpt-rec-bandits-v2/run-20231114-144856/artifacts',
                              '--root_dir=gs://r
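
For orientation, a worker pool spec is a list of dictionaries with `machine_spec`, `replica_count`, and `container_spec` entries. The function below is a simplified guess at what a helper such as `train_utils.prepare_worker_pool_specs` might build for a single worker pool; it is not the project's implementation, it ignores reduction servers, and `build_worker_pool_specs` is a hypothetical name.

```python
def build_worker_pool_specs(
    image_uri,
    args,
    machine_type,
    replica_count=1,
    accelerator_type=None,
    accelerator_count=0,
):
    """Build a single-pool worker spec (simplified, hypothetical sketch)."""
    machine_spec = {"machine_type": machine_type}
    # Only attach accelerator fields when an accelerator is actually requested.
    if accelerator_type and accelerator_count > 0:
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count
    return [
        {
            "machine_spec": machine_spec,
            "replica_count": replica_count,
            "container_spec": {"image_uri": image_uri, "args": list(args)},
        }
    ]


# Example with a made-up image URI.
specs = build_worker_pool_specs(
    image_uri="gcr.io/my-project/my-trainer:latest",
    args=["--batch-size=128"],
    machine_type="n1-highcpu-16",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
print(specs[0]["machine_spec"])
```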

### Define parameter spec

Next, define the `parameter_spec`, which is a dictionary specifying the parameters you want to optimize. The **dictionary key** is the string you assigned to the command-line argument for each hyperparameter, and the **dictionary value** is the parameter specification.

For each hyperparameter, you need to define the `Type` as well as the bounds for the values that the tuning service will try. Hyperparameters can be of type `Double`, `Integer`, `Categorical`, or `Discrete`. If you select the type `Double` or `Integer`, you need to provide a minimum and maximum value. And if you select `Categorical` or `Discrete` you need to provide the values. For the `Double` and `Integer` types, you also need to provide the scaling value. Learn more about [Using an Appropriate Scale](https://www.youtube.com/watch?v=cSoK_6Rkbfg).

In [17]:
# Dictionary representing parameters to optimize.
# The dictionary key is the parameter_id, which is passed into your training
# job as a command-line argument,
# and the dictionary value is the parameter specification.

parameter_spec = {
    # "steps-per-loop": hpt.DiscreteParameterSpec(values=[2, 4], scale=None),
    "batch-size": hpt.DiscreteParameterSpec(values=[16, 32, 128], scale=None),
    "num-actions": hpt.DiscreteParameterSpec(values=[8, 24, 32], scale=None),
    # "training-loops": hpt.DiscreteParameterSpec(values=[4, 6, 8], scale=None),
}
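
On the training side, each `parameter_spec` key has to match a command-line flag that the container's entry point accepts. A minimal sketch of such a parser is shown below; it is an illustration rather than the project's actual argument handling, the defaults are placeholders, and the `int(float(...))` conversion is an assumption made so that values formatted either as `32` or `32.0` are accepted.

```python
import argparse


def parse_hpt_args(argv):
    """Parse the two tuned flags; unknown flags are ignored in this sketch."""
    parser = argparse.ArgumentParser()
    # Flag names must match the parameter_spec keys exactly.
    parser.add_argument("--batch-size", type=lambda s: int(float(s)), default=128)
    parser.add_argument("--num-actions", type=lambda s: int(float(s)), default=20)
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_hpt_args(["--batch-size=32", "--num-actions=24.0", "--rank-k=20"])
print(args.batch_size, args.num_actions)  # prints: 32 24
```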

The final spec to define is `metric_spec`, which is a dictionary representing the metric to optimize. The dictionary key is the `hyperparameter_metric_tag` that you set in your training application code, and the value is the optimization goal.

In [18]:
# Dictionary representing metrics to optimize.
# The dictionary key is the metric_id, which is reported by your training job,
# And the dictionary value is the optimization goal of the metric.
metric_spec = {"final_average_return": "maximize"}
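
The metric key above has to match the tag the training code reports. With `cloudml-hypertune`, that report is typically a single call to `report_hyperparameter_tuning_metric`. The wrapper below is a sketch under that assumption: the function name `report_final_average_return` and the injectable `reporter` argument are hypothetical conveniences that make the logic easy to exercise without the library installed.

```python
def report_final_average_return(value, step, reporter=None):
    """Report the tuned metric under the tag used in metric_spec."""
    if reporter is None:
        # Imported lazily so this sketch can be exercised with a stand-in reporter.
        import hypertune  # provided by the cloudml-hypertune package

        reporter = hypertune.HyperTune()
    reporter.report_hyperparameter_tuning_metric(
        hyperparameter_metric_tag="final_average_return",
        metric_value=float(value),
        global_step=int(step),
    )
```

Inside the container, calling `report_final_average_return(avg_return, step)` after the last training loop would be enough for the service to rank trials by this metric.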

## [1] Submit (hpt) train job

In [19]:
aiplatform.init(
    project=PROJECT_ID,
    location=REGION,
    experiment=EXPERIMENT_NAME,
    # staging_bucket=ROOT_DIR,
    # experiment_tensorboard=TB_RESOURCE_NAME,
)

JOB_NAME = f"01d-hpt-{RUN_NAME}"
print(f"JOB_NAME: {JOB_NAME}")

JOB_NAME: 01d-hpt-run-20231114-144856


In [20]:
# Create a CustomJob
my_custom_hpt_job = aiplatform.CustomJob(
    display_name=JOB_NAME
    , project=PROJECT_ID
    , worker_pool_specs=WORKER_POOL_SPECS
    , base_output_dir=BASE_OUTPUT_DIR
    , staging_bucket=ROOT_DIR
)

Then, create and run a HyperparameterTuningJob.

> see [source code](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/hyperparameter_tuning.py)

There are a few arguments to note:

* `max_trial_count`: Sets an upper bound on the number of trials the service will run. The recommended practice is to start with a smaller number of trials and get a sense of how impactful your chosen hyperparameters are before scaling up.

* `parallel_trial_count`: If you use parallel trials, the service provisions multiple training processing clusters. The worker pool spec that you specify when creating the job is used for each individual training cluster. Increasing the number of parallel trials reduces the amount of time the hyperparameter tuning job takes to run; however, it can reduce the effectiveness of the job overall. This is because the default tuning strategy uses results of previous trials to inform the assignment of values in subsequent trials.

* `search_algorithm`: The available search algorithms are grid, random, or default (None). The default option applies Bayesian optimization to search the space of possible hyperparameter values and is the recommended algorithm.
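
To make the trade-off between these arguments concrete, the arithmetic can be checked with a few lines of Python. The helper `tuning_plan` is hypothetical; the numbers are the discrete values and trial counts used in this notebook.

```python
import math


def tuning_plan(parameter_values, max_trial_count, parallel_trial_count):
    """Summarize how a discrete search space relates to the trial budget."""
    search_space_size = math.prod(len(values) for values in parameter_values.values())
    # Number of sequential batches of trials the service needs to run.
    sequential_rounds = math.ceil(max_trial_count / parallel_trial_count)
    return {
        "search_space_size": search_space_size,
        "sequential_rounds": sequential_rounds,
        "covers_full_grid": max_trial_count >= search_space_size,
    }


plan = tuning_plan(
    {"batch-size": [16, 32, 128], "num-actions": [8, 24, 32]},
    max_trial_count=6,
    parallel_trial_count=6,
)
print(plan)
# {'search_space_size': 9, 'sequential_rounds': 1, 'covers_full_grid': False}
```

With six trials all running in parallel there is only one round, so no trial can benefit from the results of an earlier one. That is acceptable for random search, but it would remove the advantage of the default Bayesian strategy.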

In [21]:
# Create and run HyperparameterTuningJob

hp_job = aiplatform.HyperparameterTuningJob(
    display_name=JOB_NAME,
    custom_job=my_custom_hpt_job,
    metric_spec=metric_spec,
    parameter_spec=parameter_spec,
    max_trial_count=6,
    parallel_trial_count=6,
    project=PROJECT_ID,
    search_algorithm="random",
)

hp_job.run(
    sync=False
    , service_account=VERTEX_SA
    , restart_job_on_worker_restart = False 
    , enable_web_access = True
    , tensorboard = TB_RESOURCE_NAME
)

In [22]:
print(f"Job Name: {hp_job.display_name}")
print(f"Job Resource Name: {hp_job.resource_name}\n")
# print(f"Check training progress at {custom_job._dashboard_uri()}")

Job Name: 01d-hpt-run-20231114-144856
Job Resource Name: projects/934903580331/locations/us-central1/hyperparameterTuningJobs/5939469248110788608



In [26]:
hpt_job_test = aiplatform.HyperparameterTuningJob.get(
    resource_name=hp_job.resource_name,
)
hpt_job_test

<google.cloud.aiplatform.jobs.HyperparameterTuningJob object at 0x7f4008c45300> 
resource name: projects/934903580331/locations/us-central1/hyperparameterTuningJobs/6469768104233664512

In [24]:
# hpt_job_test.error

### View TensorBoard for HPT job

<img src="imgs/01_hpt_tboard.png" 
     align="center" 
     width="850"
     height="850"/>

#### Find the best combination(s) of hyperparameter(s) for each metric

In [23]:
best_test = (None, None, None, 0.0)
for trial in hp_job.trials:
    # print(trial)
    # Keep track of the best outcome
    if float(trial.final_measurement.metrics[0].value) > best_test[3]:
        # print(trial.final_measurement.metrics[0].value)
        # print(trial.parameters[0].value)
        try:
            best_test = (
                trial.id,
                trial.parameters[0].value, #.number_value,
                trial.parameters[1].value, #.number_value,
                trial.final_measurement.metrics[0].value,
            )
        except Exception:  # fall back when the trial does not expose two plain parameter values
            best_test = (
                trial.id,
                trial.parameters[0].value.number_value,
                None,
                trial.final_measurement.metrics[0].value,
            )

print(best_test)

('6', 128.0, 8.0, 1.6136982440948486)
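
The loop above relies on parameter and metric positions. A variant that looks values up by `parameter_id` and `metric_id` is less sensitive to ordering and also handles metrics that are negative. The sketch below assumes trial objects expose `id`, `parameters` (with `parameter_id` and `value`), and `final_measurement.metrics` (with `metric_id` and `value`); the `SimpleNamespace` objects are stand-ins with made-up values, not real job results.

```python
from types import SimpleNamespace


def best_trial(trials, metric_id, maximize=True):
    """Return the best trial's ID, parameters, and metric value, or None."""
    scored = []
    for trial in trials:
        metrics = {m.metric_id: m.value for m in trial.final_measurement.metrics}
        if metric_id not in metrics:
            continue  # skip trials that never reported the metric
        params = {p.parameter_id: p.value for p in trial.parameters}
        scored.append((float(metrics[metric_id]), trial.id, params))
    if not scored:
        return None
    pick = max if maximize else min
    value, trial_id, params = pick(scored, key=lambda item: item[0])
    return {"trial_id": trial_id, "parameters": params, "value": value}


def _fake_trial(trial_id, batch_size, num_actions, metric):
    # Stand-in for a trial object; the values below are illustrative only.
    return SimpleNamespace(
        id=trial_id,
        parameters=[
            SimpleNamespace(parameter_id="batch-size", value=batch_size),
            SimpleNamespace(parameter_id="num-actions", value=num_actions),
        ],
        final_measurement=SimpleNamespace(
            metrics=[SimpleNamespace(metric_id="final_average_return", value=metric)]
        ),
    )


demo_trials = [_fake_trial("1", 16.0, 8.0, -0.5), _fake_trial("2", 32.0, 24.0, -0.2)]
print(best_trial(demo_trials, "final_average_return"))
```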


In [24]:
BATCH_SIZE_best = 128 #int(best_test[1])
NUM_ACTIONS_best = 24 #int(best_test[2])

BEST_HPT_DICT = {
    "batch-size":BATCH_SIZE_best,
    "num-actions":NUM_ACTIONS_best
}
print(f"BEST_HPT_DICT : {BEST_HPT_DICT}")

BEST_HPT_DICT : {'batch-size': 128, 'num-actions': 24}


#### Convert a combination of best hyperparameter(s) for a metric of interest to JSON

In [25]:
# HPTUNING_RESULT_DIR = "hptuning/"
# HPTUNING_RESULT_PATH = os.path.join(HPTUNING_RESULT_DIR, "result.json")

# print(f"HPTUNING_RESULT_PATH : {HPTUNING_RESULT_PATH}")

In [26]:
# ! rm -rf $HPTUNING_RESULT_DIR
# ! mkdir $HPTUNING_RESULT_DIR

In [27]:
LOCAL_RESULTS_FILE = "result.json"  # {"batch-size": 8.0, "steps-per-loop": 2.0}

# with open(LOCAL_RESULTS_FILE, "w") as f:
#     json.dump(best_params["final_average_return"][0], f)

with open(LOCAL_RESULTS_FILE, "w") as f:
    json_dumps_str = json.dumps(BEST_HPT_DICT)
    f.write(json_dumps_str)

#### Upload the best hyperparameter(s) to GCS for use in training

In [28]:
!gsutil -q cp $LOCAL_RESULTS_FILE $HPTUNING_RESULT_URI

!gsutil ls $HPTUNING_RESULT_URI

gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-hpt-rec-bandits-v2/run-20231114-144856/hptuning/result.json
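
When training with `--train-with-best-hyperparameters`, the trainer has to read this JSON back and override its own arguments. How the project does this is not shown here; the sketch below is one plausible approach, with a hypothetical helper `apply_best_hyperparameters` that takes the file contents as a string (downloading from GCS is left out) and maps keys such as `batch-size` onto `argparse`-style attributes such as `batch_size`.

```python
import argparse
import json


def apply_best_hyperparameters(args, result_json):
    """Override attributes on args with values from the tuning result JSON."""
    best = json.loads(result_json)
    for flag, value in best.items():
        attr = flag.replace("-", "_")  # "batch-size" -> "batch_size"
        if not hasattr(args, attr):
            raise KeyError(f"Unknown hyperparameter in result file: {flag}")
        current = getattr(args, attr)
        # Keep the existing type (e.g., int) when a default is present.
        setattr(args, attr, value if current is None else type(current)(value))
    return args


args = argparse.Namespace(batch_size=8, num_actions=20, rank_k=20)
args = apply_best_hyperparameters(args, '{"batch-size": 128, "num-actions": 24.0}')
print(args.batch_size, args.num_actions)  # prints: 128 24
```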


## [2] Submit custom container training job

- Note again that the bucket must be in the same regional location as the service location and it should not be multi-regional.
- Read more of CustomContainerTrainingJob's source code [here](https://github.com/googleapis/python-aiplatform/blob/v0.8.0/google/cloud/aiplatform/training_jobs.py#L2153).
- Like with local execution, you can use TensorBoard Profiler to track the training process and resources, and visualize the corresponding artifacts using the command: `%tensorboard --logdir $PROFILER_DIR`.

### Vertex Experiments

In [29]:
EXPERIMENT_NAME   = f'scale-my-mf-env-{PREFIX}'

invoke_time       = time.strftime("%Y%m%d-%H%M%S")
RUN_NAME          = f'run-{invoke_time}'

BASE_OUTPUT_DIR   = f'gs://{BUCKET_NAME}/{EXPERIMENT_NAME}/{RUN_NAME}'
LOG_DIR           = f"{BASE_OUTPUT_DIR}/logs"
ROOT_DIR          = f"{BASE_OUTPUT_DIR}/root"                               # Root directory for writing logs/summaries/checkpoints.
ARTIFACTS_DIR     = f"{BASE_OUTPUT_DIR}/artifacts"                          # Where the trained model will be saved and restored.

print(f"EXPERIMENT_NAME   : {EXPERIMENT_NAME}")
print(f"RUN_NAME          : {RUN_NAME}")
print(f"BASE_OUTPUT_DIR   : {BASE_OUTPUT_DIR}")
print(f"LOG_DIR           : {LOG_DIR}")
print(f"ROOT_DIR          : {ROOT_DIR}")
print(f"ARTIFACTS_DIR     : {ARTIFACTS_DIR}")

EXPERIMENT_NAME   : scale-my-mf-env-rec-bandits-v2
RUN_NAME          : run-20231114-150903
BASE_OUTPUT_DIR   : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903
LOG_DIR           : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/logs
ROOT_DIR          : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/root
ARTIFACTS_DIR     : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/artifacts


In [30]:
# aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_NAME)

In [31]:
RUN_HYPERPARAMETER_TUNING       = False 
TRAIN_WITH_BEST_HYPERPARAMETERS = False   

### set Tensorboard

In [32]:
# # create new TB instance
TENSORBOARD_DISPLAY_NAME=f"{EXPERIMENT_NAME}-{RUN_NAME}"

tensorboard = aiplatform.Tensorboard.create(
    display_name=TENSORBOARD_DISPLAY_NAME
    , project=PROJECT_ID
    , location=REGION
)

TB_RESOURCE_NAME = tensorboard.resource_name

TB_ID = TB_RESOURCE_NAME.split('/')[-1]

print(f"TB_RESOURCE_NAME : {TB_RESOURCE_NAME}")
print(f"TB display name  : {tensorboard.display_name}")
print(f"TB_ID            : {TB_ID}")

TB_RESOURCE_NAME : projects/934903580331/locations/us-central1/tensorboards/2516109214863065088
TB display name  : scale-my-mf-env-rec-bandits-v2-run-20231114-150903
TB_ID            : 2516109214863065088


### set training args

In [33]:
# Set hyperparameters.
BATCH_SIZE       = BATCH_SIZE_best
TRAINING_LOOPS   = 100
STEPS_PER_LOOP   = 1
NUM_ACTIONS      = NUM_ACTIONS_best

CHKPT_INTERVAL       = TRAINING_LOOPS // 5

print(f"BATCH_SIZE     : {BATCH_SIZE}")
print(f"TRAINING_LOOPS : {TRAINING_LOOPS}")
print(f"STEPS_PER_LOOP : {STEPS_PER_LOOP}")
print(f"NUM_ACTIONS    : {NUM_ACTIONS}")
print(f"CHKPT_INTERVAL : {CHKPT_INTERVAL}")

BATCH_SIZE     : 128
TRAINING_LOOPS : 100
STEPS_PER_LOOP : 1
NUM_ACTIONS    : 24
CHKPT_INTERVAL : 20


In [34]:
WORKER_ARGS = [
    f"--data-path={DATA_PATH}"               # TODO - remove duplicate arg
    , f"--bucket_name={BUCKET_NAME}"
    , f"--data_gcs_prefix={DATA_GCS_PREFIX}"
    , f"--data_path={DATA_PATH}"
    , f"--project_number={PROJECT_NUM}"
    , f"--batch-size={BATCH_SIZE}"
    , f"--rank-k={RANK_K}"
    , f"--num-actions={NUM_ACTIONS_best}"
    , f"--tikhonov-weight={TIKHONOV_WEIGHT}"
    , f"--agent-alpha={AGENT_ALPHA}"
    , f"--training_loops={TRAINING_LOOPS}"
    , f"--steps-per-loop={STEPS_PER_LOOP}"
    , f"--distribute={DISTRIBUTE_STRATEGY}"
    , f"--artifacts_dir={ARTIFACTS_DIR}"
    , f"--root_dir={ROOT_DIR}"
    , f"--log_dir={LOG_DIR}"
    , f"--experiment_name={EXPERIMENT_NAME}"
    , f"--experiment_run={RUN_NAME}"
    , f"--tf_gpu_thread_count={TF_GPU_THREAD_COUNT}"
    , f"--chkpt_interval={CHKPT_INTERVAL}"
    # , f"--profiler"
    , f"--sum_grads_vars"
    , f"--debug_summaries"
    , f"--use_gpu"
    # , f"--use_tpu"
]

if RUN_HYPERPARAMETER_TUNING:
    WORKER_ARGS.append("--run-hyperparameter-tuning")
elif TRAIN_WITH_BEST_HYPERPARAMETERS:
    WORKER_ARGS.append("--train-with-best-hyperparameters")
    WORKER_ARGS.append(f"--best-hyperparameters-bucket={BUCKET_NAME}")
    WORKER_ARGS.append(f"--best-hyperparameters-path={HPTUNING_RESULT_PATH}")
    
from src.per_arm_rl import train_utils

WORKER_POOL_SPECS = train_utils.prepare_worker_pool_specs(
    image_uri=f"{IMAGE_URI_01}:latest",
    args=WORKER_ARGS,
    replica_count=REPLICA_COUNT,
    machine_type=WORKER_MACHINE_TYPE,
    accelerator_count=PER_MACHINE_ACCELERATOR_COUNT,
    accelerator_type=ACCELERATOR_TYPE,
    reduction_server_count=REDUCTION_SERVER_COUNT,
    reduction_server_machine_type=REDUCTION_SERVER_MACHINE_TYPE,
)

from pprint import pprint
pprint(WORKER_POOL_SPECS)

[{'container_spec': {'args': ['--data-path=gs://rec-bandits-v2-hybrid-vertex-bucket/data',
                              '--bucket_name=rec-bandits-v2-hybrid-vertex-bucket',
                              '--data_gcs_prefix=data',
                              '--data_path=gs://rec-bandits-v2-hybrid-vertex-bucket/data',
                              '--project_number=934903580331',
                              '--batch-size=128',
                              '--rank-k=20',
                              '--num-actions=24',
                              '--tikhonov-weight=0.001',
                              '--agent-alpha=10.0',
                              '--training_loops=100',
                              '--steps-per-loop=1',
                              '--distribute=single',
                              '--artifacts_dir=gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/artifacts',
                              '--root_dir=gs://rec-b

In [35]:
aiplatform.init(
    project=PROJECT_ID
    , location=REGION
    , experiment=EXPERIMENT_NAME
    # , staging_bucket=ROOT_DIR
)

JOB_NAME = f"mvl-best-train-{RUN_NAME}"
print(f"JOB_NAME: {JOB_NAME}")

JOB_NAME: mvl-best-train-run-20231114-150903


In [36]:
# Create a CustomJob
job = aiplatform.CustomJob(
    display_name=JOB_NAME
    , project=PROJECT_ID
    , worker_pool_specs=WORKER_POOL_SPECS
    , base_output_dir=BASE_OUTPUT_DIR
    , staging_bucket=ROOT_DIR
    # , location="asia-southeast1" # TODO
)

In [37]:
job.run(
    tensorboard=TB_RESOURCE_NAME,
    service_account=VERTEX_SA,
    restart_job_on_worker_restart=False,
    enable_web_access=True,
    sync=False,
)

In [38]:
print(f"Job Name: {job.display_name}")
print(f"Job Resource Name: {job.resource_name}\n")

Job Name: mvl-best-train-run-20231114-150903
Job Resource Name: projects/934903580331/locations/us-central1/customJobs/5365823245574471680



In [44]:
# job.error

### TensorBoard Profiler

In [39]:
!gsutil ls $LOG_DIR

gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/logs/
gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/logs/events.out.tfevents.1699974892.6a7c11bc622b.1.0.v2


In [40]:
# %load_ext tensorboard
%reload_ext tensorboard

In [41]:
%tensorboard --logdir=$LOG_DIR

## Making predictions

* Once a policy is trained, given a new observation request (i.e., a user vector), the policy infers (produces) actions, which are the recommended movies.
* In TF-Agents, observations are wrapped in a named tuple:

```
TimeStep('step_type', 'discount', 'reward', 'observation')
```

* The policy maps time steps to actions.
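
The relationship between a time step and an action can be illustrated without TensorFlow. The snippet below is a toy stand-in, not TF-Agents code: it defines its own `TimeStep` named tuple with the same four fields and a hypothetical `greedy_policy` that scores each arm with a dot product and returns the index of the best one. The weights and observation are made-up numbers.

```python
from collections import namedtuple

# Stand-in with the same field names as the TF-Agents TimeStep.
TimeStep = namedtuple("TimeStep", ["step_type", "discount", "reward", "observation"])


def greedy_policy(time_step, arm_weights):
    """Return the index of the arm whose weights score the observation highest."""
    scores = [
        sum(w * x for w, x in zip(weights, time_step.observation))
        for weights in arm_weights
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# A "first" step: step_type 0, discount 1.0, and no reward yet.
first_step = TimeStep(step_type=0, discount=1.0, reward=0.0, observation=[1.0, 0.0])
action = greedy_policy(first_step, arm_weights=[[0.1, 0.9], [0.7, 0.2], [0.3, 0.3]])
print(action)  # prints: 1
```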

In [45]:
CHKPOINT_DIR = f"{BASE_OUTPUT_DIR}/checkpoints"

print(f"BASE_OUTPUT_DIR: {BASE_OUTPUT_DIR}")
print(f"CHKPOINT_DIR: {CHKPOINT_DIR}")

BASE_OUTPUT_DIR: gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903
CHKPOINT_DIR: gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/checkpoints


In [46]:
# ARTIFACTS_DIR = "gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230717-211248/model"

!gsutil ls $CHKPOINT_DIR

gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/checkpoints/
gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/checkpoints/checkpoint
gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/checkpoints/ckpt-1.data-00000-of-00001
gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/checkpoints/ckpt-1.index
gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/checkpoints/ckpt-2.data-00000-of-00001
gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/checkpoints/ckpt-2.index
gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/checkpoints/ckpt-3.data-00000-of-00001
gs://rec-bandits-v2-hybrid-vertex-bucket/scale-my-mf-env-rec-bandits-v2/run-20231114-150903/checkpoints/ckpt-3.index
gs://rec-bandits-v2-hybrid-vertex-bu

In [48]:
# Load the SavedModel policy from ARTIFACTS_DIR (the checkpoint directory does not hold a SavedModel).
trained_policy = tf.saved_model.load(ARTIFACTS_DIR)

In [100]:
train_step = trained_policy.get_train_step()
print(f"Loaded policy at step: {train_step.numpy()}")

Loaded policy at step: -1


In [170]:
from src.per_arm_rl import my_per_arm_py_env as my_per_arm_py_env

env = my_per_arm_py_env.MyMovieLensPerArmPyEnvironment(
    project_number = PROJECT_NUM
    , data_path = DATA_PATH
    , bucket_name = BUCKET_NAME
    , data_gcs_prefix = DATA_GCS_PREFIX
    , user_age_lookup_dict = data_config.USER_AGE_LOOKUP
    , user_occ_lookup_dict = data_config.USER_OCC_LOOKUP
    , movie_gen_lookup_dict = data_config.MOVIE_GEN_LOOKUP
    , num_users = data_config.MOVIELENS_NUM_USERS
    , num_movies = data_config.MOVIELENS_NUM_MOVIES
    , rank_k = RANK_K
    , batch_size = BATCH_SIZE
    , num_actions = NUM_ACTIONS
)

environment = tf_py_environment.TFPyEnvironment(env)

In [177]:
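from tf_agents.trajectories import time_step as ts

observation_array = environment._observe()

# Wrap the observation in a FIRST TimeStep so the policy can act on it.
time_step = ts.restart(
    observation=observation_array,
    batch_size=tf.convert_to_tensor([BATCH_SIZE]),
)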

action_step = trained_policy.action(time_step)

action_step.action.numpy().tolist()

[2, 9, 11, 0, 12, 5, 7, 13]

### debugging

In [56]:
# observation_array

In [57]:
# time_step

In [181]:
# action_step

In [58]:
# observation_array = environment._observe()
# observation_array

In [59]:
# time_step = tf_agents.trajectories.restart(
#     observation=observation_array,
#     batch_size=tf.convert_to_tensor([BATCH_SIZE]),
# )
# time_step

In [60]:
# action_step = trained_policy.action(time_step)
# action_step

In [61]:
# action_step.action.numpy().tolist()

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# # Delete endpoint resource
# ! gcloud ai endpoints delete $endpoint.name --quiet --region $REGION

# # Delete model resource
# ! gcloud ai models delete $model.name --quiet

# # Delete Cloud Storage objects that were created
# ! gsutil -m rm -r $ARTIFACTS_DIR