# Scaling bandit training with Vertex AI 

**Prerequisites:**
* build the training image in the `04b-build-training-image` notebook

**Recommendation**

When profiling a training job, a full training run is not necessary.

> We only need a few iterations through the entire agent graph (i.e., from the data iterator to `agent.train`, repeated a few times)
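
As a rough illustration of this idea, the sketch below times a handful of iterations of an iterator-to-train-step loop. `fake_batches` and `fake_train_step` are hypothetical stand-ins for the real data iterator and `agent.train`; only the loop structure is the point.

```python
import itertools
import time


def profile_steps(batches, train_step, num_steps=5):
    """Run `train_step` on the first `num_steps` batches and time each call."""
    durations = []
    for batch in itertools.islice(batches, num_steps):
        start = time.perf_counter()
        train_step(batch)
        durations.append(time.perf_counter() - start)
    return durations


def fake_batches():
    # hypothetical stand-in for the training data iterator (never exhausts)
    step = 0
    while True:
        yield list(range(step, step + 4))
        step += 1


def fake_train_step(batch):
    # hypothetical stand-in for `agent.train(experience)`
    return sum(batch) / len(batch)


durations = profile_steps(fake_batches(), fake_train_step, num_steps=5)
mean_s = sum(durations) / len(durations)
print(f"profiled {len(durations)} steps; mean {mean_s:.6f}s per step")
```

Capping the loop with `itertools.islice` keeps the profiling run short even when the underlying iterator repeats indefinitely.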

In [1]:
# protobuf==3.20.3

In [2]:
! python3 -c "import google.cloud.aiplatform; print('aiplatform SDK version: {}'.format(google.cloud.aiplatform.__version__))"

aiplatform SDK version: 1.33.1


## setup notebook environment

In [3]:
!pwd

/home/jupyter/tf_vertex_agents/02-supervised-to-bandit-training


### Load env config
* use the prefix from `00-env-setup` notebook

In [4]:
VERSION        = "v2"                       # TODO
PREFIX         = f'rec-bandits-{VERSION}'   # TODO

print(f"PREFIX: {PREFIX}")

PREFIX: rec-bandits-v2


In [5]:
# staging GCS
GCP_PROJECTS             = !gcloud config get-value project
PROJECT_ID               = GCP_PROJECTS[0]

# GCS bucket and paths
BUCKET_NAME              = f'{PREFIX}-{PROJECT_ID}-bucket'
BUCKET_URI               = f'gs://{BUCKET_NAME}'

config = !gsutil cat {BUCKET_URI}/config/notebook_env.py
print(config.n)
exec(config.n)


PROJECT_ID               = "hybrid-vertex"
PROJECT_NUM              = "934903580331"
LOCATION                 = "us-central1"

REGION                   = "us-central1"
BQ_LOCATION              = "US"
VPC_NETWORK_NAME         = "ucaip-haystack-vpc-network"
VERTEX_SA                = "934903580331-compute@developer.gserviceaccount.com"

PREFIX                   = "rec-bandits-v2"
VERSION                  = "v2"

BUCKET_NAME              = "rec-bandits-v2-hybrid-vertex-bucket"
BUCKET_URI               = "gs://rec-bandits-v2-hybrid-vertex-bucket"
DATA_GCS_PREFIX          = "data"
DATA_PATH                = "gs://rec-bandits-v2-hybrid-vertex-bucket/data"
VOCAB_SUBDIR             = "vocabs"
VOCAB_FILENAME           = "vocab_dict.pkl"

VPC_NETWORK_FULL         = "projects/934903580331/global/networks/ucaip-haystack-vpc-network"

BIGQUERY_DATASET_NAME    = "mvlens_rec_bandits_v2"
BIGQUERY_TABLE_NAME      = "training_dataset"

REPOSITORY               = "rl-movielens-rec-bandits-v2"

DOCKERNAM

### imports

In [6]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

In [7]:
import json
import logging
import pickle as pkl
import time  # the module is needed later for time.strftime
from datetime import datetime
from pprint import pprint

import numpy as np
import pandas as pd

# suppress log records at WARNING level and below
logging.disable(logging.WARNING)

from google.cloud import aiplatform as vertex_ai
from google.cloud import storage

In [8]:
import sys
sys.path.append("..")

from src import train_utils
from src.data import data_utils as data_utils
from src.data import data_config as data_config

In [9]:
storage_client = storage.Client(project=PROJECT_ID)

vertex_ai.init(project=PROJECT_ID,location=REGION)

In [10]:
EXAMPLE_GEN_GCS_PATH = data_config.EXAMPLE_GEN_GCS_PATH
GCS_DATA_PATH = f"{BUCKET_URI}/{EXAMPLE_GEN_GCS_PATH}"

print(f"GCS_DATA_PATH: {GCS_DATA_PATH}")

!gsutil ls $GCS_DATA_PATH

GCS_DATA_PATH: gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/mv_b128_g12_a16/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/mv_b128_g12_a16_v4/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/mv_b128_g12_a16_v5/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/mv_b128_g12_a16_v6/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/train/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/val/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens/m1m/vocabs/


# Vertex Training Job

## job compute

Set `WORKER_MACHINE_TYPE` and the related accelerator variables below to configure the compute resources of the VMs used for training.

**Machine Type:**
* `n1-standard`: 3.75 GB of memory per vCPU
* `n1-highmem`: 6.5 GB of memory per vCPU
* `n1-highcpu`: 0.9 GB of memory per vCPU
* `vCPUs`: one of `[2, 4, 8, 16, 32, 64, 96]`

**Note:** The following are not supported for training:

* `standard`: 2 vCPUs
* `highcpu`: 2, 4 and 8 vCPUs

> Note: You may also use n2 and e2 machine types for training and deployment, but they do not support GPUs.

Relevant docs:
* [Configure compute resources for training](https://cloud.google.com/vertex-ai/docs/training/configure-compute#machine-types)
* [Machine series comparison](https://cloud.google.com/compute/docs/machine-resource#machine_type_comparison)
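
To make the per-vCPU figures above concrete, here is a small sketch that derives total memory from an N1 machine-type name and flags the sizes listed as unsupported. The helper names are hypothetical, and the numbers are only those quoted in this notebook; check the linked docs for current values.

```python
# GB of memory per vCPU for each N1 family, as listed above
MEMORY_PER_VCPU_GB = {"standard": 3.75, "highmem": 6.5, "highcpu": 0.9}

# vCPU counts that the note above lists as unsupported for training
UNSUPPORTED_VCPUS = {"standard": {2}, "highcpu": {2, 4, 8}}


def n1_memory_gb(machine_type):
    """Return total memory in GB for a name like 'n1-highmem-16'."""
    series, family, vcpus = machine_type.split("-")
    if series != "n1":
        raise ValueError(f"only n1 machine types are covered here: {machine_type}")
    return MEMORY_PER_VCPU_GB[family] * int(vcpus)


def is_supported_for_training(machine_type):
    """Check the machine type against the unsupported sizes listed above."""
    _, family, vcpus = machine_type.split("-")
    return int(vcpus) not in UNSUPPORTED_VCPUS.get(family, set())


for name in ["n1-highmem-16", "n1-standard-8", "n1-highcpu-8"]:
    print(name, n1_memory_gb(name), is_supported_for_training(name))
```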

In [11]:
ACCELERATOR = "t4" # str: "a100" | "t4" | "l4" | "tpu" | None
ACCELERATOR = str(ACCELERATOR)
print(f"ACCELERATOR: {ACCELERATOR}")

ACCELERATOR: t4


In [12]:
if ACCELERATOR == "a100":
    WORKER_MACHINE_TYPE = 'a2-highgpu-1g'
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'NVIDIA_TESLA_A100'
    PER_MACHINE_ACCELERATOR_COUNT = 1
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
    DISTRIBUTE_STRATEGY = 'single'
elif ACCELERATOR == 't4':
    # WORKER_MACHINE_TYPE = 'n1-highcpu-16'
    WORKER_MACHINE_TYPE = 'n1-highmem-16'
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'NVIDIA_TESLA_T4'
    PER_MACHINE_ACCELERATOR_COUNT = 1
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
elif ACCELERATOR == 'l4':
    WORKER_MACHINE_TYPE = "g2-standard-16"
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'NVIDIA_L4'
    PER_MACHINE_ACCELERATOR_COUNT = 1
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
elif ACCELERATOR == 'tpu':
    WORKER_MACHINE_TYPE = "cloud-tpu"
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'TPU_V3'
    PER_MACHINE_ACCELERATOR_COUNT = 8 # 8 | +32+ for TPU Pods
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = None
elif ACCELERATOR in ("None", "False"):  # CPU-only; str(None) == "None"
    WORKER_MACHINE_TYPE = 'n2-highmem-32' # 'n1-highmem-96' | 'n2-highmem-96'
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = None
    PER_MACHINE_ACCELERATOR_COUNT = 0
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
    
TF_GPU_THREAD_COUNT   = '4'      # '1' | '4' | '8'

print(f"WORKER_MACHINE_TYPE            : {WORKER_MACHINE_TYPE}")
print(f"REPLICA_COUNT                  : {REPLICA_COUNT}")
print(f"ACCELERATOR_TYPE               : {ACCELERATOR_TYPE}")
print(f"PER_MACHINE_ACCELERATOR_COUNT  : {PER_MACHINE_ACCELERATOR_COUNT}")
print(f"DISTRIBUTE_STRATEGY            : {DISTRIBUTE_STRATEGY}")
print(f"REDUCTION_SERVER_COUNT         : {REDUCTION_SERVER_COUNT}")
print(f"REDUCTION_SERVER_MACHINE_TYPE  : {REDUCTION_SERVER_MACHINE_TYPE}")
print(f"TF_GPU_THREAD_COUNT            : {TF_GPU_THREAD_COUNT}")

WORKER_MACHINE_TYPE            : n1-highmem-16
REPLICA_COUNT                  : 1
ACCELERATOR_TYPE               : NVIDIA_TESLA_T4
PER_MACHINE_ACCELERATOR_COUNT  : 1
DISTRIBUTE_STRATEGY            : single
REDUCTION_SERVER_COUNT         : 0
REDUCTION_SERVER_MACHINE_TYPE  : n1-highcpu-16
TF_GPU_THREAD_COUNT            : 4
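
One weakness of an `if/elif` chain like the one above is that an unrecognized `ACCELERATOR` value leaves the variables undefined until a later `NameError`. As an alternative sketch (not the notebook's own approach), the same settings can live in a lookup table that fails loudly on unknown keys. `ComputeConfig` and `get_compute_config` are hypothetical names; the values are copied from the cell above, and the TPU and reduction-server settings are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ComputeConfig:
    worker_machine_type: str
    accelerator_type: Optional[str]
    per_machine_accelerator_count: int
    replica_count: int = 1
    distribute_strategy: str = "single"


# values copied from the if/elif chain above; None means CPU-only
COMPUTE_CONFIGS = {
    "a100": ComputeConfig("a2-highgpu-1g", "NVIDIA_TESLA_A100", 1),
    "t4": ComputeConfig("n1-highmem-16", "NVIDIA_TESLA_T4", 1),
    "l4": ComputeConfig("g2-standard-16", "NVIDIA_L4", 1),
    None: ComputeConfig("n2-highmem-32", None, 0),
}


def get_compute_config(accelerator):
    """Look up the compute settings, failing loudly on an unknown key."""
    try:
        return COMPUTE_CONFIGS[accelerator]
    except KeyError:
        raise ValueError(f"unknown accelerator: {accelerator!r}") from None


config = get_compute_config("t4")
print(config.worker_machine_type, config.accelerator_type)
```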


## set Vertex AI Experiment

In [13]:
EXPERIMENT_NAME   = f'02-deep-bandits-v1'

# new experiment
invoke_time       = time.strftime("%Y%m%d-%H%M%S")
RUN_NAME          = f'run-{invoke_time}'

CHECKPT_DIR       = f"{BUCKET_URI}/{EXPERIMENT_NAME}/chkpoint"
BASE_OUTPUT_DIR   = f"{BUCKET_URI}/{EXPERIMENT_NAME}/{RUN_NAME}"
LOG_DIR           = f"{BASE_OUTPUT_DIR}/logs"
ROOT_DIR          = f"{BASE_OUTPUT_DIR}/root"
ARTIFACTS_DIR     = f"{BASE_OUTPUT_DIR}/artifacts"  # Where the trained model will be saved and restored.

vertex_ai.init(
    project=PROJECT_ID,
    location=REGION,
    experiment=EXPERIMENT_NAME
)

print(f"EXPERIMENT_NAME   : {EXPERIMENT_NAME}")
print(f"RUN_NAME          : {RUN_NAME}\n")
print(f"CHECKPT_DIR       : {CHECKPT_DIR}")
print(f"BASE_OUTPUT_DIR   : {BASE_OUTPUT_DIR}")
print(f"LOG_DIR           : {LOG_DIR}")
print(f"ROOT_DIR          : {ROOT_DIR}")
print(f"ARTIFACTS_DIR     : {ARTIFACTS_DIR}")

EXPERIMENT_NAME   : 02-deep-bandits-v1
RUN_NAME          : run-20240313-213741

CHECKPT_DIR       : gs://rec-bandits-v2-hybrid-vertex-bucket/02-deep-bandits-v1/chkpoint
BASE_OUTPUT_DIR   : gs://rec-bandits-v2-hybrid-vertex-bucket/02-deep-bandits-v1/run-20240313-213741
LOG_DIR           : gs://rec-bandits-v2-hybrid-vertex-bucket/02-deep-bandits-v1/run-20240313-213741/logs
ROOT_DIR          : gs://rec-bandits-v2-hybrid-vertex-bucket/02-deep-bandits-v1/run-20240313-213741/root
ARTIFACTS_DIR     : gs://rec-bandits-v2-hybrid-vertex-bucket/02-deep-bandits-v1/run-20240313-213741/artifacts


## Create Tensorboard

In [14]:
NEW_TENSORBOARD = True

In [15]:
if NEW_TENSORBOARD:
    # # create new TB instance
    TENSORBOARD_DISPLAY_NAME=f"{EXPERIMENT_NAME}"
    tensorboard = vertex_ai.Tensorboard.create(
        display_name=TENSORBOARD_DISPLAY_NAME
        , project=PROJECT_ID
        , location=REGION
    )
    TB_RESOURCE_NAME = tensorboard.resource_name
else:
    # use existing TB instance
    # TB_RESOURCE_NAME = 'projects/934903580331/locations/us-central1/tensorboards/XXXXXXX' # TODO
    tensorboard = vertex_ai.Tensorboard(
        tensorboard_name=TB_RESOURCE_NAME
    )
print(f"TB_RESOURCE_NAME: {TB_RESOURCE_NAME}")
print(f"TB display name: {tensorboard.display_name}")

TB_RESOURCE_NAME: projects/934903580331/locations/us-central1/tensorboards/670583345687560192
TB display name: 02-deep-bandits-v1


## Set training args

In [16]:
print(f"IMAGE_URI_02 : {IMAGE_URI_02}")

IMAGE_URI_02 : gcr.io/hybrid-vertex/train-perarm-feats-v2


In [24]:
# ================================
# data config
# ================================
GLOBAL_DIM             = 64       # 16
PER_ARM_DIM            = 72       # 16
NUM_OOV_BUCKETS        = 1
GLOBAL_EMBEDDING_SIZE  = 12
MV_EMBEDDING_SIZE      = 16       # 32
SPLIT                  = "train"  # TODO - remove
RESUME_TRAINING        = None

# Set hyperparameters.
NUM_EPOCHS           = 5
BATCH_SIZE           = 128          # Training and prediction batch size.
TRAINING_LOOPS       = 500          # Number of training iterations.
STEPS_PER_LOOP       = 1            # Number of driver steps per training iteration.
ASYNC_STEPS_PER_LOOP = 1
LOG_INTERVAL         = 10
LR                   = 0.05

CHKPT_INTERVAL       = 1000
EVAL_BATCH_SIZE      = 1  
NUM_EVAL_STEPS       = 2000 #10000

# Set MovieLens simulation environment parameters.
RANK_K               = 10      # Rank for matrix factorization in the MovieLens environment; also the observation dimension.
NUM_ACTIONS          = 2       # Number of actions (movie items) to choose from.
PER_ARM              = True    # Use the per-arm version of the MovieLens environment.

# ================================
# Agent
# ================================
AGENT_TYPE          = 'epsGreedy' # 'LinUCB' | 'LinTS' | 'epsGreedy' | 'NeuralLinUCB'
NETWORK_TYPE        = "commontower" # 'commontower' | 'dotproduct'

TIKHONOV_WEIGHT     = 0.001   # LinUCB Tikhonov regularization weight.
AGENT_ALPHA         = 0.1     # LinUCB exploration parameter that multiplies the confidence intervals.
EPSILON             = 0.01
ENCODING_DIM        = 1
EPS_PHASE_STEPS     = 1000

# ================================
# network params
# ================================
# GLOBAL_LAYERS       = [128, 64, 32]
# ARM_LAYERS          = [128, 64, 32]
# COMMON_LAYERS       = [32, 16, 8]

GLOBAL_LAYERS   = [GLOBAL_DIM, int(GLOBAL_DIM/2), int(GLOBAL_DIM/4)]
ARM_LAYERS      = [PER_ARM_DIM, int(PER_ARM_DIM/2), int(PER_ARM_DIM/4)]

FIRST_COMMON_LAYER = GLOBAL_LAYERS[-1] + ARM_LAYERS[-1]
COMMON_LAYERS = [
    int(FIRST_COMMON_LAYER),
    # int(FIRST_COMMON_LAYER/2),
    int(FIRST_COMMON_LAYER/4)
]

if AGENT_TYPE == 'NeuralLinUCB':
    NETWORK_TYPE = 'commontower'
    ENCODING_DIM = COMMON_LAYERS[-1]

print(f"VOCAB_SUBDIR           : {VOCAB_SUBDIR}")
print(f"VOCAB_FILENAME         : {VOCAB_FILENAME}")
print(f"BATCH_SIZE             : {BATCH_SIZE}")
print(f"TRAINING_LOOPS         : {TRAINING_LOOPS}")
print(f"STEPS_PER_LOOP         : {STEPS_PER_LOOP}")
print(f"ASYNC_STEPS_PER_LOOP   : {ASYNC_STEPS_PER_LOOP}")
print(f"LOG_INTERVAL           : {LOG_INTERVAL}")
print(f"RANK_K                 : {RANK_K}")
print(f"NUM_ACTIONS            : {NUM_ACTIONS}")
print(f"PER_ARM                : {PER_ARM}")
print(f"AGENT_TYPE             : {AGENT_TYPE}")
print(f"NETWORK_TYPE           : {NETWORK_TYPE}")
print(f"TIKHONOV_WEIGHT        : {TIKHONOV_WEIGHT}")
print(f"AGENT_ALPHA            : {AGENT_ALPHA}")
print(f"GLOBAL_DIM             : {GLOBAL_DIM}")
print(f"PER_ARM_DIM            : {PER_ARM_DIM}")
print(f"SPLIT                  : {SPLIT}")
print(f"RESUME_TRAINING        : {RESUME_TRAINING}")
print(f"NUM_OOV_BUCKETS        : {NUM_OOV_BUCKETS}")
print(f"GLOBAL_EMBEDDING_SIZE  : {GLOBAL_EMBEDDING_SIZE}")
print(f"MV_EMBEDDING_SIZE      : {MV_EMBEDDING_SIZE}")
print(f"AGENT_ALPHA            : {AGENT_ALPHA}")
print(f"GLOBAL_LAYERS          : {GLOBAL_LAYERS}")
print(f"ARM_LAYERS             : {ARM_LAYERS}")
print(f"COMMON_LAYERS          : {COMMON_LAYERS}")
print(f"LR                     : {LR}")
print(f"CHKPT_INTERVAL         : {CHKPT_INTERVAL}")
print(f"EVAL_BATCH_SIZE        : {EVAL_BATCH_SIZE}")
print(f"NUM_EVAL_STEPS         : {NUM_EVAL_STEPS}")
print(f"EPSILON                : {EPSILON}")
print(f"ENCODING_DIM           : {ENCODING_DIM}")
print(f"EPS_PHASE_STEPS        : {EPS_PHASE_STEPS}")

VOCAB_SUBDIR           : vocabs
VOCAB_FILENAME         : vocab_dict.pkl
BATCH_SIZE             : 128
TRAINING_LOOPS         : 500
STEPS_PER_LOOP         : 1
ASYNC_STEPS_PER_LOOP   : 1
LOG_INTERVAL           : 10
RANK_K                 : 10
NUM_ACTIONS            : 2
PER_ARM                : True
AGENT_TYPE             : epsGreedy
NETWORK_TYPE           : commontower
TIKHONOV_WEIGHT        : 0.001
AGENT_ALPHA            : 0.1
GLOBAL_DIM             : 64
PER_ARM_DIM            : 72
SPLIT                  : train
RESUME_TRAINING        : None
NUM_OOV_BUCKETS        : 1
GLOBAL_EMBEDDING_SIZE  : 12
MV_EMBEDDING_SIZE      : 16
AGENT_ALPHA            : 0.1
GLOBAL_LAYERS          : [64, 32, 16]
ARM_LAYERS             : [72, 36, 18]
COMMON_LAYERS          : [34, 8]
LR                     : 0.05
CHKPT_INTERVAL         : 1000
EVAL_BATCH_SIZE        : 1
NUM_EVAL_STEPS         : 2000
EPSILON                : 0.01
ENCODING_DIM           : 1
EPS_PHASE_STEPS        : 1000
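
The layer-size arithmetic in the cell above can be checked in isolation. The sketch below reproduces it with two hypothetical helper functions. Note that `int()` truncates, so the last common layer is `int(34 / 4) = 8`, not 8.5.

```python
def tower_layers(input_dim):
    """Three layers: full, half, and quarter width, as in GLOBAL_LAYERS / ARM_LAYERS."""
    return [input_dim, int(input_dim / 2), int(input_dim / 4)]


def common_layers(global_layers, arm_layers):
    """Common tower: summed width of both tower outputs, then a quarter of it."""
    first = global_layers[-1] + arm_layers[-1]
    return [int(first), int(first / 4)]


global_layers = tower_layers(64)   # GLOBAL_DIM
arm_layers = tower_layers(72)      # PER_ARM_DIM
common = common_layers(global_layers, arm_layers)
print(global_layers, arm_layers, common)
```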


In [25]:
WORKER_ARGS = [
    f"--project={PROJECT_ID}"
    , f"--project_number={PROJECT_NUM}"
    , f"--bucket_name={BUCKET_NAME}"
    , f"--artifacts_dir={ARTIFACTS_DIR}"
    # , f"--root_dir={ROOT_DIR}"
    , f"--chkpoint_dir={CHECKPT_DIR}"
    , f"--log_dir={LOG_DIR}"
    , f"--data_dir_prefix_path={EXAMPLE_GEN_GCS_PATH}"
    , f"--vocab_prefix_path={EXAMPLE_GEN_GCS_PATH}/{VOCAB_SUBDIR}"
    , f"--vocab_filename={VOCAB_FILENAME}"
    ### job config
    , f"--distribute={DISTRIBUTE_STRATEGY}"
    , f"--experiment_name={EXPERIMENT_NAME}"
    , f"--experiment_run={RUN_NAME}"
    , f"--agent_type={AGENT_TYPE}"
    , f"--network_type={NETWORK_TYPE}"
    ### hparams
    , f"--batch_size={BATCH_SIZE}"
    , f"--eval_batch_size={EVAL_BATCH_SIZE}"
    , f"--training_loops={TRAINING_LOOPS}"
    , f"--steps_per_loop={STEPS_PER_LOOP}"
    , f"--num_eval_steps={NUM_EVAL_STEPS}"
    , f"--rank_k={RANK_K}"
    , f"--num_actions={NUM_ACTIONS}"
    , f"--async_steps_per_loop={ASYNC_STEPS_PER_LOOP}"
    # , f"--resume_training_loops"
    , f"--global_dim={GLOBAL_DIM}"
    , f"--per_arm_dim={PER_ARM_DIM}"
    , f"--split={SPLIT}"
    , f"--log_interval={LOG_INTERVAL}"
    , f"--chkpt_interval={CHKPT_INTERVAL}"
    , f"--num_oov_buckets={NUM_OOV_BUCKETS}"
    , f"--global_emb_size={GLOBAL_EMBEDDING_SIZE}"
    , f"--mv_emb_size={MV_EMBEDDING_SIZE}"
    , f"--agent_alpha={AGENT_ALPHA}"
    , f"--global_layers={GLOBAL_LAYERS}"
    , f"--arm_layers={ARM_LAYERS}"
    , f"--common_layers={COMMON_LAYERS}"
    , f"--learning_rate={LR}"
    , f"--epsilon={EPSILON}"
    , f"--encoding_dim={ENCODING_DIM}"
    , f"--eps_phase_steps={EPS_PHASE_STEPS}"
    , f"--tf_gpu_thread_count={TF_GPU_THREAD_COUNT}"
    , f"--num_epochs={NUM_EPOCHS}"
    ### accelerators & profiling
    , f"--use_gpu"
    # , f"--use_tpu"
    # , f"--profiler"
    , f"--sum_grads_vars"
    , f"--debug_summaries"
    # , f"--cache_train"
    # , f"--is_testing"
]

WORKER_POOL_SPECS = train_utils.prepare_worker_pool_specs(
    # image_uri=f"{REMOTE_IMAGE_NAME}:latest",
    image_uri=f"{IMAGE_URI_02}:latest",
    args=WORKER_ARGS,
    replica_count=REPLICA_COUNT,
    machine_type=WORKER_MACHINE_TYPE,
    accelerator_count=PER_MACHINE_ACCELERATOR_COUNT,
    accelerator_type=ACCELERATOR_TYPE,
    reduction_server_count=REDUCTION_SERVER_COUNT,
    reduction_server_machine_type=REDUCTION_SERVER_MACHINE_TYPE,
)

from pprint import pprint
pprint(WORKER_POOL_SPECS)

[{'container_spec': {'args': ['--project=hybrid-vertex',
                              '--project_number=934903580331',
                              '--bucket_name=rec-bandits-v2-hybrid-vertex-bucket',
                              '--artifacts_dir=gs://rec-bandits-v2-hybrid-vertex-bucket/02-deep-bandits-v1/run-20240313-213741/artifacts',
                              '--chkpoint_dir=gs://rec-bandits-v2-hybrid-vertex-bucket/02-deep-bandits-v1/chkpoint',
                              '--log_dir=gs://rec-bandits-v2-hybrid-vertex-bucket/02-deep-bandits-v1/run-20240313-213741/logs',
                              '--data_dir_prefix_path=data/movielens/m1m',
                              '--vocab_prefix_path=data/movielens/m1m/vocabs',
                              '--vocab_filename=vocab_dict.pkl',
                              '--distribute=single',
                              '--experiment_name=02-deep-bandits-v1',
                              '--experiment_run=run-20240313-213741',
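
Several of these arguments are Python lists formatted into strings (for example `--global_layers=[64, 32, 16]`), and others are bare flags such as `--use_gpu`. The training script's argument parser is not shown in this notebook, so the following is only an assumption about how such values could be consumed, using `argparse` and `ast.literal_eval`. `parse_int_list` is a hypothetical helper.

```python
import argparse
import ast


def parse_int_list(text):
    """Parse a string like '[64, 32, 16]' into a list of ints."""
    try:
        value = ast.literal_eval(text)
    except (ValueError, SyntaxError):
        raise argparse.ArgumentTypeError(f"not a valid list literal: {text}")
    if not isinstance(value, list) or not all(isinstance(v, int) for v in value):
        raise argparse.ArgumentTypeError(f"expected a list of ints, got: {text}")
    return value


parser = argparse.ArgumentParser()
parser.add_argument("--global_layers", type=parse_int_list, default=[])
parser.add_argument("--batch_size", type=int, default=128)
parser.add_argument("--learning_rate", type=float, default=0.05)
parser.add_argument("--use_gpu", action="store_true")   # bare flag, no value
parser.add_argument("--profiler", action="store_true")  # absent means False

# each list element is one argv entry, just like the items in WORKER_ARGS
args = parser.parse_args(
    ["--global_layers=[64, 32, 16]", "--batch_size=128", "--use_gpu"]
)
print(args.global_layers, args.batch_size, args.use_gpu, args.profiler)
```

`ast.literal_eval` only evaluates literals, which makes it a safer choice than `eval` for values that arrive on the command line.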
 

In [26]:
# !pwd

# Submit training job

In [27]:
vertex_ai.init(
    project=PROJECT_ID
    , location=REGION
    , experiment=EXPERIMENT_NAME
    # , staging_bucket=ROOT_DIR
)

JOB_NAME = f"{EXPERIMENT_NAME}-{RUN_NAME}"
print(f"JOB_NAME: {JOB_NAME}")

JOB_NAME: 02-deep-bandits-v1-run-20240313-213741


In [28]:
# Create a CustomJob
my_custom_job = vertex_ai.CustomJob(
    display_name=JOB_NAME
    , project=PROJECT_ID
    , worker_pool_specs=WORKER_POOL_SPECS
    , base_output_dir=BASE_OUTPUT_DIR
    , staging_bucket=ROOT_DIR
    # , location="asia-southeast1" 
)

In [29]:
my_custom_job.run(
    tensorboard=TB_RESOURCE_NAME,
    service_account=VERTEX_SA,
    restart_job_on_worker_restart=False,
    enable_web_access=True,
    sync=False,
)

In [27]:
print(f"Job Name: {my_custom_job.display_name}")
print(f"Job Resource Name: {my_custom_job.resource_name}\n")

Job Name: 02-deep-reward-bandits-run-20240222-215620
Job Resource Name: projects/934903580331/locations/us-central1/customJobs/5234690678482534400



### Get link to Vertex AI Experiment console

In [50]:
experiment_df = vertex_ai.get_experiment_df()
experiment_df = experiment_df[experiment_df.experiment_name == EXPERIMENT_NAME]
experiment_df.T

| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| experiment_name | 02-scale-compare-v5 | 02-scale-compare-v5 | 02-scale-compare-v5 | 02-scale-compare-v5 |
| run_name | run-20231214-174236 | run-20231214-172310 | run-20231214-171428 | run-20231214-165818 |
| run_type | system.ExperimentRun | system.ExperimentRun | system.ExperimentRun | system.ExperimentRun |
| state | COMPLETE | COMPLETE | COMPLETE | COMPLETE |
| param.batch_size | 128.0 | 128.0 | 128.0 | 128.0 |
| param.global_lyrs | [64, 32, 16] | [64, 32, 16] | [64, 32, 16] | [64, 32, 16] |
| param.arm_lyrs | [64, 32, 16] | [64, 32, 16] | [64, 32, 16] | [64, 32, 16] |
| param.runtime | 3.0 | 3.0 | 0.0 | 0.0 |
| param.encoding_dim | 1.0 | 1.0 | 8.0 | 1.0 |
| param.network | commontower | commontower | commontower | commontower |


In [52]:
# print("Open the following link", experiment_df["metric.lineage"][0])

### GPU profiling

> Once the training job begins, enter these commands in the Vertex AI interactive terminal to install `nvtop`, then run `nvtop` to monitor GPU utilization:

```bash
sudo apt update
sudo apt -y install nvtop
```

## TensorBoard

### in-notebook TensorBoard

> If the job was run with `--profiler`, select `PROFILE` from the drop-down:

<img src="imgs/getting_profiler.png" 
     align="center" 
     width="850"
     height="850"/>

In [27]:
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf
from tensorboard import notebook

print(f"LOG_DIR: {LOG_DIR}")

notebook.list()

LOG_DIR: gs://rec-bandits-v2-hybrid-vertex-bucket/02-online-dotp/run-20240221-023400/logs
Known TensorBoard instances:
  - port 6006: logdir gs://rec-bandits-v2-hybrid-vertex-bucket/02b-deep-bandits-rec-bandits-v2/run-20240214-180454/logs (started 6 days, 9:42:33 ago; pid 3029265)


In [28]:
# %load_ext tensorboard
%reload_ext tensorboard

In [29]:
%tensorboard --logdir=$LOG_DIR

# Making predictions

* Once a policy is trained, it can be given a new observation (i.e., a user vector).
* The policy then infers (produces) actions, which are the recommended movies.
* In TF-Agents, observations are wrapped in a named tuple:

```
TimeStep('step_type', 'reward', 'discount', 'observation')
```

> the policy maps time steps to actions
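
To illustrate what "mapping time steps to actions" means, here is a toy sketch in plain Python. The `TimeStep` and `PolicyStep` tuples are simplified stand-ins that use the same four fields, not the TF-Agents classes, and the dot-product scorer is a hypothetical reward model chosen only for the example.

```python
from collections import namedtuple

# toy stand-ins for the TF-Agents structures; not the library's own classes
TimeStep = namedtuple("TimeStep", ["step_type", "reward", "discount", "observation"])
PolicyStep = namedtuple("PolicyStep", ["action", "info"])


def greedy_policy(time_step, score_arm):
    """Score every arm in the observation and pick the highest-scoring one."""
    user = time_step.observation["global"]
    scores = [score_arm(user, arm) for arm in time_step.observation["per_arm"]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return PolicyStep(action=best, info={"predicted_rewards_mean": scores})


def dot(user, arm):
    # hypothetical reward model: dot product of user and arm vectors
    return sum(u * a for u, a in zip(user, arm))


step = TimeStep(
    step_type=0,
    reward=0.0,
    discount=1.0,
    observation={"global": [1.0, 0.5], "per_arm": [[2.0, 1.0], [0.5, 0.5]]},
)
result = greedy_policy(step, dot)
print(result.action, result.info["predicted_rewards_mean"])
```

The observation carries one global (user) vector and one vector per arm, and the returned action is the index of the chosen arm.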

In [30]:
import tensorflow as tf
from src.perarm_features import emb_features as emb_features
from src.perarm_features import reward_factory as reward_factory

## Load eval dataset

In [31]:
DATA_GCS_PREFIX

'data/movielens-1m'

In [32]:
SPLIT = "val"

val_files = []
for blob in storage_client.list_blobs(f"{BUCKET_NAME}", prefix=f'{DATA_GCS_PREFIX}/{SPLIT}'):
    if '.tfrecord' in blob.name:
        val_files.append(blob.public_url.replace("https://storage.googleapis.com/", "gs://"))

        
val_dataset = tf.data.TFRecordDataset(val_files)
val_dataset = val_dataset.map(data_utils.parse_tfrecord, num_parallel_calls=tf.data.AUTOTUNE)

# eval dataset
eval_ds = val_dataset.batch(1)

if NUM_EVAL_STEPS > 0:
    eval_ds = eval_ds.take(NUM_EVAL_STEPS)

eval_ds

<_TakeDataset element_spec={'bucketized_user_age': TensorSpec(shape=(None,), dtype=tf.float32, name=None), 'movie_genres': TensorSpec(shape=(None, 1), dtype=tf.int64, name=None), 'movie_id': TensorSpec(shape=(None,), dtype=tf.string, name=None), 'timestamp': TensorSpec(shape=(None,), dtype=tf.int64, name=None), 'user_id': TensorSpec(shape=(None,), dtype=tf.string, name=None), 'user_occupation_text': TensorSpec(shape=(None,), dtype=tf.string, name=None), 'user_rating': TensorSpec(shape=(None,), dtype=tf.float32, name=None)}>

In [33]:
val_files

['gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens-1m/val/ml-1m-ratings-train-09-of-10.tfrecord',
 'gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens-1m/val/ml-1m-ratings-train-10-of-10.tfrecord',
 'gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens-1m/val_v1/ml-1m-ratings-train-7-of-10.tfrecord',
 'gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens-1m/val_v1/ml-1m-ratings-train-8-of-10.tfrecord']
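
The listing above also contains files from `val_v1/`, because a prefix of `.../val` matches every object name that starts with that string. If only the `val/` folder is wanted (an assumption about the intent here), a trailing slash on the prefix excludes the sibling folder. The sketch below shows the filter on plain object names; `split_files` is a hypothetical helper, not part of `data_utils`.

```python
def split_files(blob_names, bucket, data_prefix, split):
    """Return gs:// URIs of .tfrecord objects under `<data_prefix>/<split>/`."""
    folder = f"{data_prefix}/{split}/"  # trailing slash excludes e.g. 'val_v1/'
    return [
        f"gs://{bucket}/{name}"
        for name in blob_names
        if name.startswith(folder) and name.endswith(".tfrecord")
    ]


# object names taken from the listing above
names = [
    "data/movielens-1m/val/ml-1m-ratings-train-09-of-10.tfrecord",
    "data/movielens-1m/val/ml-1m-ratings-train-10-of-10.tfrecord",
    "data/movielens-1m/val_v1/ml-1m-ratings-train-7-of-10.tfrecord",
    "data/movielens-1m/val_v1/ml-1m-ratings-train-8-of-10.tfrecord",
]
uris = split_files(names, "rec-bandits-v2-hybrid-vertex-bucket", "data/movielens-1m", "val")
for uri in uris:
    print(uri)
```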

### Load vocabulary

In [34]:
EXISTING_VOCAB_FILE = f'gs://{BUCKET_NAME}/{DATA_GCS_PREFIX}/{VOCAB_FILENAME}'
print(f"Downloading vocab...")

os.system(f'gsutil -q cp {EXISTING_VOCAB_FILE} .')
print(f"Downloaded vocab from: {EXISTING_VOCAB_FILE}\n")

# the context manager closes the file even if unpickling fails
with open(VOCAB_FILENAME, 'rb') as filehandler:
    vocab_dict = pkl.load(filehandler)

for key in vocab_dict.keys():
    pprint(key)

Downloading vocab...
Downloaded vocab from: gs://rec-bandits-v2-hybrid-vertex-bucket/data/movielens-1m/vocab_dict.pkl

'movie_id'
'user_id'
'user_occupation_text'
'movie_genres'
'bucketized_user_age'
'max_timestamp'
'min_timestamp'
'timestamp_buckets'


## load trained policy

In [35]:
# MODEL_DIR = "gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230717-211248/model"

!gsutil ls $ARTIFACTS_DIR

gs://rec-bandits-v2-hybrid-vertex-bucket/02-online-dotp/run-20240221-023400/artifacts/
gs://rec-bandits-v2-hybrid-vertex-bucket/02-online-dotp/run-20240221-023400/artifacts/fingerprint.pb
gs://rec-bandits-v2-hybrid-vertex-bucket/02-online-dotp/run-20240221-023400/artifacts/policy_specs.pbtxt
gs://rec-bandits-v2-hybrid-vertex-bucket/02-online-dotp/run-20240221-023400/artifacts/saved_model.pb
gs://rec-bandits-v2-hybrid-vertex-bucket/02-online-dotp/run-20240221-023400/artifacts/assets/
gs://rec-bandits-v2-hybrid-vertex-bucket/02-online-dotp/run-20240221-023400/artifacts/variables/


In [36]:
from tf_agents.policies import py_tf_eager_policy

trained_policy = py_tf_eager_policy.SavedModelPyTFEagerPolicy(
    ARTIFACTS_DIR, load_specs_from_pbtxt=True
)

trained_policy

<tf_agents.policies.py_tf_eager_policy.SavedModelPyTFEagerPolicy at 0x7f44d9655090>

## call embedding models

In [37]:
GLOBAL_EMBEDDING_SIZE

16

In [38]:
embs = emb_features.EmbeddingModel(
    vocab_dict = vocab_dict,
    num_oov_buckets = NUM_OOV_BUCKETS,
    global_emb_size = GLOBAL_EMBEDDING_SIZE,
    mv_emb_size = MV_EMBEDDING_SIZE,
)

embs

<src.perarm_features.emb_features.EmbeddingModel at 0x7f43cc42c1f0>

In [39]:
eval_ds

<_TakeDataset element_spec={'bucketized_user_age': TensorSpec(shape=(None,), dtype=tf.float32, name=None), 'movie_genres': TensorSpec(shape=(None, 1), dtype=tf.int64, name=None), 'movie_id': TensorSpec(shape=(None,), dtype=tf.string, name=None), 'timestamp': TensorSpec(shape=(None,), dtype=tf.int64, name=None), 'user_id': TensorSpec(shape=(None,), dtype=tf.string, name=None), 'user_occupation_text': TensorSpec(shape=(None,), dtype=tf.string, name=None), 'user_rating': TensorSpec(shape=(None,), dtype=tf.float32, name=None)}>

## Run inference with trained policy

In [40]:
INFER_SIZE = 1
dummy_arm = tf.zeros([INFER_SIZE, PER_ARM_DIM], dtype=tf.float32)

SKIP_NUM = 10

for x in eval_ds.skip(SKIP_NUM).take(INFER_SIZE):
    # get feature tensors    
    global_feat_infer = embs._get_global_context_features(x)
    arm_feat_infer = embs._get_per_arm_features(x)
    
    # rewards = _get_rewards(x)
    rewards = reward_factory._get_rewards(x)
    
    # reshape arm features
    arm_feat_infer = tf.reshape(arm_feat_infer, [EVAL_BATCH_SIZE, PER_ARM_DIM]) # perarm_dim
    concat_arm = tf.concat([arm_feat_infer, dummy_arm], axis=0)
    
    # flatten global
    flat_global_infer = tf.reshape(global_feat_infer, [GLOBAL_DIM])
    feature = {'global': flat_global_infer, 'per_arm': concat_arm}
    
    # get actual reward
    actual_reward = rewards.numpy()[0]
    
    # build trajectory step
    trajectory_step = train_utils._get_eval_step(feature, actual_reward)
    
    prediction = trained_policy.action(trajectory_step)

In [41]:
global_feat_infer

<tf.Tensor: shape=(1, 64), dtype=float32, numpy=
array([[-0.02423429,  0.03569544,  0.00941722,  0.01268059, -0.03076227,
         0.03151837, -0.0090163 , -0.02012342,  0.04700836,  0.04948396,
         0.0341096 , -0.04149907,  0.00039178,  0.01899574, -0.00683619,
        -0.04258578, -0.01099278,  0.02549082,  0.01653793,  0.03991942,
         0.04965748, -0.0127555 ,  0.02935164,  0.01087339, -0.01636513,
        -0.02368754, -0.03929192,  0.02375125,  0.02551729, -0.03736371,
        -0.02413548,  0.03881891, -0.01570874, -0.01364864,  0.03775977,
         0.00097629, -0.01052288, -0.01703221, -0.03854374, -0.00365437,
         0.04919578, -0.02562262,  0.02056113,  0.01011078, -0.03701179,
         0.03339164, -0.0496799 , -0.01876576,  0.04010205,  0.02692163,
         0.03405235,  0.02577719, -0.00747878, -0.0301556 ,  0.00328819,
        -0.00376345, -0.04580323,  0.00474114, -0.00995062, -0.04982231,
        -0.01080923, -0.01271247,  0.00133632,  0.03775134]],
      dtype=f

In [42]:
x

{'bucketized_user_age': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([25.], dtype=float32)>,
 'movie_genres': <tf.Tensor: shape=(1, 1), dtype=int64, numpy=array([[0]])>,
 'movie_id': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'2641'], dtype=object)>,
 'timestamp': <tf.Tensor: shape=(1,), dtype=int64, numpy=array([968355413])>,
 'user_id': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'3519'], dtype=object)>,
 'user_occupation_text': <tf.Tensor: shape=(1,), dtype=string, numpy=array([b'sales/marketing'], dtype=object)>,
 'user_rating': <tf.Tensor: shape=(1,), dtype=float32, numpy=array([4.], dtype=float32)>}

In [43]:
prediction

PolicyStep(action=array(0, dtype=int32), state=(), info=PerArmPolicyInfo(log_probability=(), predicted_rewards_mean=array([3.7975988, 3.6516001], dtype=float32), multiobjective_scalarized_predicted_rewards_mean=(), predicted_rewards_optimistic=(), predicted_rewards_sampled=(), bandit_policy_type=array([1], dtype=int32), chosen_arm_features=array([ 0.03101024,  0.01031622,  0.04479486, -0.02150822, -0.04589235,
       -0.01095872,  0.01401026, -0.01187184, -0.0410215 , -0.01127779,
        0.04456648,  0.04855448, -0.01872145, -0.00123424, -0.02530294,
        0.00059376, -0.00525342, -0.04443268, -0.04115971,  0.00346   ,
        0.00196717,  0.03168103,  0.04312061,  0.04696795, -0.02688395,
       -0.00467808,  0.03762523,  0.0239989 , -0.02450874, -0.03839843,
        0.0156719 ,  0.00319712, -0.03129433,  0.00797918,  0.04204294,
       -0.01635186,  0.04087284, -0.04855653, -0.00927971, -0.00776714,
        0.03924856,  0.04905334,  0.00831705, -0.03254086,  0.01044741,
        0.

# Clean up

In [None]:
# TODO

**Finished**