# Scale per-arm Bandit training with Vertex AI

## Overview

### Notebook Objectives:
* Create hyperparameter tuning and training custom container
* Submit hyperparameter tuning job (optional)
* Create custom prediction container
* Submit custom container training job
* Deploy trained model to Endpoint
* Predict on the Endpoint

**TODO** - fix vars

### Create hyperparameter tuning and training custom container

Create a custom container that can be used for both hyperparameter tuning and training. The associated source code is in `src/per_arm_rl/`. This serves as the inner script of the custom container.
As before, the training function is the same as [trainer.train](https://github.com/tensorflow/agents/blob/r0.8.0/tf_agents/bandits/agents/examples/v2/trainer.py#L104), but it keeps track of intermediate metric values, supports hyperparameter tuning, and (for training) saves artifacts to different locations. The training logic for hyperparameter tuning and training is the same.

**Execute hyperparameter tuning:**
* The code does not save model artifacts. It takes hyperparameter values as command-line arguments from the Vertex AI Hyperparameter Tuning service, and reports the training result metric to Vertex AI at each trial using `cloudml-hypertune`.
* Note that if you decide to save model artifacts, saving them to the same directory may cause overwriting errors if you use parallel trials in the hyperparameter tuning job. The recommended approach is to save each trial's artifacts to a different sub-directory. This would also allow you to recover all the artifacts from different trials and can potentially save you from re-training.
* Read more about hyperparameter tuning for custom containers [here](https://cloud.google.com/vertex-ai/docs/training/containers-overview#hyperparameter_tuning_with_custom_containers); read about hyperparameter tuning support [here](https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview).
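
The per-trial sub-directory recommendation above can be sketched as follows. This is a minimal sketch, not the code in `src/per_arm_rl/`: the helper name `trial_artifacts_dir` and the bucket path are hypothetical, and it assumes the trial ID is available in the `CLOUD_ML_TRIAL_ID` environment variable (the variable that `cloudml-hypertune` reads), falling back to `"0"` when it is unset.

```python
import os


def trial_artifacts_dir(base_dir: str, env=None) -> str:
    """Return a per-trial sub-directory under base_dir.

    Assumption: the trial ID is exposed through CLOUD_ML_TRIAL_ID;
    "0" is used when the variable is not set (e.g. a local run).
    """
    env = os.environ if env is None else env
    trial_id = env.get("CLOUD_ML_TRIAL_ID", "0")
    return f"{base_dir.rstrip('/')}/trial-{trial_id}"


# Two parallel trials resolve to different locations, so neither overwrites the other.
print(trial_artifacts_dir("gs://my-bucket/artifacts", {"CLOUD_ML_TRIAL_ID": "1"}))
print(trial_artifacts_dir("gs://my-bucket/artifacts", {"CLOUD_ML_TRIAL_ID": "2"}))
```

Passing `env` explicitly keeps the helper easy to test outside a tuning job; inside the container you would call it with only the base directory.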

**Execute training:**
* The code saves model artifacts to `os.environ["AIP_MODEL_DIR"]` in addition to `ARTIFACTS_DIR`, as required [here](https://github.com/googleapis/python-aiplatform/blob/v0.8.0/google/cloud/aiplatform/training_jobs.py#L2202).
* If you change the function, make sure it still saves the trained policy as a SavedModel to clean directories, and avoid saving checkpoints and other artifacts there, so that deploying the model to endpoints works.
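
As a rough illustration of the export targets described above, the sketch below collects the directories a SavedModel would be written to. The helper name `export_dirs` and the example paths are hypothetical, and the actual logic in `src/per_arm_rl/` may differ; the only thing taken from the text is that the model goes to `ARTIFACTS_DIR` and, when set, to `AIP_MODEL_DIR`.

```python
import os


def export_dirs(artifacts_dir: str, env=None) -> list:
    """List the directories the trained policy should be exported to.

    AIP_MODEL_DIR is only present when the job runs on Vertex AI,
    so a local run exports to artifacts_dir alone.
    """
    env = os.environ if env is None else env
    targets = [artifacts_dir]
    aip_model_dir = env.get("AIP_MODEL_DIR")
    if aip_model_dir and aip_model_dir != artifacts_dir:
        targets.append(aip_model_dir)
    return targets


# On Vertex AI (hypothetical paths): two export targets.
print(export_dirs("gs://my-bucket/run/artifacts",
                  {"AIP_MODEL_DIR": "gs://my-bucket/run/model"}))
# Locally: one export target.
print(export_dirs("gs://my-bucket/run/artifacts", {}))
```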

## Notebook setup

### set vars

In [155]:
# PREFIX = 'mabv1'
VERSION        = "v2"                       # TODO
PREFIX         = f'rec-bandits-{VERSION}'   # TODO

print(f"PREFIX: {PREFIX}")

PREFIX: rec-bandits-v2


**run the next cell to populate env vars**

In [156]:
# staging GCS
GCP_PROJECTS             = !gcloud config get-value project
PROJECT_ID               = GCP_PROJECTS[0]

# GCS bucket and paths
BUCKET_NAME              = f'{PREFIX}-{PROJECT_ID}-bucket'
BUCKET_URI               = f'gs://{BUCKET_NAME}'

config = !gsutil cat {BUCKET_URI}/config/notebook_env.py
print(config.n)
exec(config.n)


PROJECT_ID               = "hybrid-vertex"
PROJECT_NUM              = "934903580331"
LOCATION                 = "us-central1"

REGION                   = "us-central1"
BQ_LOCATION              = "US"
VPC_NETWORK_NAME         = "ucaip-haystack-vpc-network"

VERTEX_SA                = "934903580331-compute@developer.gserviceaccount.com"

PREFIX                   = "rec-bandits-v2"
VERSION                  = "v2"

BUCKET_NAME              = "rec-bandits-v2-hybrid-vertex-bucket"
BUCKET_URI               = "gs://rec-bandits-v2-hybrid-vertex-bucket"
DATA_GCS_PREFIX          = "data"
DATA_PATH                = "gs://rec-bandits-v2-hybrid-vertex-bucket/data"
VOCAB_SUBDIR             = "vocabs"
VOCAB_FILENAME           = "vocab_dict.pkl"

VPC_NETWORK_FULL         = "projects/934903580331/global/networks/ucaip-haystack-vpc-network"

BIGQUERY_DATASET_ID      = "hybrid_vertex.movielens_ds_rec_bandits_v2"
BIGQUERY_TABLE_ID        = "hybrid_vertex.movielens_ds_rec_bandits_v2.training_dataset"

REPO

### imports

In [3]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

In [4]:
import functools
import json
from collections import defaultdict
from typing import Callable, Dict, List, Optional, TypeVar
from datetime import datetime
import time

import logging
logging.disable(logging.WARNING)

import matplotlib.pyplot as plt
import numpy as np

# google cloud
from google.cloud import aiplatform, storage
from google.cloud.aiplatform import hyperparameter_tuning as hpt

# tensorflow
import tensorflow as tf
from tf_agents.agents import TFAgent
from tf_agents.bandits.agents import lin_ucb_agent
from tf_agents.bandits.agents.examples.v2 import trainer
from tf_agents.bandits.environments import (environment_utilities,
                                            movielens_py_environment)
from tf_agents.bandits.metrics import tf_metrics as tf_bandit_metrics
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import TFEnvironment, tf_py_environment
from tf_agents.eval import metric_utils
from tf_agents.metrics import tf_metrics
from tf_agents.metrics.tf_metric import TFStepMetric
from tf_agents.policies import policy_saver

# GPU
from numba import cuda 
import gc

if tf.__version__[0] != "2":
    raise Exception("The trainer only runs with TensorFlow version 2.")

T = TypeVar("T")

In [8]:
device = cuda.get_current_device()
device.reset()
gc.collect()

14

In [9]:
import sys
sys.path.append("..")

# my project
from src.per_arm_rl import data_utils
from src.per_arm_rl import data_config

In [10]:
# cloud storage client
storage_client = storage.Client(project=PROJECT_ID)

# Vertex client
aiplatform.init(project=PROJECT_ID, location=LOCATION)

In [11]:
! gsutil ls $DATA_PATH

gs://rec-bandits-v2-hybrid-vertex-bucket/data/ml-ratings-100k-full.tfrecord
gs://rec-bandits-v2-hybrid-vertex-bucket/data/listwise-3n-train/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/listwise-3n-val/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/listwise-5n-train/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/listwise-5n-val/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/train/
gs://rec-bandits-v2-hybrid-vertex-bucket/data/val/


## Build train application

### Vertex Experiments

In [12]:
EXPERIMENT_NAME   = f'scale-perarm-hpt-v3'

invoke_time       = time.strftime("%Y%m%d-%H%M%S")
RUN_NAME          = f'run-{invoke_time}'

BASE_OUTPUT_DIR   = f'gs://{BUCKET_NAME}/{EXPERIMENT_NAME}/{RUN_NAME}'
LOG_DIR           = f"{BASE_OUTPUT_DIR}/logs"
ROOT_DIR          = f"{BASE_OUTPUT_DIR}/root"                               # Root directory for writing logs/summaries/checkpoints.
ARTIFACTS_DIR     = f"{BASE_OUTPUT_DIR}/artifacts"                          # Where the trained model will be saved and restored.

print(f"EXPERIMENT_NAME   : {EXPERIMENT_NAME}")
print(f"RUN_NAME          : {RUN_NAME}")
print(f"BASE_OUTPUT_DIR   : {BASE_OUTPUT_DIR}")
print(f"LOG_DIR           : {LOG_DIR}")
print(f"ROOT_DIR          : {ROOT_DIR}")
print(f"ARTIFACTS_DIR     : {ARTIFACTS_DIR}")

EXPERIMENT_NAME   : scale-perarm-hpt-v3
RUN_NAME          : run-20231018-151159
BASE_OUTPUT_DIR   : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-perarm-hpt-v3/run-20231018-151159
ROOT_DIR          : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-perarm-hpt-v3/run-20231018-151159/root
ARTIFACTS_DIR     : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-perarm-hpt-v3/run-20231018-151159/artifacts


### Create a Cloud Build YAML file

In [13]:
# %%writefile cloudbuild.yaml

# steps:
# - name: 'gcr.io/cloud-builders/docker'
#   args: ['build', '-t', '$_IMAGE_URI', '$_FILE_LOCATION', '-f', '$_FILE_LOCATION/Dockerfile_$_DOCKERNAME']
#   env: ['AIP_STORAGE_URI=$_ARTIFACTS_DIR']
# images:
# - '$_IMAGE_URI'

### Write a Dockerfile
* Use the [cloudml-hypertune](https://github.com/GoogleCloudPlatform/cloudml-hypertune) Python package to report training metrics to Vertex AI for hyperparameter tuning
* Use the Google [Cloud Storage client library](https://cloud.google.com/storage/docs/reference/libraries) to read, during training, the best hyperparameters learned from a previous hyperparameter tuning job

In [14]:
DOCKERNAME = 'Dockerfile_train_my_perarm_env'

In [15]:
# %%writefile Dockerfile_{DOCKERNAME}

# # Specifies base image and tag.
# # FROM gcr.io/google-appengine/python
# FROM python:3.10
# ENV PYTHONUNBUFFERED True

# WORKDIR /root

# # Installs additional packages.
# RUN pip3 install cloudml-hypertune
# RUN pip3 install google-cloud-storage
# RUN pip3 install google-cloud-aiplatform
# RUN pip3 install tensorflow==2.12.0
# RUN pip3 install tensorboard
# RUN pip3 install tensorboard-plugin-profile
# RUN pip3 install tensorboard-plugin-wit
# RUN pip3 install tensorboard-data-server
# RUN pip3 install tensorflow-io
# RUN pip3 install tf-agents==0.17.0
# RUN pip3 install matplotlib
# RUN pip3 install urllib3

# # Copies training code to the Docker image.
# COPY src/per_arm_rl /root/src/per_arm_rl

# # Sets up the entry point to invoke the task.
# ENTRYPOINT ["python3", "-m", "src.per_arm_rl.task"]

#### Build the custom container with Cloud Build

In [16]:
# export PROJECT_ID=hybrid-vertex
# export HPTUNING_TRAINING_CONTAINER=hptuning-training-custom-container
# export IMAGE_URI=gcr.io/hybrid-vertex/hptuning-training-custom-container

# ! docker build -t $IMAGE_URI Dockerfile_train_perarm

In [17]:
# HPTUNING_TRAINING_CONTAINER = "hptuning-training-custom-container"

# # Docker definitions for training
# IMAGE_URI = f'gcr.io/{PROJECT_ID}/{HPTUNING_TRAINING_CONTAINER}'
# MACHINE_TYPE ='e2-highcpu-32'
# FILE_LOCATION = './'

# print(f"export DOCKERNAME    = {DOCKERNAME}")
# print(f"export IMAGE_URI     = {IMAGE_URI}")
# print(f"export FILE_LOCATION = {FILE_LOCATION}")
# print(f"export MACHINE_TYPE  = {MACHINE_TYPE}")
# print(f"export ARTIFACTS_DIR = {ARTIFACTS_DIR}")

In [18]:
# ! gcloud builds submit --config cloudbuild.yaml \
#     --substitutions _DOCKERNAME=$DOCKERNAME,_IMAGE_URI=$IMAGE_URI,_FILE_LOCATION=$FILE_LOCATION,_ARTIFACTS_DIR=$ARTIFACTS_DIR \
#     --timeout=2h \
#     --machine-type=$MACHINE_TYPE

## Prepare (hpt) training job for Vertex AI
* Submit a hyperparameter tuning job with the custom container. For the alternative of using Python packages instead of custom containers, see the example [here](https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning#create)
* Define the hyperparameter(s), max trial count, parallel trial count, parameter search algorithm, machine spec, accelerators, worker pool, etc.
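
For orientation, a single-pool worker pool spec could be assembled roughly as below. This is a sketch under assumptions: `build_worker_pool_specs` is a hypothetical stand-in, not the project's `train_utils.prepare_worker_pool_specs` (whose source is not shown here), the image URI is a placeholder, and reduction servers are left out. The dictionary keys follow the Vertex AI worker pool spec fields (`machine_spec`, `replica_count`, `container_spec`).

```python
from pprint import pprint


def build_worker_pool_specs(image_uri, args, machine_type, replica_count=1,
                            accelerator_type=None, accelerator_count=0):
    """Build a one-element worker pool spec list for a custom container job."""
    machine_spec = {"machine_type": machine_type}
    # Only attach accelerator fields when an accelerator is actually requested.
    if accelerator_type and accelerator_count > 0:
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count
    return [{
        "machine_spec": machine_spec,
        "replica_count": replica_count,
        "container_spec": {"image_uri": image_uri, "args": list(args)},
    }]


specs = build_worker_pool_specs(
    image_uri="gcr.io/my-project/my-training-image:latest",  # placeholder
    args=["--batch-size=128", "--num-actions=20"],
    machine_type="n1-highcpu-16",
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
pprint(specs)
```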

In [19]:
# Execute hyperparameter tuning instead of regular training
RUN_HYPERPARAMETER_TUNING          = True
TRAIN_WITH_BEST_HYPERPARAMETERS    = False  # Do not train.

# Directory to store the best hyperparameter(s) in `BUCKET_NAME` and locally (temporarily)
HPTUNING_RESULT_DIR                = "hptuning"
HPTUNING_RESULT_FILE               = "result.json"
HPTUNING_RESULT_PATH               = f"{EXPERIMENT_NAME}/{RUN_NAME}/{HPTUNING_RESULT_DIR}/{HPTUNING_RESULT_FILE}"
HPTUNING_RESULT_URI                = f"{BUCKET_URI}/{HPTUNING_RESULT_PATH}"

print(f"HPTUNING_RESULT_DIR  : {HPTUNING_RESULT_DIR}")
print(f"HPTUNING_RESULT_FILE : {HPTUNING_RESULT_FILE}")
print(f"HPTUNING_RESULT_PATH : {HPTUNING_RESULT_PATH}")
print(f"HPTUNING_RESULT_URI  : {HPTUNING_RESULT_URI}")

HPTUNING_RESULT_DIR  : hptuning
HPTUNING_RESULT_FILE : result.json
HPTUNING_RESULT_PATH : scale-perarm-hpt-v3/run-20231018-151159/hptuning/result.json
HPTUNING_RESULT_URI  : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-perarm-hpt-v3/run-20231018-151159/hptuning/result.json


### Accelerators

In [20]:
ACCELERATOR = "t4" # str: "a100" | "t4" | "l4" | "tpu" | None
ACCELERATOR = str(ACCELERATOR)
print(f"ACCELERATOR: {ACCELERATOR}")

ACCELERATOR: t4


In [21]:
if ACCELERATOR == "a100":
    WORKER_MACHINE_TYPE = 'a2-highgpu-1g'
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'NVIDIA_TESLA_A100'
    PER_MACHINE_ACCELERATOR_COUNT = 1
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
    DISTRIBUTE_STRATEGY = 'single'
elif ACCELERATOR == 't4':
    WORKER_MACHINE_TYPE = 'n1-highcpu-16'
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'NVIDIA_TESLA_T4'
    PER_MACHINE_ACCELERATOR_COUNT = 1
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
elif ACCELERATOR == 'l4':
    WORKER_MACHINE_TYPE = "g2-standard-16"
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'NVIDIA_L4'
    PER_MACHINE_ACCELERATOR_COUNT = 1
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
elif ACCELERATOR == 'tpu':
    WORKER_MACHINE_TYPE = "cloud-tpu"
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = 'TPU_v3'
    PER_MACHINE_ACCELERATOR_COUNT = 8 # 8 | +32+ for TPU Pods
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = None
elif ACCELERATOR in ("None", "False"):  # CPU only; note that str(None) == "None"
    WORKER_MACHINE_TYPE = 'n2-highmem-32' # 'n1-highmem-96' | 'n2-highmem-92'
    REPLICA_COUNT = 1
    ACCELERATOR_TYPE = None
    PER_MACHINE_ACCELERATOR_COUNT = 0
    DISTRIBUTE_STRATEGY = 'single'
    REDUCTION_SERVER_COUNT = 0                                                      
    REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
    
TF_GPU_THREAD_COUNT   = '4'      # '1' | '4' | '8'

print(f"WORKER_MACHINE_TYPE            : {WORKER_MACHINE_TYPE}")
print(f"REPLICA_COUNT                  : {REPLICA_COUNT}")
print(f"ACCELERATOR_TYPE               : {ACCELERATOR_TYPE}")
print(f"PER_MACHINE_ACCELERATOR_COUNT  : {PER_MACHINE_ACCELERATOR_COUNT}")
print(f"DISTRIBUTE_STRATEGY            : {DISTRIBUTE_STRATEGY}")
print(f"REDUCTION_SERVER_COUNT         : {REDUCTION_SERVER_COUNT}")
print(f"REDUCTION_SERVER_MACHINE_TYPE  : {REDUCTION_SERVER_MACHINE_TYPE}")
print(f"TF_GPU_THREAD_COUNT            : {TF_GPU_THREAD_COUNT}")

WORKER_MACHINE_TYPE            : n1-highcpu-16
REPLICA_COUNT                  : 1
ACCELERATOR_TYPE               : NVIDIA_TESLA_T4
PER_MACHINE_ACCELERATOR_COUNT  : 1
DISTRIBUTE_STRATEGY            : single
REDUCTION_SERVER_COUNT         : 0
REDUCTION_SERVER_MACHINE_TYPE  : n1-highcpu-16
TF_GPU_THREAD_COUNT            : 4


### Create Tensorboard

In [23]:
# # create new TB instance
TENSORBOARD_DISPLAY_NAME=f"{EXPERIMENT_NAME}-{RUN_NAME}"

tensorboard = aiplatform.Tensorboard.create(
    display_name=TENSORBOARD_DISPLAY_NAME
    , project=PROJECT_ID
    , location=REGION
)

TB_RESOURCE_NAME = tensorboard.resource_name

print(f"TB_RESOURCE_NAME: {TB_RESOURCE_NAME}")
print(f"TB display name: {tensorboard.display_name}")

TB_RESOURCE_NAME: projects/934903580331/locations/us-central1/tensorboards/468378759293042688
TB display name: scale-perarm-hpt-v3-run-20231018-151159


### Set training args

In [24]:
# Set hyperparameters.
BATCH_SIZE       = 128       # Training and prediction batch size.
TRAINING_LOOPS   = 200      # Number of training iterations.
STEPS_PER_LOOP   = 6       # Number of driver steps per training iteration.

# Set MovieLens simulation environment parameters.
RANK_K           = 20      # Rank for matrix factorization in the MovieLens environment; also the observation dimension.
NUM_ACTIONS      = 20      # Number of actions (movie items) to choose from.
PER_ARM          = True    # Use the per-arm version of the MovieLens environment.

# Set agent parameters.
TIKHONOV_WEIGHT  = 0.001   # LinUCB Tikhonov regularization weight.
AGENT_ALPHA      = 10.0    # LinUCB exploration parameter that multiplies the confidence intervals.

CHKPT_INTERVAL       = TRAINING_LOOPS - 1

print(f"BATCH_SIZE       : {BATCH_SIZE}")
print(f"TRAINING_LOOPS   : {TRAINING_LOOPS}")
print(f"STEPS_PER_LOOP   : {STEPS_PER_LOOP}")
print(f"RANK_K           : {RANK_K}")
print(f"NUM_ACTIONS      : {NUM_ACTIONS}")
print(f"PER_ARM          : {PER_ARM}")
print(f"TIKHONOV_WEIGHT  : {TIKHONOV_WEIGHT}")
print(f"AGENT_ALPHA      : {AGENT_ALPHA}")
print(f"CHKPT_INTERVAL   : {CHKPT_INTERVAL}")

BATCH_SIZE       : 128
TRAINING_LOOPS   : 200
STEPS_PER_LOOP   : 6
RANK_K           : 20
NUM_ACTIONS      : 20
PER_ARM          : True
TIKHONOV_WEIGHT  : 0.001
AGENT_ALPHA      : 10.0


In [25]:
WORKER_ARGS = [
    f"--data-path={DATA_PATH}"               # TODO - remove duplicate arg
    , f"--bucket_name={BUCKET_NAME}"
    , f"--data_gcs_prefix={DATA_GCS_PREFIX}"
    , f"--data_path={DATA_PATH}"
    , f"--project_number={PROJECT_NUM}"
    , f"--batch-size={BATCH_SIZE}"
    , f"--rank-k={RANK_K}"
    , f"--num-actions={NUM_ACTIONS}"
    , f"--tikhonov-weight={TIKHONOV_WEIGHT}"
    , f"--agent-alpha={AGENT_ALPHA}"
    , f"--training_loops={TRAINING_LOOPS}"
    , f"--steps-per-loop={STEPS_PER_LOOP}"
    , f"--distribute={DISTRIBUTE_STRATEGY}"
    , f"--artifacts_dir={ARTIFACTS_DIR}"
    , f"--root_dir={ROOT_DIR}"
    , f"--log_dir={LOG_DIR}"
    , f"--experiment_name={EXPERIMENT_NAME}"
    , f"--experiment_run={RUN_NAME}"
    , f"--tf_gpu_thread_count={TF_GPU_THREAD_COUNT}"
    , f"--chkpt_interval={CHKPT_INTERVAL}"
    # , f"--profiler"
    # , f"--sum_grads_vars"
    # , f"--debug_summaries"
    , f"--use_gpu"
    # , f"--use_tpu"
]

if RUN_HYPERPARAMETER_TUNING:
    WORKER_ARGS.append("--run-hyperparameter-tuning")
    
elif TRAIN_WITH_BEST_HYPERPARAMETERS:
    WORKER_ARGS.append("--train-with-best-hyperparameters")
    
from src.per_arm_rl import train_utils

WORKER_POOL_SPECS = train_utils.prepare_worker_pool_specs(
    image_uri=f"{IMAGE_URI_01}:latest",
    args=WORKER_ARGS,
    replica_count=REPLICA_COUNT,
    machine_type=WORKER_MACHINE_TYPE,
    accelerator_count=PER_MACHINE_ACCELERATOR_COUNT,
    accelerator_type=ACCELERATOR_TYPE,
    reduction_server_count=REDUCTION_SERVER_COUNT,
    reduction_server_machine_type=REDUCTION_SERVER_MACHINE_TYPE,
)

from pprint import pprint
pprint(WORKER_POOL_SPECS)

[{'container_spec': {'args': ['--data-path=gs://rec-bandits-v2-hybrid-vertex-bucket/data',
                              '--bucket_name=rec-bandits-v2-hybrid-vertex-bucket',
                              '--data_gcs_prefix=data',
                              '--data_path=gs://rec-bandits-v2-hybrid-vertex-bucket/data',
                              '--project_number=934903580331',
                              '--batch-size=128',
                              '--rank-k=20',
                              '--num-actions=20',
                              '--tikhonov-weight=0.001',
                              '--agent-alpha=10.0',
                              '--training_loops=200',
                              '--steps-per-loop=6',
                              '--distribute=single',
                              '--artifacts_dir=gs://rec-bandits-v2-hybrid-vertex-bucket/scale-perarm-hpt-v3/run-20231018-151159/artifacts',
                              '--root_dir=gs://rec-bandits-v2-h

### Define parameter spec

Next, define the `parameter_spec`, which is a dictionary specifying the parameters you want to optimize. The **dictionary key** is the string you assigned to the command line argument for each hyperparameter, and the **dictionary value** is the parameter specification.

For each hyperparameter, you need to define the `Type` as well as the bounds for the values that the tuning service will try. Hyperparameters can be of type `Double`, `Integer`, `Categorical`, or `Discrete`. If you select the type `Double` or `Integer`, you need to provide a minimum and maximum value. And if you select `Categorical` or `Discrete` you need to provide the values. For the `Double` and `Integer` types, you also need to provide the scaling value. Learn more about [Using an Appropriate Scale](https://www.youtube.com/watch?v=cSoK_6Rkbfg).
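
To make the link between dictionary keys and command-line arguments concrete, here is a minimal sketch of how a training script might receive the two tuned values. It is not the parser in `src/per_arm_rl/task.py`; the function names are hypothetical, and accepting `16.0` as well as `16` is a defensive assumption about how numeric values may be formatted on the command line.

```python
import argparse


def int_from_number(text: str) -> int:
    """Parse "16" or "16.0" as the integer 16."""
    return int(float(text))


def parse_hyperparameters(argv):
    parser = argparse.ArgumentParser()
    # Flag names match the parameter_spec keys, with a leading "--".
    parser.add_argument("--batch-size", type=int_from_number, default=128)
    parser.add_argument("--num-actions", type=int_from_number, default=20)
    # Ignore the many other flags a real training script would define.
    args, _unknown = parser.parse_known_args(argv)
    return args


args = parse_hyperparameters(["--batch-size=16.0", "--num-actions=24", "--use_gpu"])
print(args.batch_size, args.num_actions)
```

Note that `argparse` exposes `--batch-size` as `args.batch_size`, so the dash in the parameter key becomes an underscore in the attribute name.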

In [26]:
# Dictionary representing parameters to optimize.
# The dictionary key is the parameter_id, which is passed into your training
# job as a command line argument,
# and the dictionary value is the parameter specification.
parameter_spec = {
    # "steps-per-loop": hpt.DiscreteParameterSpec(values=[2, 4], scale=None),
    "batch-size": hpt.DiscreteParameterSpec(values=[16, 32, 128], scale=None),
    "num-actions": hpt.DiscreteParameterSpec(values=[8, 24, 32], scale=None),
    # "training-loops": hpt.DiscreteParameterSpec(values=[4, 6, 8], scale=None),
}

The final spec to define is `metric_spec`, which is a dictionary representing the metric to optimize. The dictionary key is the `hyperparameter_metric_tag` that you set in your training application code, and the value is the optimization goal.

In [27]:
# Dictionary representing metrics to optimize.
# The dictionary key is the metric_id, which is reported by your training job,
# And the dictionary value is the optimization goal of the metric.
metric_spec = {"final_average_return": "maximize"}

## [1] Submit (hpt) train job

In [28]:
aiplatform.init(
    project=PROJECT_ID
    , location=REGION
    , experiment=EXPERIMENT_NAME
    # , staging_bucket=ROOT_DIR
)

JOB_NAME = f"mvl-hpt-{RUN_NAME}"
print(f"JOB_NAME: {JOB_NAME}")

JOB_NAME: mvl-hpt-run-20231018-151159


In [29]:
# Create a CustomJob
my_custom_hpt_job = aiplatform.CustomJob(
    display_name=JOB_NAME
    , project=PROJECT_ID
    , worker_pool_specs=WORKER_POOL_SPECS
    , base_output_dir=BASE_OUTPUT_DIR
    , staging_bucket=ROOT_DIR
)

Then, create and run a HyperparameterTuningJob.

> see [source code](https://github.com/googleapis/python-aiplatform/blob/main/google/cloud/aiplatform/hyperparameter_tuning.py)

There are a few arguments to note:

* `max_trial_count`: Sets an upper bound on the number of trials the service will run. The recommended practice is to start with a smaller number of trials and get a sense of how impactful your chosen hyperparameters are before scaling up.

* `parallel_trial_count`: If you use parallel trials, the service provisions multiple training processing clusters. The worker pool spec that you specify when creating the job is used for each individual training cluster. Increasing the number of parallel trials reduces the amount of time the hyperparameter tuning job takes to run; however, it can reduce the effectiveness of the job overall. This is because the default tuning strategy uses results of previous trials to inform the assignment of values in subsequent trials.

* `search_algorithm`: The available search algorithms are grid, random, or default (None). The default option applies Bayesian optimization to search the space of possible hyperparameter values and is the recommended algorithm.
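
A quick back-of-the-envelope check of these settings, using the values from this notebook (two discrete parameters with three values each, `max_trial_count=6`, `parallel_trial_count=6`). The variable names are only for illustration.

```python
import math

search_space = {"batch-size": [16, 32, 128], "num-actions": [8, 24, 32]}
max_trial_count = 6
parallel_trial_count = 6

# Number of distinct hyperparameter combinations in the discrete search space.
total_combinations = math.prod(len(values) for values in search_space.values())

# Number of sequential "rounds" of trials; later rounds can use earlier results.
sequential_rounds = math.ceil(max_trial_count / parallel_trial_count)

# Upper bound on the share of the search space the job can cover.
max_coverage = min(max_trial_count, total_combinations) / total_combinations

print(f"combinations      : {total_combinations}")
print(f"sequential rounds : {sequential_rounds}")
print(f"max coverage      : {max_coverage:.0%}")
```

With a single round, no trial can benefit from the results of another, which is harmless for random search but would remove the advantage of the default Bayesian strategy.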

In [30]:
# Create and run HyperparameterTuningJob

hp_job = aiplatform.HyperparameterTuningJob(
    display_name=JOB_NAME,
    custom_job=my_custom_hpt_job,
    metric_spec=metric_spec,
    parameter_spec=parameter_spec,
    max_trial_count=6,
    parallel_trial_count=6,
    project=PROJECT_ID,
    search_algorithm="random",
)

hp_job.run(
    sync=False
    , service_account=VERTEX_SA
    , restart_job_on_worker_restart = False 
    , enable_web_access = True
    , tensorboard = TB_RESOURCE_NAME
)

In [32]:
print(f"Job Name: {hp_job.display_name}")
print(f"Job Resource Name: {hp_job.resource_name}\n")
# print(f"Check training progress at {custom_job._dashboard_uri()}")

Job Name: mvl-hpt-run-20231018-151159
Job Resource Name: projects/934903580331/locations/us-central1/hyperparameterTuningJobs/4194334288109371392



#### Find the best combination(s) of hyperparameter(s) for each metric

In [89]:
# (trial id, first parameter value, second parameter value, metric value)
best_test = (None, None, None, float("-inf"))
for trial in hp_job.trials:
    # Keep track of the best outcome (the metric is maximized)
    metric_value = trial.final_measurement.metrics[0].value
    if float(metric_value) > best_test[3]:
        try:
            second_param = trial.parameters[1].value.number_value
        except (IndexError, AttributeError):
            # Only one hyperparameter was tuned
            second_param = None
        best_test = (
            trial.id,
            trial.parameters[0].value.number_value,
            second_param,
            metric_value,
        )

print(best_test)

('3', 16.0, 24.0, 2.032210111618042)
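
The same selection can be written against plain Python records, keyed by parameter name rather than by position, which avoids depending on the order of `trial.parameters`. This is a sketch on made-up data: the record layout, the helper name `best_trial`, and the metric values are illustrative and are not the results of the job above.

```python
def best_trial(trials, metric_id, maximize=True):
    """Return the trial record with the best value for metric_id, or None."""
    # Skip trials that never reported the metric (for example, failed trials).
    scored = [t for t in trials if metric_id in t["metrics"]]
    if not scored:
        return None
    pick = max if maximize else min
    return pick(scored, key=lambda t: t["metrics"][metric_id])


# Illustrative records only; real values would come from hp_job.trials.
trials = [
    {"id": "1", "parameters": {"batch-size": 32.0, "num-actions": 8.0},
     "metrics": {"final_average_return": 1.5}},
    {"id": "2", "parameters": {"batch-size": 128.0, "num-actions": 32.0},
     "metrics": {}},
    {"id": "3", "parameters": {"batch-size": 16.0, "num-actions": 24.0},
     "metrics": {"final_average_return": 2.0}},
]

best = best_trial(trials, "final_average_return")
print(best["id"], best["parameters"])
```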


In [93]:
BATCH_SIZE_best = int(best_test[1])
BATCH_SIZE_best

16

In [94]:
NUM_ACTIONS_best = int(best_test[2])
NUM_ACTIONS_best

24

In [96]:
# if hp_job.trials:
#     best_objective_values = dict.fromkeys(
#         [metric.metric_id for metric in hp_job.trials[0].final_measurement.metrics]
#         , -np.inf
#     )
#     best_params = defaultdict(list)
#     for trial in hp_job.trials:
#         for metric in trial.final_measurement.metrics:
#             # here
#             params = {
#                 param.parameter_id: param.value for param in trial.parameters
#             }
#             if metric.value > best_objective_values[metric.metric_id]:
#                 best_objective_values[metric.metric_id] = metric.value
#                 best_params[metric.metric_id] = [params]
#             elif metric.value == best_objective_values[metric.metric_id]:
#                 best_params[metric.metric_id].append(params)  # Handle cases where multiple hyperparameter values lead to the same performance.
#     print("Best hyperparameter value(s):")
#     for metric, params in best_params.items():
#         print(f"Metric={metric}: {params}")

#### Convert a combination of best hyperparameter(s) for a metric of interest to JSON

In [97]:
# LOCAL_RESULTS_FILE = "result.json"  # {"batch-size": 8.0, "steps-per-loop": 2.0}

# with open(LOCAL_RESULTS_FILE, "w") as f:
#     json.dump(best_params["final_average_return"][0], f)
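
A minimal, local-only sketch of the JSON round trip: write the best hyperparameters to a file and read them back as a training script might. The values are the ones found above (`batch-size` 16, `num-actions` 24); a temporary directory stands in for the working directory, and the upload to GCS is not shown.

```python
import json
import os
import tempfile

best_hyperparameters = {"batch-size": 16.0, "num-actions": 24.0}

with tempfile.TemporaryDirectory() as tmp_dir:
    local_path = os.path.join(tmp_dir, "result.json")

    # Write the result file (this is what would be copied to GCS).
    with open(local_path, "w") as f:
        json.dump(best_hyperparameters, f)

    # Read it back, as the training job would after downloading it.
    with open(local_path) as f:
        restored = json.load(f)

# The tuning service reports numbers as floats; cast before using them as sizes.
batch_size = int(restored["batch-size"])
num_actions = int(restored["num-actions"])
print(batch_size, num_actions)
```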

#### Upload the best hyperparameter(s) to GCS for use in training

In [98]:
# !gsutil -q cp $LOCAL_RESULTS_FILE $HPTUNING_RESULT_URI

# !gsutil ls $HPTUNING_RESULT_URI

## [2] Submit custom container training job

- Note again that the bucket must be in the same region as the Vertex AI service location, and it should not be multi-regional.
- Read more of CustomContainerTrainingJob's source code [here](https://github.com/googleapis/python-aiplatform/blob/v0.8.0/google/cloud/aiplatform/training_jobs.py#L2153).
- Like with local execution, you can use TensorBoard Profiler to track the training process and resources, and visualize the corresponding artifacts using the command: `%tensorboard --logdir $PROFILER_DIR`.

In [231]:
EXPERIMENT_NAME   = f'scale-perarm-hpt'

invoke_time       = time.strftime("%Y%m%d-%H%M%S")
RUN_NAME          = f'run-{invoke_time}'

BASE_OUTPUT_DIR   = f'gs://{BUCKET_NAME}/{EXPERIMENT_NAME}/{RUN_NAME}'
LOG_DIR           = f"{BASE_OUTPUT_DIR}/logs"
ROOT_DIR          = f"{BASE_OUTPUT_DIR}/root"                               # Root directory for writing logs/summaries/checkpoints.
ARTIFACTS_DIR     = f"{BASE_OUTPUT_DIR}/artifacts"                          # Where the trained model will be saved and restored.

print(f"EXPERIMENT_NAME   : {EXPERIMENT_NAME}")
print(f"RUN_NAME          : {RUN_NAME}")
print(f"BASE_OUTPUT_DIR   : {BASE_OUTPUT_DIR}")
print(f"LOG_DIR           : {LOG_DIR}")
print(f"ROOT_DIR          : {ROOT_DIR}")
print(f"ARTIFACTS_DIR     : {ARTIFACTS_DIR}")

EXPERIMENT_NAME   : scale-perarm-hpt
RUN_NAME          : run-20231018-203331
BASE_OUTPUT_DIR   : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-perarm-hpt/run-20231018-203331
LOG_DIR           : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-perarm-hpt/run-20231018-203331/logs
ROOT_DIR          : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-perarm-hpt/run-20231018-203331/root
ARTIFACTS_DIR     : gs://rec-bandits-v2-hybrid-vertex-bucket/scale-perarm-hpt/run-20231018-203331/artifacts


In [232]:
# aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_NAME)

In [233]:
RUN_HYPERPARAMETER_TUNING       = False 
TRAIN_WITH_BEST_HYPERPARAMETERS = False   

### set Tensorboard

In [234]:
# # create new TB instance
TENSORBOARD_DISPLAY_NAME=f"{EXPERIMENT_NAME}-{RUN_NAME}"

tensorboard = aiplatform.Tensorboard.create(
    display_name=TENSORBOARD_DISPLAY_NAME
    , project=PROJECT_ID
    , location=REGION
    # , location="asia-southeast1"
)

TB_RESOURCE_NAME = tensorboard.resource_name

# use existing TB instance
# TB_RESOURCE_NAME = 'projects/934903580331/locations/us-central1/tensorboards/6924469145035603968'

print(f"TB_RESOURCE_NAME: {TB_RESOURCE_NAME}")
print(f"TB display name: {tensorboard.display_name}")

TB_RESOURCE_NAME: projects/934903580331/locations/us-central1/tensorboards/3765013686528245760
TB display name: scale-perarm-hpt-run-20231018-203331


### set training args

In [235]:
# Set hyperparameters.
BATCH_SIZE       = BATCH_SIZE_best
TRAINING_LOOPS   = 100
STEPS_PER_LOOP   = 1
NUM_ACTIONS      = NUM_ACTIONS_best

CHKPT_INTERVAL       = TRAINING_LOOPS // 5

print(f"BATCH_SIZE     : {BATCH_SIZE}")
print(f"TRAINING_LOOPS : {TRAINING_LOOPS}")
print(f"STEPS_PER_LOOP : {STEPS_PER_LOOP}")
print(f"NUM_ACTIONS    : {NUM_ACTIONS}")
print(f"CHKPT_INTERVAL : {CHKPT_INTERVAL}")

BATCH_SIZE     : 16
TRAINING_LOOPS : 100
STEPS_PER_LOOP : 1
NUM_ACTIONS    : 24
CHKPT_INTERVAL : 20


In [236]:
WORKER_ARGS = [
    f"--data-path={DATA_PATH}"               # TODO - remove duplicate arg
    , f"--bucket_name={BUCKET_NAME}"
    , f"--data_gcs_prefix={DATA_GCS_PREFIX}"
    , f"--data_path={DATA_PATH}"
    , f"--project_number={PROJECT_NUM}"
    , f"--batch-size={BATCH_SIZE}"
    , f"--rank-k={RANK_K}"
    , f"--num-actions={NUM_ACTIONS_best}"
    , f"--tikhonov-weight={TIKHONOV_WEIGHT}"
    , f"--agent-alpha={AGENT_ALPHA}"
    , f"--training_loops={TRAINING_LOOPS}"
    , f"--steps-per-loop={STEPS_PER_LOOP}"
    , f"--distribute={DISTRIBUTE_STRATEGY}"
    , f"--artifacts_dir={ARTIFACTS_DIR}"
    , f"--root_dir={ROOT_DIR}"
    , f"--log_dir={LOG_DIR}"
    , f"--experiment_name={EXPERIMENT_NAME}"
    , f"--experiment_run={RUN_NAME}"
    , f"--tf_gpu_thread_count={TF_GPU_THREAD_COUNT}"
    , f"--chkpt_interval={CHKPT_INTERVAL}"
    , f"--profiler"
    # , f"--sum_grads_vars"
    , f"--debug_summaries"
    , f"--use_gpu"
    # , f"--use_tpu"
]

if RUN_HYPERPARAMETER_TUNING:
    WORKER_ARGS.append("--run-hyperparameter-tuning")
elif TRAIN_WITH_BEST_HYPERPARAMETERS:
    WORKER_ARGS.append("--train-with-best-hyperparameters")
    WORKER_ARGS.append(f"--best-hyperparameters-bucket={BUCKET_NAME}")
    WORKER_ARGS.append(f"--best-hyperparameters-path={HPTUNING_RESULT_PATH}")
    
from src.per_arm_rl import train_utils

WORKER_POOL_SPECS = train_utils.prepare_worker_pool_specs(
    image_uri=f"{IMAGE_URI_01}:latest",
    args=WORKER_ARGS,
    replica_count=REPLICA_COUNT,
    machine_type=WORKER_MACHINE_TYPE,
    accelerator_count=PER_MACHINE_ACCELERATOR_COUNT,
    accelerator_type=ACCELERATOR_TYPE,
    reduction_server_count=REDUCTION_SERVER_COUNT,
    reduction_server_machine_type=REDUCTION_SERVER_MACHINE_TYPE,
)

from pprint import pprint
pprint(WORKER_POOL_SPECS)

[{'container_spec': {'args': ['--data-path=gs://rec-bandits-v2-hybrid-vertex-bucket/data',
                              '--bucket_name=rec-bandits-v2-hybrid-vertex-bucket',
                              '--data_gcs_prefix=data',
                              '--data_path=gs://rec-bandits-v2-hybrid-vertex-bucket/data',
                              '--project_number=934903580331',
                              '--batch-size=16',
                              '--rank-k=20',
                              '--num-actions=24',
                              '--tikhonov-weight=0.001',
                              '--agent-alpha=10.0',
                              '--training_loops=100',
                              '--steps-per-loop=1',
                              '--distribute=single',
                              '--artifacts_dir=gs://rec-bandits-v2-hybrid-vertex-bucket/scale-perarm-hpt/run-20231018-203331/artifacts',
                              '--root_dir=gs://rec-bandits-v2-hybri
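
The container entrypoint has to parse the flags assembled in `WORKER_ARGS`. A minimal sketch of such a parser follows, covering only a subset of the flags visible in the cell and output above. The function `build_parser`, the argument types, and the defaults are assumptions for illustration; this is not the actual task script in `src/per_arm_rl/`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for a subset of the flags in WORKER_ARGS."""
    parser = argparse.ArgumentParser()
    # Value flags (types and defaults are assumptions).
    parser.add_argument("--batch-size", type=int, default=8)
    parser.add_argument("--rank-k", type=int, default=20)
    parser.add_argument("--num-actions", type=int, default=20)
    parser.add_argument("--training_loops", type=int, default=1)
    parser.add_argument("--chkpt_interval", type=int, default=100)
    parser.add_argument("--artifacts_dir", type=str, default="")
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--profiler", action="store_true")
    parser.add_argument("--use_gpu", action="store_true")
    parser.add_argument("--run-hyperparameter-tuning", action="store_true")
    parser.add_argument("--train-with-best-hyperparameters", action="store_true")
    parser.add_argument("--best-hyperparameters-bucket", type=str, default=None)
    parser.add_argument("--best-hyperparameters-path", type=str, default=None)
    return parser


example_args = build_parser().parse_args(
    ["--batch-size=16", "--rank-k=20", "--num-actions=24", "--profiler", "--use_gpu"]
)
# argparse turns dashes in flag names into underscores on the parsed namespace.
print(example_args.batch_size, example_args.profiler, example_args.run_hyperparameter_tuning)
```

Because the switches use `action="store_true"`, appending `--run-hyperparameter-tuning` to the argument list is enough to enable that mode; no value is needed.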

In [237]:
aiplatform.init(
    project=PROJECT_ID
    , location=REGION
    , experiment=EXPERIMENT_NAME
    # , staging_bucket=ROOT_DIR
)

JOB_NAME = f"mvl-best-train-{RUN_NAME}"
print(f"JOB_NAME: {JOB_NAME}")

JOB_NAME: mvl-best-train-run-20231018-203331


In [238]:
# Create a CustomJob
job = aiplatform.CustomJob(
    display_name=JOB_NAME
    , project=PROJECT_ID
    , worker_pool_specs=WORKER_POOL_SPECS
    , base_output_dir=BASE_OUTPUT_DIR
    , staging_bucket=ROOT_DIR
    # , location="asia-southeast1" # TODO
)
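
`worker_pool_specs` is a list of dictionaries, one per worker pool, each pairing a machine spec with a container spec. The sketch below builds such a list by hand to show the shape of the structure. The function name, the chief/worker split, the default machine type, and the example values are assumptions; it does not reproduce `train_utils.prepare_worker_pool_specs`, and the reduction-server pools are left out.

```python
from typing import Any, Dict, List, Optional


def sketch_worker_pool_specs(
    image_uri: str,
    args: List[str],
    replica_count: int = 1,
    machine_type: str = "n1-standard-16",
    accelerator_type: Optional[str] = None,
    accelerator_count: int = 0,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: one chief pool plus an optional worker pool."""
    machine_spec: Dict[str, Any] = {"machine_type": machine_type}
    if accelerator_type and accelerator_count > 0:
        machine_spec["accelerator_type"] = accelerator_type
        machine_spec["accelerator_count"] = accelerator_count

    def pool(replicas: int) -> Dict[str, Any]:
        return {
            "machine_spec": dict(machine_spec),
            "replica_count": replicas,
            "container_spec": {"image_uri": image_uri, "args": list(args)},
        }

    specs = [pool(1)]  # the first pool holds exactly one chief replica
    if replica_count > 1:
        specs.append(pool(replica_count - 1))  # remaining replicas are workers
    return specs


example_specs = sketch_worker_pool_specs(
    image_uri="example-image:latest",  # placeholder, not a real image URI
    args=["--batch-size=16", "--use_gpu"],
    replica_count=3,
    accelerator_type="NVIDIA_TESLA_T4",
    accelerator_count=1,
)
print(len(example_specs), example_specs[1]["replica_count"])
```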

In [239]:
job.run(
    tensorboard=TB_RESOURCE_NAME,
    service_account=VERTEX_SA,
    restart_job_on_worker_restart=False,
    enable_web_access=True,
    sync=False,
)

### TensorBoard Profiler

In [240]:
# os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

# import tensorflow as tf

TB_LOGS_PATH = LOG_DIR
print(f"TB_LOGS_PATH: {TB_LOGS_PATH}")

TB_LOGS_PATH: gs://rec-bandits-v2-hybrid-vertex-bucket/scale-perarm-hpt/run-20231018-203331/logs


In [243]:
# %load_ext tensorboard
%reload_ext tensorboard

In [245]:
%tensorboard --logdir=$TB_LOGS_PATH

## Making predictions

* Once a policy is trained, it takes a new observation request (i.e., a user vector) and infers actions, which are the recommended movies.
* In TF-Agents, observations are wrapped in a named tuple:

```
TimeStep('step_type', 'reward', 'discount', 'observation')
```

* The policy maps time steps to actions.
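
To make the structure concrete, the sketch below mimics a batched "restart" time step with a plain named tuple and NumPy arrays: step type 0 (first step), zero reward, and discount 1 for every batch element, which matches the values that `tf_agents.trajectories.restart` prints elsewhere in this notebook. `SketchTimeStep` and `sketch_restart` are stand-ins for illustration, not the TF-Agents classes; the shapes (8 users, 22 global features, 20 arms with 21 features each) are taken from outputs shown in this notebook.

```python
from collections import namedtuple

import numpy as np

# Stand-in for the TF-Agents TimeStep named tuple (same field names).
SketchTimeStep = namedtuple(
    "SketchTimeStep", ["step_type", "reward", "discount", "observation"]
)


def sketch_restart(observation: dict, batch_size: int) -> SketchTimeStep:
    """Builds a first-step time step for a batch of observations."""
    return SketchTimeStep(
        step_type=np.zeros([batch_size], dtype=np.int32),  # 0 marks a first step
        reward=np.zeros([batch_size], dtype=np.float32),
        discount=np.ones([batch_size], dtype=np.float32),
        observation=observation,
    )


sketch_batch_size = 8
sketch_observation = {
    "global": np.zeros([sketch_batch_size, 22], dtype=np.float32),       # user features
    "per_arm": np.zeros([sketch_batch_size, 20, 21], dtype=np.float32),  # movie features
}
sketch_step = sketch_restart(sketch_observation, sketch_batch_size)
print(sketch_step.step_type.shape, sketch_step.observation["per_arm"].shape)
```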

In [None]:
# ARTIFACTS_DIR = "gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230717-211248/model"

!gsutil ls $ARTIFACTS_DIR

In [97]:
trained_policy = tf.saved_model.load(MODEL_DIR)
trained_policy

<tensorflow.python.saved_model.load.Loader._recreate_base_user_object.<locals>._UserObject at 0x7fa450365570>

In [100]:
# train_step = trained_policy.get_train_step()
# print(f"Loaded policy at step: {train_step.numpy()}")

Loaded policy at step: -1


In [170]:
from tf_agents.environments import tf_py_environment

from src.per_arm_rl import my_per_arm_py_env

env = my_per_arm_py_env.MyMovieLensPerArmPyEnvironment(
    project_number = PROJECT_NUM
    , data_path = DATA_PATH
    , bucket_name = BUCKET_NAME
    , data_gcs_prefix = DATA_GCS_PREFIX
    , user_age_lookup_dict = data_config.USER_AGE_LOOKUP
    , user_occ_lookup_dict = data_config.USER_OCC_LOOKUP
    , movie_gen_lookup_dict = data_config.MOVIE_GEN_LOOKUP
    , num_users = data_config.MOVIELENS_NUM_USERS
    , num_movies = data_config.MOVIELENS_NUM_MOVIES
    , rank_k = RANK_K
    , batch_size = BATCH_SIZE
    , num_actions = NUM_ACTIONS
)

environment = tf_py_environment.TFPyEnvironment(env)

In [177]:
observation_array = environment._observe()

time_step = tf_agents.trajectories.restart(
    observation=observation_array,
    batch_size=tf.convert_to_tensor([BATCH_SIZE]),
)

action_step = trained_policy.action(time_step)

action_step.action.numpy().tolist()

[2, 9, 11, 0, 12, 5, 7, 13]

### Debugging

In [178]:
observation_array

{'global': <tf.Tensor: shape=(8, 22), dtype=float32, numpy=
 array([[-2.93424092e-02,  1.21775474e-02,  3.43246460e-02,
         -3.27415764e-02, -1.81793012e-02, -3.88447121e-02,
          5.79259545e-03, -2.34962348e-02, -3.24628595e-03,
          3.40082720e-02, -2.29820684e-02, -6.14768942e-04,
          3.24703840e-04,  6.10099500e-03,  2.94740163e-02,
         -1.70867296e-03, -1.08436979e-02,  4.27724496e-02,
          6.29528239e-02, -2.03116778e-02,  6.00010014e+00,
          1.50001001e+01],
        [-5.58640324e-02,  3.56975570e-02, -5.23380339e-02,
         -3.14981118e-02,  4.17792089e-02,  4.70291600e-02,
          6.71838149e-02, -1.15915332e-02, -6.09778147e-03,
         -7.18918741e-02, -9.12436005e-03, -1.62272230e-02,
          4.79080118e-02, -7.91483298e-02,  3.40030827e-02,
         -3.72561365e-02,  2.77148210e-03,  5.98525517e-02,
         -5.55859469e-02,  1.00681789e-01,  1.00010002e+00,
          7.00010014e+00],
        [-1.25767207e-02, -1.81108192e-02,  3.

In [179]:
time_step

TimeStep(
{'discount': <tf.Tensor: shape=(8,), dtype=float32, numpy=array([1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)>,
 'observation': {'global': <tf.Tensor: shape=(8, 22), dtype=float32, numpy=
array([[-2.93424092e-02,  1.21775474e-02,  3.43246460e-02,
        -3.27415764e-02, -1.81793012e-02, -3.88447121e-02,
         5.79259545e-03, -2.34962348e-02, -3.24628595e-03,
         3.40082720e-02, -2.29820684e-02, -6.14768942e-04,
         3.24703840e-04,  6.10099500e-03,  2.94740163e-02,
        -1.70867296e-03, -1.08436979e-02,  4.27724496e-02,
         6.29528239e-02, -2.03116778e-02,  6.00010014e+00,
         1.50001001e+01],
       [-5.58640324e-02,  3.56975570e-02, -5.23380339e-02,
        -3.14981118e-02,  4.17792089e-02,  4.70291600e-02,
         6.71838149e-02, -1.15915332e-02, -6.09778147e-03,
        -7.18918741e-02, -9.12436005e-03, -1.62272230e-02,
         4.79080118e-02, -7.91483298e-02,  3.40030827e-02,
        -3.72561365e-02,  2.77148210e-03,  5.98525517e-02,
      

In [181]:
# action_step

In [171]:
observation_array = environment._observe()
observation_array

{'global': <tf.Tensor: shape=(8, 22), dtype=float32, numpy=
 array([[-4.7207270e-02,  3.4380693e-02,  3.2353580e-03,  2.4785951e-02,
         -2.1616893e-02,  2.0435546e-02, -1.6932882e-02, -4.6546906e-02,
          7.4772812e-02,  3.8104949e-03, -4.0298931e-02, -1.5616004e-02,
         -2.8407177e-02,  1.3371499e-01, -1.3817169e-02,  7.9589628e-02,
          2.9432276e-02,  1.3362209e-02,  3.4628764e-02,  1.5013713e-02,
          3.0000999e+00,  1.8000099e+01],
        [-3.1404246e-02,  1.3404934e-02,  8.3384868e-03, -7.0392655e-04,
          1.5290237e-03, -4.3489054e-02,  6.4241186e-02,  1.3640903e-02,
          7.7757924e-03,  9.0712858e-03, -5.2332234e-02, -2.8096840e-02,
         -3.3365856e-03,  2.2077200e-03,  2.4335524e-02,  5.3830449e-02,
         -2.3229025e-02, -2.5175557e-02,  4.3426618e-02,  9.7907074e-03,
          1.0001000e+00,  1.8000099e+01],
        [-4.3206816e-03, -2.4147367e-02,  2.4132678e-02, -3.6364350e-02,
         -3.6980886e-02,  5.2587246e-03,  6.5641068e-

In [173]:
time_step = tf_agents.trajectories.restart(
    observation=observation_array,
    batch_size=tf.convert_to_tensor([BATCH_SIZE]),
)
time_step

TimeStep(
{'discount': <tf.Tensor: shape=(8,), dtype=float32, numpy=array([1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)>,
 'observation': {'global': <tf.Tensor: shape=(8, 22), dtype=float32, numpy=
array([[-4.7207270e-02,  3.4380693e-02,  3.2353580e-03,  2.4785951e-02,
        -2.1616893e-02,  2.0435546e-02, -1.6932882e-02, -4.6546906e-02,
         7.4772812e-02,  3.8104949e-03, -4.0298931e-02, -1.5616004e-02,
        -2.8407177e-02,  1.3371499e-01, -1.3817169e-02,  7.9589628e-02,
         2.9432276e-02,  1.3362209e-02,  3.4628764e-02,  1.5013713e-02,
         3.0000999e+00,  1.8000099e+01],
       [-3.1404246e-02,  1.3404934e-02,  8.3384868e-03, -7.0392655e-04,
         1.5290237e-03, -4.3489054e-02,  6.4241186e-02,  1.3640903e-02,
         7.7757924e-03,  9.0712858e-03, -5.2332234e-02, -2.8096840e-02,
        -3.3365856e-03,  2.2077200e-03,  2.4335524e-02,  5.3830449e-02,
        -2.3229025e-02, -2.5175557e-02,  4.3426618e-02,  9.7907074e-03,
         1.0001000e+00,  1.8000099e+01

In [174]:
action_step = trained_policy.action(time_step)
action_step

PolicyStep(action=<tf.Tensor: shape=(8,), dtype=int32, numpy=array([11, 10, 10,  0, 14, 10,  4, 11], dtype=int32)>, state=(), info=PerArmPolicyInfo(log_probability=(), predicted_rewards_mean=(), multiobjective_scalarized_predicted_rewards_mean=(), predicted_rewards_optimistic=(), predicted_rewards_sampled=(), bandit_policy_type=(), chosen_arm_features=<tf.Tensor: shape=(8, 21), dtype=float32, numpy=
array([[-7.54112229e-02, -7.72761703e-02, -3.46641243e-02,
         2.24941876e-02,  5.09656742e-02, -1.24314584e-01,
         8.77733678e-02,  6.58942834e-02,  5.93432225e-02,
        -9.74757150e-02, -7.19200298e-02, -4.67373803e-02,
         1.61329145e-03, -1.49870683e-02,  1.62771083e-02,
        -9.17898342e-02, -3.04824840e-02, -7.57825375e-02,
        -1.04961675e-02, -4.03923541e-02,  7.00010014e+00],
       [-5.49132079e-02,  6.67439029e-02,  6.70201927e-02,
        -6.18030084e-03,  5.20040421e-03, -3.73893976e-02,
        -3.29198316e-02,  4.10452299e-02, -3.06791905e-02,
      

In [175]:
action_step.action.numpy().tolist()

[11, 10, 10, 0, 14, 10, 4, 11]

## Archive

In [101]:
from tf_agents.bandits.specs import utils as bandit_spec_utils

GLOBAL_KEY = bandit_spec_utils.GLOBAL_FEATURE_KEY
PER_ARM_KEY = bandit_spec_utils.PER_ARM_FEATURE_KEY

# observation = {'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='observation/per_arm'), 'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='observation/global')}
# observation

print(f"GLOBAL_KEY  : {GLOBAL_KEY}")
print(f"PER_ARM_KEY : {PER_ARM_KEY}")

GLOBAL_KEY  : global
PER_ARM_KEY : per_arm


In [None]:
# batched_observations = {
#     GLOBAL_KEY:
#         tf.convert_to_tensor(combined_user_features, dtype=tf.float32),
#     PER_ARM_KEY:
#         tf.convert_to_tensor(current_movies, dtype=tf.float32),
# }

# print(f"batched_observations  : {batched_observations}")

In [104]:
observation = {
    GLOBAL_KEY:
        np.zeros([BATCH_SIZE, RANK_K + 2], dtype=np.int32),  # RANK_K dims + 2 slots (user age, occupation)
    PER_ARM_KEY:
        np.zeros([BATCH_SIZE, NUM_ACTIONS, RANK_K + 1], dtype=np.int32),
}

print(f"observation  : {observation}")

observation  : {'global': array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
      dtype=int32), 'per_arm': array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0,

In [161]:
from tf_agents.specs import array_spec

observation_spec = {
    GLOBAL_KEY:
        array_spec.ArraySpec(
            shape=[RANK_K + 2], dtype=np.float32, name='observation/global'),  # +2 slots for user age and occupation
    PER_ARM_KEY:
        array_spec.ArraySpec(
            shape=[NUM_ACTIONS, RANK_K + 1], dtype=np.float32, name='per_arm/global'),  # +1 slot for movie genre
}

print(f"observation_spec  : {observation_spec}")

observation_spec  : {'global': ArraySpec(shape=(22,), dtype=dtype('float32'), name='observation/global'), 'per_arm': ArraySpec(shape=(20, 21), dtype=dtype('float32'), name='per_arm/global')}
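
The spec shapes above describe a single example and omit the batch dimension, so a concrete observation batch carries one extra leading axis. The NumPy sketch below builds a `float32` batch from those shapes and checks it. The helper `zeros_from_spec_shapes` is hypothetical, and the shapes are copied from the printed spec rather than read from the spec objects.

```python
import numpy as np

# Per-example shapes copied from the printed observation_spec.
SPEC_SHAPES = {"global": (22,), "per_arm": (20, 21)}


def zeros_from_spec_shapes(spec_shapes: dict, batch_size: int) -> dict:
    """Hypothetical helper: prepend the batch axis and use float32 throughout."""
    return {
        key: np.zeros((batch_size,) + tuple(shape), dtype=np.float32)
        for key, shape in spec_shapes.items()
    }


example_batch = zeros_from_spec_shapes(SPEC_SHAPES, batch_size=8)
for key, value in example_batch.items():
    # The trailing axes must equal the spec shape; the leading axis is the batch.
    assert value.shape[1:] == SPEC_SHAPES[key]
    print(key, value.shape, value.dtype)
```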


In [None]:
from tf_agents.specs import tensor_spec

In [162]:
reward_spec = {
    "reward": array_spec.ArraySpec(shape=[BATCH_SIZE], dtype=np.float32, name="reward")
}

discount_spec = {
    "discount": array_spec.ArraySpec(shape=[BATCH_SIZE], dtype=np.float32, name="discount")
}

step_type_spec = {
    "step_type": array_spec.ArraySpec(shape=[BATCH_SIZE], dtype=np.int32, name="step_type")
}

print(f"reward_spec  : {reward_spec}")
print(f"discount_spec  : {discount_spec}")
print(f"step_type_spec  : {step_type_spec}")

reward_spec  : {'reward': ArraySpec(shape=(8,), dtype=dtype('float32'), name='reward')}
discount_spec  : {'discount': ArraySpec(shape=(8,), dtype=dtype('float32'), name='discount')}
step_type_spec  : {'step_type': ArraySpec(shape=(8,), dtype=dtype('int32'), name='step_type')}


In [163]:
from tensorflow.python.framework import tensor_spec as ts  # TF internal

TensorSpec = tf.TensorSpec
BoundedTensorSpec = ts.BoundedTensorSpec

def is_bounded(spec):
    """Returns True if `spec` carries minimum/maximum bounds."""
    if isinstance(spec, (array_spec.BoundedArraySpec, BoundedTensorSpec)):
        return True
    if hasattr(spec, "minimum") and hasattr(spec, "maximum"):
        return hasattr(spec, "dtype") and hasattr(spec, "shape")
    return False

def from_spec(spec):
    """
    Maps the given spec into corresponding TensorSpecs keeping bounds.
    """

    def _convert_to_tensor_spec(s):
        # Need to check bounded first as non bounded specs are base class.
        if isinstance(s, tf.TypeSpec):
            return s
        if is_bounded(s):
            return BoundedTensorSpec.from_spec(s)
        elif isinstance(s, array_spec.ArraySpec):
            return TensorSpec.from_spec(s)
        else:
            raise ValueError(
                "No known conversion from type `%s` to a TensorSpec.  Saw:\n  %s" % (type(s), s)
            )

    return tf.nest.map_structure(_convert_to_tensor_spec, spec)

In [164]:
obs_tensor_spec = from_spec(observation_spec)
discount_tensor_spec = from_spec(discount_spec)
reward_tensor_spec = from_spec(reward_spec) 
step_type_tensor_spec = from_spec(step_type_spec) 

print(f"obs_tensor_spec       : {obs_tensor_spec}")
print(f"discount_tensor_spec  : {discount_tensor_spec}")
print(f"reward_tensor_spec    : {reward_tensor_spec}")
print(f"step_type_tensor_spec : {step_type_tensor_spec}")

obs_tensor_spec       : {'global': TensorSpec(shape=(22,), dtype=tf.float32, name='observation/global'), 'per_arm': TensorSpec(shape=(20, 21), dtype=tf.float32, name='per_arm/global')}
discount_tensor_spec  : {'discount': TensorSpec(shape=(8,), dtype=tf.float32, name='discount')}
reward_tensor_spec    : {'reward': TensorSpec(shape=(8,), dtype=tf.float32, name='reward')}
step_type_tensor_spec : {'step_type': TensorSpec(shape=(8,), dtype=tf.int32, name='step_type')}


In [None]:
# from tf_agents.trajectories import time_step as ts

# time_step = ts.TimeStep(
#     step_type=step_type,
#     observation=observation,
#     reward=tf.zeros_like(step_type, tf.float32),
#     discount=tf.ones_like(step_type, tf.float32))

In [165]:
from tf_agents.trajectories import time_step as ts
from tf_agents.trajectories import trajectory

traj = trajectory.Trajectory(
    step_type=step_type_tensor_spec,
    observation=obs_tensor_spec,
    action=actions,
    policy_info=(),
    next_step_type=next_step_types,
    reward=reward_tensor_spec,
    discount=discount_tensor_spec
)
traj

Trajectory(
{'action': <tf.Tensor: shape=(8,), dtype=float32, numpy=array([1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)>,
 'discount': {'discount': TensorSpec(shape=(8,), dtype=tf.float32, name='discount')},
 'next_step_type': <tf.Tensor: shape=(8,), dtype=int32, numpy=array([1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)>,
 'observation': {'global': TensorSpec(shape=(22,), dtype=tf.float32, name='observation/global'),
                 'per_arm': TensorSpec(shape=(20, 21), dtype=tf.float32, name='per_arm/global')},
 'policy_info': (),
 'reward': {'reward': TensorSpec(shape=(8,), dtype=tf.float32, name='reward')},
 'step_type': {'step_type': TensorSpec(shape=(8,), dtype=tf.int32, name='step_type')}})

In [166]:
traj.step_type

{'step_type': TensorSpec(shape=(8,), dtype=tf.int32, name='step_type')}

In [167]:
time_step = ts.TimeStep(
    step_type=traj.step_type,
    observation=traj.observation,
    reward=traj.reward,
    discount=traj.discount
)

time_step

TimeStep(
{'discount': {'discount': TensorSpec(shape=(8,), dtype=tf.float32, name='discount')},
 'observation': {'global': TensorSpec(shape=(22,), dtype=tf.float32, name='observation/global'),
                 'per_arm': TensorSpec(shape=(20, 21), dtype=tf.float32, name='per_arm/global')},
 'reward': {'reward': TensorSpec(shape=(8,), dtype=tf.float32, name='reward')},
 'step_type': {'step_type': TensorSpec(shape=(8,), dtype=tf.int32, name='step_type')}})

In [None]:
time_step.restart()

In [168]:
action_step = trained_policy.action(time_step)
action_step

ValueError: Could not find matching concrete function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * TimeStep(
{'discount': {'discount': <tf.Tensor 'discount:0' shape=(8,) dtype=float32>},
 'observation': {'global': <tf.Tensor 'observation/global:0' shape=(22,) dtype=float32>,
                 'per_arm': <tf.Tensor 'per_arm/global:0' shape=(20, 21) dtype=float32>},
 'reward': {'reward': <tf.Tensor 'reward:0' shape=(8,) dtype=float32>},
 'step_type': {'step_type': <tf.Tensor 'step_type:0' shape=(8,) dtype=int32>}})
    * ()
  Keyword arguments: {}

 Expected these arguments to match one of the following 2 option(s):

Option 1:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='discount'), observation={'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='observation/global'), 'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='observation/per_arm')})
    * ()
  Keyword arguments: {}

Option 2:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='time_step_step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step_reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step_discount'), observation={'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='time_step_observation_global'), 'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='time_step_observation_per_arm')})
    * ()
  Keyword arguments: {}

In [132]:
# # discounts = tf.ones((BATCH_SIZE,), dtype=tf.float32)
# discounts = list(np.ones((BATCH_SIZE,), dtype=np.float32))
# discounts

In [144]:
from tf_agents.trajectories import time_step as ts
from tf_agents.trajectories import trajectory
# from tf_agents.typing import types

discounts = tf.ones((BATCH_SIZE,), dtype=tf.float32)
rewards = tf.ones((BATCH_SIZE,), dtype=tf.float32)
actions = tf.ones((BATCH_SIZE,), dtype=tf.float32)

next_step_types = tf.ones((BATCH_SIZE,), dtype=tf.int32) * ts.StepType.MID

step_types = tf.concat([[ts.StepType.FIRST], next_step_types[1:]], axis=0)

traj = trajectory.Trajectory(
    step_type=step_types,
    observation=observation, #obs_tensor_spec,
    action=actions,
    policy_info=(),
    next_step_type=next_step_types,
    reward=rewards,
    discount=discounts
)
traj

Trajectory(
{'action': <tf.Tensor: shape=(8,), dtype=float32, numpy=array([1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)>,
 'discount': <tf.Tensor: shape=(8,), dtype=float32, numpy=array([1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32)>,
 'next_step_type': <tf.Tensor: shape=(8,), dtype=int32, numpy=array([1, 1, 1, 1, 1, 1, 1, 1], dtype=int32)>,
 'observation': {'global': array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
      dtype=int32),
   

In [151]:
step_type = tf.expand_dims(traj.step_type, axis=0)

# step_type_tensor_spec = from_spec(step_type) 

step_type

<tf.Tensor: shape=(1, 8), dtype=int32, numpy=array([[0, 1, 1, 1, 1, 1, 1, 1]], dtype=int32)>

In [146]:
traj_observation = tf.nest.map_structure(lambda x: tf.expand_dims(x, axis=0), traj.observation)
traj_observation

{'global': <tf.Tensor: shape=(1, 8, 22), dtype=int32, numpy=
 array([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0],
         [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0]]], dtype=int32)>,
 'per_arm': <tf.Tensor: shape=(1, 8, 20, 21), dtype=int32, numpy=
 array([[[[0, 0, 0, ..., 0, 0, 0],
          [0, 0, 0, ..., 0, 0, 0],
          [0, 0, 0, ..., 0, 0, 0],
          ...,
          [0, 0, 0, ..., 0, 0, 0],
     

In [147]:
time_step = ts.TimeStep(
    step_type=step_type,
    observation=traj_observation,
    reward=tf.zeros_like(step_type, tf.float32),
    discount=tf.ones_like(step_type, tf.float32))

time_step

TimeStep(
{'discount': <tf.Tensor: shape=(1, 8), dtype=float32, numpy=array([[1., 1., 1., 1., 1., 1., 1., 1.]], dtype=float32)>,
 'observation': {'global': <tf.Tensor: shape=(1, 8, 22), dtype=int32, numpy=
array([[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0],
        [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0]]], dtype=int32)>,
                 'per_arm': <tf.Tensor: shape=(1, 8, 20, 21), dtype=int32, numpy=
array([[[[0, 0,

In [149]:
action_step = trained_policy.action(time_step)
action_step

ValueError: Could not find matching concrete function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * TimeStep(
{'discount': <tf.Tensor 'time_step_2:0' shape=(1, 8) dtype=float32>,
 'observation': {'global': <tf.Tensor 'time_step_3:0' shape=(1, 8, 22) dtype=int32>,
                 'per_arm': <tf.Tensor 'time_step_4:0' shape=(1, 8, 20, 21) dtype=int32>},
 'reward': <tf.Tensor 'time_step_1:0' shape=(1, 8) dtype=float32>,
 'step_type': <tf.Tensor 'time_step:0' shape=(1, 8) dtype=int32>})
    * ()
  Keyword arguments: {}

 Expected these arguments to match one of the following 2 option(s):

Option 1:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='discount'), observation={'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='observation/global'), 'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='observation/per_arm')})
    * ()
  Keyword arguments: {}

Option 2:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='time_step_step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step_reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step_discount'), observation={'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='time_step_observation_global'), 'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='time_step_observation_per_arm')})
    * ()
  Keyword arguments: {}

In [116]:
time_step = tf_agents.trajectories.restart(
    observation=tensor_spec,
    batch_size=tf.convert_to_tensor([BATCH_SIZE]),
)
time_step

TimeStep(
{'discount': array([1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32),
 'observation': {'global': TensorSpec(shape=(22,), dtype=tf.float32, name=None),
                 'per_arm': TensorSpec(shape=(20, 21), dtype=tf.float32, name=None)},
 'reward': array([0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32),
 'step_type': array([0, 0, 0, 0, 0, 0, 0, 0], dtype=int32)})

In [117]:
action_step = trained_policy.action(time_step)
action_step

ValueError: Could not find matching concrete function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * TimeStep(
{'discount': <tf.Tensor 'time_step_2:0' shape=(8,) dtype=float32>,
 'observation': {'global': <tf.Tensor 'time_step_3:0' shape=(22,) dtype=float32>,
                 'per_arm': <tf.Tensor 'time_step_4:0' shape=(20, 21) dtype=float32>},
 'reward': <tf.Tensor 'time_step_1:0' shape=(8,) dtype=float32>,
 'step_type': <tf.Tensor 'time_step:0' shape=(8,) dtype=int32>})
    * ()
  Keyword arguments: {}

 Expected these arguments to match one of the following 2 option(s):

Option 1:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='discount'), observation={'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='observation/global'), 'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='observation/per_arm')})
    * ()
  Keyword arguments: {}

Option 2:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='time_step_step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step_reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step_discount'), observation={'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='time_step_observation_global'), 'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='time_step_observation_per_arm')})
    * ()
  Keyword arguments: {}

In [110]:
time_step = tf_agents.trajectories.restart(
    observation=observation,
    batch_size=tf.convert_to_tensor([BATCH_SIZE]),
)
time_step

TimeStep(
{'discount': array([1., 1., 1., 1., 1., 1., 1., 1.], dtype=float32),
 'observation': {'global': array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]],
      dtype=int32),
                 'per_arm': array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]],

       [[0, 0, 0, ..., 0, 0, 0],

In [111]:
action_step = trained_policy.action(time_step)
action_step

ValueError: Could not find matching concrete function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * TimeStep(
{'discount': <tf.Tensor 'time_step_2:0' shape=(8,) dtype=float32>,
 'observation': {'global': <tf.Tensor 'time_step_3:0' shape=(8, 22) dtype=int32>,
                 'per_arm': <tf.Tensor 'time_step_4:0' shape=(8, 20, 21) dtype=int32>},
 'reward': <tf.Tensor 'time_step_1:0' shape=(8,) dtype=float32>,
 'step_type': <tf.Tensor 'time_step:0' shape=(8,) dtype=int32>})
    * ()
  Keyword arguments: {}

 Expected these arguments to match one of the following 2 option(s):

Option 1:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='discount'), observation={'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='observation/global'), 'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='observation/per_arm')})
    * ()
  Keyword arguments: {}

Option 2:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='time_step_step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step_reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step_discount'), observation={'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='time_step_observation_global'), 'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='time_step_observation_per_arm')})
    * ()
  Keyword arguments: {}
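
Each `ValueError` above comes down to the inputs not matching the shapes or dtypes of the saved signature. The helper below is a plain-Python sketch of that comparison; it is not how TensorFlow performs the match. `EXPECTED` is copied from the observation part of Option 1 in the error text, and the first three entries of `attempts` are the observation shapes and dtypes reported in the failed calls. The last entry is the batched `float32` form, included for contrast.

```python
from typing import Dict, List, Optional, Tuple

Shape = Tuple[Optional[int], ...]
Signature = Dict[str, Tuple[Shape, str]]

# Observation part of "Option 1" in the error text; None is the batch axis.
EXPECTED: Signature = {
    "global": ((None, 22), "float32"),
    "per_arm": ((None, 20, 21), "float32"),
}


def mismatches(got: Signature, expected: Signature = EXPECTED) -> List[str]:
    """Lists every key whose rank, dimensions, or dtype differs from `expected`."""
    problems = []
    for key, (exp_shape, exp_dtype) in expected.items():
        shape, dtype = got[key]
        if len(shape) != len(exp_shape):
            problems.append(f"{key}: rank {len(shape)} != {len(exp_shape)}")
        elif any(e is not None and e != s for s, e in zip(shape, exp_shape)):
            problems.append(f"{key}: shape {shape} != {exp_shape}")
        if dtype != exp_dtype:
            problems.append(f"{key}: dtype {dtype} != {exp_dtype}")
    return problems


attempts = {
    "specs, no batch axis": {"global": ((22,), "float32"), "per_arm": ((20, 21), "float32")},
    "extra leading axis": {"global": ((1, 8, 22), "int32"), "per_arm": ((1, 8, 20, 21), "int32")},
    "int32 zeros": {"global": ((8, 22), "int32"), "per_arm": ((8, 20, 21), "int32")},
    "batched float32": {"global": ((8, 22), "float32"), "per_arm": ((8, 20, 21), "float32")},
}
for name, got in attempts.items():
    print(f"{name}: {mismatches(got) or 'matches'}")
```

Under this reading, only a batched `float32` input with shapes `(batch, 22)` and `(batch, 20, 21)` matches, which is consistent with the successful call in the Making predictions section, where the observation comes straight from the environment. The first failed attempt additionally wrapped `step_type`, `reward`, and `discount` in dicts, which the signature does not expect either.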

In [None]:
predictions = action_step.action.numpy().tolist()
predictions

In [61]:
instances=[
    {"observation": [list(np.ones(RANK_K)) for _ in range(BATCH_SIZE)]},
]
# instances
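
The instance above carries a flat list of `RANK_K`-length vectors, while the saved policy signature quoted in the error messages expects an observation dict with `global` and `per_arm` entries. The sketch below shows a request body in that form and how it could be turned back into `float32` arrays on the serving side. The JSON layout and the helper name `parse_instance` are assumptions, not the prediction container's actual contract.

```python
import json

import numpy as np

BATCH, NUM_ARMS, GLOBAL_DIM, PER_ARM_DIM = 8, 20, 22, 21  # shapes from the signature

# Hypothetical request body: nested lists keyed like the observation dict.
sketch_instance = {
    "observation": {
        "global": np.ones([BATCH, GLOBAL_DIM]).tolist(),
        "per_arm": np.ones([BATCH, NUM_ARMS, PER_ARM_DIM]).tolist(),
    }
}
payload = json.dumps({"instances": [sketch_instance]})  # what a client would send


def parse_instance(raw: dict) -> dict:
    """Converts the nested lists of one instance into float32 arrays."""
    return {
        key: np.asarray(value, dtype=np.float32)
        for key, value in raw["observation"].items()
    }


parsed_observation = parse_instance(json.loads(payload)["instances"][0])
print({key: (value.shape, str(value.dtype)) for key, value in parsed_observation.items()})
```

The explicit cast matters because Python floats decoded from JSON default to 64-bit precision, whereas the signature asks for `float32`.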

In [63]:
import tf_agents

predictions = []
for index, instance in enumerate(instances):
    # Unpack request body and reconstruct TimeStep. Rewards default to 0.
    batch_size = len(instance["observation"])
    print(f"batch_size: {batch_size}")

    time_step = tf_agents.trajectories.restart(
        observation=instance["observation"]
        , batch_size=tf.convert_to_tensor([batch_size])
    )
    policy_step = loaded_model.action(time_step)

    predictions.append(
        {f"PolicyStep {index}": policy_step.action.numpy().tolist()}
    )
    
predictions

# Option 1:
#   Positional arguments (2 total):
#     * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'), 
# reward=TensorSpec(shape=(None,), dtype=tf.float32, name='reward'), 
# discount=TensorSpec(shape=(None,), dtype=tf.float32, name='discount'), 
# observation={'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='observation/per_arm'), 'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='observation/global')})
#     * ()
#   Keyword arguments: {}

batch_size: 8


ValueError: Could not find matching concrete function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * TimeStep(
{'discount': <tf.Tensor 'time_step_2:0' shape=(8,) dtype=float32>,
 'observation': [[<tf.Tensor 'time_step_3:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_4:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_5:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_6:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_7:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_8:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_9:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_10:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_11:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_12:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_13:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_14:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_15:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_16:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_17:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_18:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_19:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_20:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_21:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_22:0' shape=() dtype=float64>],
                 [<tf.Tensor 'time_step_23:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_24:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_25:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_26:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_27:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_28:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_29:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_30:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_31:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_32:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_33:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_34:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_35:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_36:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_37:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_38:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_39:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_40:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_41:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_42:0' shape=() dtype=float64>],
                 [<tf.Tensor 'time_step_43:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_44:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_45:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_46:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_47:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_48:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_49:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_50:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_51:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_52:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_53:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_54:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_55:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_56:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_57:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_58:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_59:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_60:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_61:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_62:0' shape=() dtype=float64>],
                 [<tf.Tensor 'time_step_63:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_64:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_65:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_66:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_67:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_68:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_69:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_70:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_71:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_72:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_73:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_74:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_75:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_76:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_77:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_78:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_79:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_80:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_81:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_82:0' shape=() dtype=float64>],
                 [<tf.Tensor 'time_step_83:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_84:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_85:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_86:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_87:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_88:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_89:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_90:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_91:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_92:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_93:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_94:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_95:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_96:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_97:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_98:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_99:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_100:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_101:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_102:0' shape=() dtype=float64>],
                 [<tf.Tensor 'time_step_103:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_104:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_105:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_106:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_107:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_108:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_109:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_110:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_111:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_112:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_113:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_114:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_115:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_116:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_117:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_118:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_119:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_120:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_121:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_122:0' shape=() dtype=float64>],
                 [<tf.Tensor 'time_step_123:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_124:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_125:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_126:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_127:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_128:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_129:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_130:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_131:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_132:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_133:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_134:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_135:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_136:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_137:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_138:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_139:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_140:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_141:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_142:0' shape=() dtype=float64>],
                 [<tf.Tensor 'time_step_143:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_144:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_145:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_146:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_147:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_148:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_149:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_150:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_151:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_152:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_153:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_154:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_155:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_156:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_157:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_158:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_159:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_160:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_161:0' shape=() dtype=float64>,
                  <tf.Tensor 'time_step_162:0' shape=() dtype=float64>]],
 'reward': <tf.Tensor 'time_step_1:0' shape=(8,) dtype=float32>,
 'step_type': <tf.Tensor 'time_step:0' shape=(8,) dtype=int32>})
    * ()
  Keyword arguments: {}

 Expected these arguments to match one of the following 2 option(s):

Option 1:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='discount'), observation={'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='observation/per_arm'), 'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='observation/global')})
    * ()
  Keyword arguments: {}

Option 2:
  Positional arguments (2 total):
    * TimeStep(step_type=TensorSpec(shape=(None,), dtype=tf.int32, name='time_step_step_type'), reward=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step_reward'), discount=TensorSpec(shape=(None,), dtype=tf.float32, name='time_step_discount'), observation={'global': TensorSpec(shape=(None, 22), dtype=tf.float32, name='time_step_observation_global'), 'per_arm': TensorSpec(shape=(None, 20, 21), dtype=tf.float32, name='time_step_observation_per_arm')})
    * ()
  Keyword arguments: {}
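
The error above reports a structure and dtype mismatch: the policy received a nested list of scalar `float64` tensors, while both saved signatures expect a dict with `float32` entries `global` of shape `(None, 22)` and `per_arm` of shape `(None, 20, 21)`. The following is a minimal sketch, using only NumPy, of checking and casting a raw observation before it is wrapped in a `TimeStep`. The helper name `to_policy_observation` and the constant `EXPECTED_TRAILING_SHAPES` are hypothetical and not part of the notebook's source code.

```python
import numpy as np

# Trailing (non-batch) shapes copied from the expected signature in the error.
EXPECTED_TRAILING_SHAPES = {"global": (22,), "per_arm": (20, 21)}


def to_policy_observation(raw):
    """Casts a raw observation dict to float32 arrays and checks its shapes.

    Hypothetical helper: `raw` is assumed to be a dict of nested lists or
    arrays keyed by `global` and `per_arm`.
    """
    if not isinstance(raw, dict) or set(raw) != set(EXPECTED_TRAILING_SHAPES):
        raise TypeError("observation must be a dict with keys 'global' and 'per_arm'")

    observation = {}
    for key, trailing in EXPECTED_TRAILING_SHAPES.items():
        # Python floats default to float64; the saved policy expects float32.
        array = np.asarray(raw[key], dtype=np.float32)
        if array.ndim != len(trailing) + 1 or array.shape[1:] != trailing:
            raise ValueError(
                f"{key!r} has shape {array.shape}, expected (batch, *{trailing})"
            )
        observation[key] = array

    if observation["global"].shape[0] != observation["per_arm"].shape[0]:
        raise ValueError("'global' and 'per_arm' must share the same batch size")
    return observation


# Usage: a batch of 8 observations filled with ones.
example = to_policy_observation(
    {"global": np.ones((8, 22)).tolist(), "per_arm": np.ones((8, 20, 21)).tolist()}
)
print({key: (value.shape, value.dtype) for key, value in example.items()})
```

Validating the payload up front turns the long `ValueError` above into a short message that names the offending key.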

In [None]:
# @app.post(os.environ["AIP_PREDICT_ROUTE"])
async def predict(request: Request):
    """
    Handles prediction requests.

    Unpacks observations in prediction requests and queries the trained policy for
    predicted actions.

    Args:
      request: Incoming prediction requests that contain observations.

    Returns:
      A dict with the key `predictions` mapping to a list of predicted actions
      corresponding to each observation in the prediction request.
    """
    body = await request.json()
    instances = body["instances"]

    predictions = []
    for index, instance in enumerate(instances):
        # Unpack request body and reconstruct TimeStep. Rewards default to 0.
        # Assumption: each observation is a dict with `global` and `per_arm`
        # entries, matching the SavedModel signature in the error output above.
        # Cast to float32 because Python floats otherwise become float64 tensors.
        observation = {
            key: tf.convert_to_tensor(value, dtype=tf.float32)
            for key, value in instance["observation"].items()
        }
        batch_size = int(observation["global"].shape[0])

        time_step = tf_agents.trajectories.restart(
            observation=observation,
            batch_size=batch_size,
        )
        policy_step = _model.action(time_step)

        predictions.append(
            {f"PolicyStep {index}": policy_step.action.numpy().tolist()}
        )

    return {"predictions": predictions}

In [142]:
# endpoint = model.deploy(machine_type="n1-standard-4")

In [143]:
# print("Endpoint display name:", endpoint.display_name)
# print("Endpoint ID:", endpoint.name)

Endpoint display name: mabv1-perarm-model_endpoint
Endpoint ID: 1696656392821145600


### Predict on the Endpoint
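- Put prediction input(s) into a list named `instances`. For the per-arm policy, the SavedModel signature printed in the error output above expects each observation to be a dict with a `global` entry of shape (BATCH_SIZE, 22) and a `per_arm` entry of shape (BATCH_SIZE, 20, 21), both `float32`. Read more about the MovieLens simulation environment observation [here](https://github.com/tensorflow/agents/blob/v0.8.0/tf_agents/bandits/environments/movielens_py_environment.py#L32-L138).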
- Read more about the endpoint prediction API [here](https://cloud.google.com/sdk/gcloud/reference/ai/endpoints/predict).

In [147]:
## TODO

# endpoint.predict(
#     instances=[
#         {"observation": [list(np.ones(21)) for _ in range(8)]},
#     ]
# )
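
As a hedged sketch of what a request for the per-arm policy might look like, the snippet below builds an `instances` list whose observation follows the `global` and `per_arm` shapes from the SavedModel signature. The dict-based request schema, the helper `make_instance`, and the constant names are assumptions for illustration; the serving container has to unpack the same structure for this to work.

```python
import json

import numpy as np

# Dimensions taken from the expected TensorSpecs in the error output.
BATCH_SIZE = 8
GLOBAL_DIM = 22
NUM_ACTIONS = 20
PER_ARM_DIM = 21


def make_instance(batch_size=BATCH_SIZE):
    """Builds one prediction instance filled with ones (hypothetical schema)."""
    return {
        "observation": {
            # .tolist() converts NumPy scalars to plain Python floats, which
            # keeps the instance JSON-serializable.
            "global": np.ones((batch_size, GLOBAL_DIM), dtype=np.float32).tolist(),
            "per_arm": np.ones(
                (batch_size, NUM_ACTIONS, PER_ARM_DIM), dtype=np.float32
            ).tolist(),
        }
    }


instances = [make_instance()]

# Serializing locally is a cheap check that the payload is valid JSON.
payload = json.dumps({"instances": instances})
print(f"{len(instances)} instance(s), {len(payload)} bytes of JSON")
```

With a deployed endpoint, the same list could then be passed as `endpoint.predict(instances=instances)`.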

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# # Delete endpoint resource
# ! gcloud ai endpoints delete $endpoint.name --quiet --region $REGION

# # Delete model resource
# ! gcloud ai models delete $model.name --quiet --region $REGION

# # Delete Cloud Storage objects that were created
# ! gsutil -m rm -r $ARTIFACTS_DIR
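
The endpoint and model can also be removed from Python. The sketch below assumes `endpoint` and `model` are the `google.cloud.aiplatform` `Endpoint` and `Model` objects created earlier in the notebook; the wrapper function `delete_resources` is hypothetical. Cloud Storage objects would still be removed with `gsutil` as shown above.

```python
def delete_resources(endpoint, model):
    """Undeploys and deletes an endpoint, then deletes the model.

    Hypothetical helper; expects objects that expose `undeploy_all()` and
    `delete()`, such as `aiplatform.Endpoint` and `aiplatform.Model`.
    """
    # An endpoint cannot be deleted while models are still deployed to it.
    endpoint.undeploy_all()
    endpoint.delete()
    model.delete()


# Usage, once the resources are no longer needed:
# delete_resources(endpoint, model)
```

Undeploying first matters because deleting an endpoint that still serves a model is rejected by the service.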