# Scale per-arm Bandit training with Vertex AI

## Overview

### Notebook Objectives:
* Create hyperparameter tuning and training custom container
* Submit hyperparameter tuning job (optional)
* Create custom prediction container
* Submit custom container training job
* Deploy trained model to Endpoint
* Predict on the Endpoint

### Create hyperparameter tuning and training custom container (TODO: fix vars)

Create a custom container that can be used for both hyperparameter tuning and training. The associated source code, which this notebook writes to `src/per_arm_rl/`, serves as the inner script of the custom container.
As before, the training function follows [trainer.train](https://github.com/tensorflow/agents/blob/r0.8.0/tf_agents/bandits/agents/examples/v2/trainer.py#L104), but it also keeps track of intermediate metric values, supports hyperparameter tuning, and (for training) saves artifacts to different locations. Hyperparameter tuning and training share the same training logic.

#### Execute hyperparameter tuning:
- The code does not save model artifacts. It takes hyperparameter values as command-line arguments from the Vertex AI Hyperparameter Tuning service and reports the training result metric to Vertex AI for each trial using cloudml-hypertune.
- Note that if you decide to save model artifacts, saving them to the same directory may cause overwriting errors if you use parallel trials in the hyperparameter tuning job. The recommended approach is to save each trial's artifacts to a different sub-directory. This would also allow you to recover all the artifacts from different trials and can potentially save you from re-training.
- Read more about hyperparameter tuning for custom containers [here](https://cloud.google.com/vertex-ai/docs/training/containers-overview#hyperparameter_tuning_with_custom_containers); read about hyperparameter tuning support [here](https://cloud.google.com/vertex-ai/docs/training/hyperparameter-tuning-overview).
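
The per-trial sub-directory recommendation above could be sketched as follows. This is a minimal sketch, not code from this notebook: the helper name `trial_artifacts_dir` and the bucket path are hypothetical, and it assumes the trial identifier is exposed through the `CLOUD_ML_TRIAL_ID` environment variable, falling back to `"0"` when the variable is unset.

```python
import os
from typing import Optional


def trial_artifacts_dir(base_dir: str, trial_id: Optional[str] = None) -> str:
    """Returns a per-trial sub-directory under `base_dir` (hypothetical helper)."""
    if trial_id is None:
        # Assumption: the tuning service exposes the trial ID in this variable.
        trial_id = os.environ.get("CLOUD_ML_TRIAL_ID", "0")
    # Strip a trailing slash so the joined path never contains "//".
    return f"{base_dir.rstrip('/')}/trial-{trial_id}"


# Hypothetical bucket path, for illustration only.
print(trial_artifacts_dir("gs://my-bucket/artifacts", trial_id="3"))
```

Because each parallel trial resolves to its own directory, trials no longer write over each other's artifacts.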

#### Execute training:
- The code saves model artifacts to `os.environ["AIP_MODEL_DIR"]` in addition to `ARTIFACTS_DIR`, as required [here](https://github.com/googleapis/python-aiplatform/blob/v0.8.0/google/cloud/aiplatform/training_jobs.py#L2202).
- If you want to make changes to the function, make sure to still save the trained policy as a SavedModel to clean directories, and avoid saving checkpoints and other artifacts, so that deploying the model to endpoints works.
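
One way to collect the save locations described above is sketched below. The helper name `resolve_save_dirs` is hypothetical and the paths in the usage line are made up. The only assumption about the runtime is that the training job sets `AIP_MODEL_DIR`, as the first bullet states.

```python
import os
from typing import List, Optional


def resolve_save_dirs(artifacts_dir: Optional[str] = None) -> List[str]:
    """Collects the directories the trained policy should be saved to."""
    candidates = [os.environ.get("AIP_MODEL_DIR"), artifacts_dir]
    save_dirs: List[str] = []
    for path in candidates:
        # Skip unset locations and avoid saving twice to the same place.
        if path and path not in save_dirs:
            save_dirs.append(path)
    return save_dirs


# Hypothetical values, for illustration only.
os.environ["AIP_MODEL_DIR"] = "gs://my-bucket/model"
print(resolve_save_dirs("gs://my-bucket/artifacts"))
```

Each returned directory would then receive only the SavedModel (for example via `policy_saver.PolicySaver(agent.policy).save(...)`), keeping checkpoints and other artifacts out of the directories used for deployment.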

## Notebook setup

In [1]:
!pwd

/home/jupyter/jt-github/tf_vertex_agents


### set vars

In [2]:
PREFIX = 'mabv1'

In [3]:
# creds, PROJECT_ID = google.auth.default()
GCP_PROJECTS             = !gcloud config get-value project
PROJECT_ID               = GCP_PROJECTS[0]

PROJECT_NUM              = !gcloud projects describe $PROJECT_ID --format="value(projectNumber)"
PROJECT_NUM              = PROJECT_NUM[0]

VERTEX_SA                = f'{PROJECT_NUM}-compute@developer.gserviceaccount.com'

VPC_NETWORK_NAME         = "ucaip-haystack-vpc-network"

# locations / regions for cloud resources
LOCATION                 = 'us-central1'        
REGION                   = LOCATION
BQ_LOCATION              = 'US'

print(f"PROJECT_ID       = {PROJECT_ID}")
print(f"PROJECT_NUM      = {PROJECT_NUM}")
print(f"VPC_NETWORK_NAME = {VPC_NETWORK_NAME}")
print(f"LOCATION         = {LOCATION}")
print(f"REGION           = {REGION}")
print(f"BQ_LOCATION      = {BQ_LOCATION}")

PROJECT_ID       = hybrid-vertex
PROJECT_NUM      = 934903580331
VPC_NETWORK_NAME = ucaip-haystack-vpc-network
LOCATION         = us-central1
REGION           = us-central1
BQ_LOCATION      = US


In [4]:
# GCS bucket and paths
BUCKET_NAME              = f'{PREFIX}-{PROJECT_ID}-bucket'
BUCKET_URI               = f'gs://{BUCKET_NAME}'

# Location of the MovieLens 100K dataset's "u.data" file.
DATA_GCS_PREFIX          = "data"
DATA_PATH                = f"{BUCKET_URI}/{DATA_GCS_PREFIX}"
ARTIFACTS_DIR            = f"{BUCKET_URI}/artifacts"

VPC_NETWORK_FULL         = f"projects/{PROJECT_NUM}/global/networks/{VPC_NETWORK_NAME}"

# BigQuery parameters (used for the Generator, Ingester, Logger)
BIGQUERY_DATASET_ID      = f"{PROJECT_ID}.movielens_dataset_{PREFIX}"
BIGQUERY_TABLE_ID        = f"{BIGQUERY_DATASET_ID}.training_dataset"

print(f"BUCKET_NAME         : {BUCKET_NAME}")
print(f"BUCKET_URI          : {BUCKET_URI}")
print(f"DATA_PATH           : {DATA_PATH}")
print(f"VPC_NETWORK_FULL    : {VPC_NETWORK_FULL}")
print(f"BIGQUERY_DATASET_ID : {BIGQUERY_DATASET_ID}")
print(f"BIGQUERY_TABLE_ID   : {BIGQUERY_TABLE_ID}")

BUCKET_NAME         : mabv1-hybrid-vertex-bucket
BUCKET_URI          : gs://mabv1-hybrid-vertex-bucket
DATA_PATH           : gs://mabv1-hybrid-vertex-bucket/data
VPC_NETWORK_FULL    : projects/934903580331/global/networks/ucaip-haystack-vpc-network
BIGQUERY_DATASET_ID : hybrid-vertex.movielens_dataset_mabv1
BIGQUERY_TABLE_ID   : hybrid-vertex.movielens_dataset_mabv1.training_dataset


### create GCS bucket

In [5]:
# create bucket
# ! gsutil mb -l $REGION $BUCKET_URI

In [6]:
! gsutil ls -al $BUCKET_URI

                                 gs://mabv1-hybrid-vertex-bucket/archived/
                                 gs://mabv1-hybrid-vertex-bucket/data/
                                 gs://mabv1-hybrid-vertex-bucket/data_stats/
                                 gs://mabv1-hybrid-vertex-bucket/perarm-local-test/


### imports

In [7]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

In [8]:
import functools
import json
from collections import defaultdict
from typing import Callable, Dict, List, Optional, TypeVar
from datetime import datetime
import time

import logging
logging.disable(logging.WARNING)

import matplotlib.pyplot as plt
import numpy as np

# google cloud
from google.cloud import aiplatform, storage

# tensorflow
import tensorflow as tf
from tf_agents.agents import TFAgent
from tf_agents.bandits.agents import lin_ucb_agent
from tf_agents.bandits.agents.examples.v2 import trainer
from tf_agents.bandits.environments import (environment_utilities,
                                            movielens_py_environment)
from tf_agents.bandits.metrics import tf_metrics as tf_bandit_metrics
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import TFEnvironment, tf_py_environment
from tf_agents.eval import metric_utils
from tf_agents.metrics import tf_metrics
from tf_agents.metrics.tf_metric import TFStepMetric
from tf_agents.policies import policy_saver

# my project
from src.per_arm_rl import data_utils
from src.per_arm_rl import data_config

if tf.__version__[0] != "2":
    raise Exception("The trainer only runs with TensorFlow version 2.")

T = TypeVar("T")

In [9]:
# cloud storage client
storage_client = storage.Client(project=PROJECT_ID)

# Vertex client
aiplatform.init(project=PROJECT_ID, location=LOCATION)

# # bigquery client
# bqclient = bigquery.Client(
#     project=PROJECT_ID,
#     # location=LOCATION
# )

In [10]:
# SAMPLE_DATA_URI = "gs://cloud-samples-data/vertex-ai/community-content/tf_agents_bandits_movie_recommendation_with_kfp_and_vertex_sdk/u.data"

# ! gsutil cp $SAMPLE_DATA_URI $DATA_PATH

In [11]:
! gsutil ls -al $DATA_PATH

  20289631  2023-07-13T14:31:03Z  gs://mabv1-hybrid-vertex-bucket/data/ml-ratings-100k-train.tfrecord#1689258663238090  metageneration=1
TOTAL: 1 objects, 20289631 bytes (19.35 MiB)


## Create training package

In [256]:
REPO_DOCKER_PATH_PREFIX = 'src'
RL_SUB_DIR = 'per_arm_rl'

In [257]:
# Make the training subfolder
# ! rm -rf {REPO_DOCKER_PATH_PREFIX}/{RL_SUB_DIR}
# ! mkdir {REPO_DOCKER_PATH_PREFIX}/{RL_SUB_DIR}
# ! touch {REPO_DOCKER_PATH_PREFIX}/{RL_SUB_DIR}/__init__.py

In [258]:
%%writefile {REPO_DOCKER_PATH_PREFIX}/{RL_SUB_DIR}/policy_util.py
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""The utility module for reinforcement learning policy."""
import collections
from typing import Callable, Dict, List, Optional, TypeVar

from tf_agents.agents import TFAgent
from tf_agents.bandits.agents.examples.v2 import trainer
from tf_agents.drivers import dynamic_step_driver
from tf_agents.environments import TFEnvironment
from tf_agents.eval import metric_utils
from tf_agents.metrics import tf_metrics
from tf_agents.metrics.tf_metric import TFStepMetric
from tf_agents.policies import policy_saver

T = TypeVar("T")

def train(
    agent: TFAgent
    , environment: TFEnvironment
    , training_loops: int
    , steps_per_loop: int
    , additional_metrics: Optional[List[TFStepMetric]] = None
    , training_data_spec_transformation_fn: Optional[Callable[[T],T]] = None
    , run_hyperparameter_tuning: bool = False
    , root_dir: Optional[str] = None
    , artifacts_dir: Optional[str] = None
    , model_dir: Optional[str] = None
) -> Dict[str, List[float]]:
    """
    Performs `training_loops` iterations of training on the agent's policy.

    Uses the `environment` as the problem formulation and source of immediate
    feedback, together with the agent's algorithm, to perform `training_loops`
    iterations of on-policy training on the policy. Supports a hyperparameter
    tuning mode and a regular training mode.
    If one or more baseline_reward_fns are provided, the regret is computed
    against each one of them. Here is example baseline_reward_fn:
    def baseline_reward_fn(observation, per_action_reward_fns):
     rewards = ... # compute reward for each arm
     optimal_action_reward = ... # take the maximum reward
     return optimal_action_reward

    Args:
      agent: An instance of `TFAgent`.
      environment: An instance of `TFEnvironment`.
      training_loops: An integer indicating how many training loops should be run.
      steps_per_loop: An integer indicating how many driver steps should be
        executed and presented to the trainer during each training loop.
      additional_metrics: Optional; list of metric objects to log, in addition to
        default metrics `NumberOfEpisodes`, `AverageReturnMetric`, and
        `AverageEpisodeLengthMetric`.
      training_data_spec_transformation_fn: Optional; function that transforms
        the data items before they get to the replay buffer.
      run_hyperparameter_tuning: Optional; whether this training logic is
        executed for the purpose of hyperparameter tuning. If so, then it does
        not save model artifacts.
      root_dir: Optional; path to the directory where training artifacts are
        written; usually used for a default or auto-generated location. Do not
        specify this argument if using hyperparameter tuning instead of training.
      artifacts_dir: Optional; path to an extra directory where training
        artifacts are written; usually used for a mutually agreed location from
        which artifacts will be loaded. Do not specify this argument if using
        hyperparameter tuning instead of training.
      model_dir: Optional; path to the directory where the trained policy is
        saved as a SavedModel (for example, the value of `AIP_MODEL_DIR`). Only
        used when not running hyperparameter tuning.

    Returns:
      A dict mapping metric names (e.g., "AverageReturnMetric") to a list of
      intermediate metric values over `training_loops` iterations of training.
    """
    
    # ====================================================
    # validate arguments
    # ====================================================
    if run_hyperparameter_tuning and not (root_dir is None and artifacts_dir is None):
        raise ValueError(
            "Do not specify `root_dir` or `artifacts_dir` when" +
            " running hyperparameter tuning."
        )

    # ====================================================
    # get data spec
    # ====================================================
    if training_data_spec_transformation_fn is None:
        data_spec = agent.policy.trajectory_spec
    else:
        data_spec = training_data_spec_transformation_fn(
            agent.policy.trajectory_spec
        )
        
    # ====================================================
    # define replay buffer
    # ====================================================
    replay_buffer = trainer._get_replay_buffer(
        data_spec = data_spec
        , batch_size = environment.batch_size
        , steps_per_loop = steps_per_loop
        , async_steps_per_loop = 1
    )

    # ====================================================
    # metrics
    # ====================================================
    # `step_metric` records the number of individual rounds of bandit interaction;
    # that is, (number of trajectories) * batch_size.
    
    step_metric = tf_metrics.EnvironmentSteps()
    
    metrics = [
        tf_metrics.NumberOfEpisodes()
        , tf_metrics.AverageEpisodeLengthMetric(batch_size=environment.batch_size)
    ]
    if additional_metrics:
        metrics += additional_metrics

    if isinstance(environment.reward_spec(), dict):
        metrics += [
            tf_metrics.AverageReturnMultiMetric(
                reward_spec=environment.reward_spec()
                , batch_size=environment.batch_size
            )
        ]
    else:
        metrics += [
            tf_metrics.AverageReturnMetric(batch_size=environment.batch_size)
        ]

    # Store intermediate metric results, indexed by metric names.
    metric_results = collections.defaultdict(list)

    # ====================================================
    # Driver
    # ====================================================
    
    if training_data_spec_transformation_fn is not None:
        add_batch_fn = lambda data: replay_buffer.add_batch(
            training_data_spec_transformation_fn(data)
        )
    else:
        add_batch_fn = replay_buffer.add_batch

    observers = [add_batch_fn, step_metric] + metrics

    driver = dynamic_step_driver.DynamicStepDriver(
        env=environment
        , policy=agent.collect_policy
        , num_steps=steps_per_loop * environment.batch_size
        , observers=observers
    )

    # ====================================================
    # training_loop
    # ====================================================
    training_loop = trainer._get_training_loop(
        driver = driver
        , replay_buffer = replay_buffer
        , agent = agent
        , steps = steps_per_loop
        , async_steps_per_loop = 1
    )
    if not run_hyperparameter_tuning:
        saver = policy_saver.PolicySaver(agent.policy)

    for train_step in range(training_loops):
        training_loop(
            train_step = train_step
            , metrics = metrics
        )
        metric_utils.log_metrics(metrics)
    
        for metric in metrics:
            metric.tf_summaries(train_step = step_metric.result())
            metric_results[type(metric).__name__].append(metric.result().numpy())
    
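    if not run_hyperparameter_tuning:
        # Save the trained policy as a SavedModel to each configured location,
        # skipping any directory that was not provided.
        for save_dir in (model_dir, artifacts_dir):
            if save_dir:
                saver.save(save_dir)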
    
    return metric_results

Overwriting src/per_arm_rl/policy_util.py
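
The dictionary returned by `train` maps each metric name to its per-loop values. A minimal sketch of how it could be summarized is shown below. The helper name `final_metrics` is hypothetical, and the numbers are made up for illustration, not results from this notebook.

```python
from typing import Dict, List


def final_metrics(metric_results: Dict[str, List[float]]) -> Dict[str, float]:
    """Returns the last recorded value of each metric, skipping empty series."""
    return {
        name: float(values[-1])
        for name, values in metric_results.items()
        if values
    }


# Made-up values with the same shape as the dict returned by `train`.
example_results = {
    "AverageReturnMetric": [0.1, 0.4, 0.7],
    "RegretMetric": [2.0, 1.5, 1.1],
}
print(final_metrics(example_results))
```

The last entry of `AverageReturnMetric` is the value that `task.py` reports to Vertex AI during hyperparameter tuning.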


### train task

**TODO:**
* add vertex experiments to train task - following logic to not log experiment if HPT = True

In [259]:
%%writefile {REPO_DOCKER_PATH_PREFIX}/{RL_SUB_DIR}/task.py
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""The entrypoint for training a policy."""
import argparse
import functools
import json
import logging
import os
import sys
from typing import List, Union

# google cloud
from google.cloud import aiplatform, storage
import hypertune

from . import policy_util
from . import data_utils
from . import train_utils
from . import data_config
from . import my_per_arm_py_env

import tensorflow as tf
from tensorflow.python.client import device_lib
from tf_agents.bandits.agents import lin_ucb_agent
from tf_agents.bandits.environments import environment_utilities
from tf_agents.bandits.environments import movielens_py_environment
from tf_agents.bandits.metrics import tf_metrics as tf_bandit_metrics
from tf_agents.environments import tf_py_environment

if tf.__version__[0] != "2":
    raise Exception("The trainer only runs with TensorFlow version 2.")

PER_ARM = True  # Use the per-arm version of the MovieLens environment.

def get_args(
    raw_args: List[str]
) -> argparse.Namespace:
    """Parses parameters and hyperparameters for training a policy.

    Args:
      raw_args: A list of command line arguments.

    Returns:
      An argparse.Namespace object mapping (hyper)parameter names to the parsed
      values.
    """
    parser = argparse.ArgumentParser()
    
    parser.add_argument(
        "--project_id"
        , type=str
        , default='hybrid-vertex'
    )
    # Whether to execute hyperparameter tuning or training
    parser.add_argument(
        "--run-hyperparameter-tuning"
        , action="store_true"
        , help="Whether to perform hyperparameter tuning instead of regular training."
    )
    # Whether to train using the best hyperparameters learned from a previous
    # hyperparameter tuning job.
    parser.add_argument(
        "--train-with-best-hyperparameters"
        , action="store_true"
        , help="Whether to train using the best hyperparameters learned from a previous hyperparameter tuning job."
    )
    # Path parameters
    parser.add_argument(
        "--artifacts-dir"
        , type=str
        , help="Extra directory where model artifacts are saved."
    )
    parser.add_argument(
        "--profiler-dir"
        , default=None
        , type=str
        , help="Directory for TensorBoard Profiler artifacts."
    )
    parser.add_argument(
        "--data-path", type=str, help="Path to MovieLens 100K's 'u.data' file."
    )
    parser.add_argument(
        "--best-hyperparameters-bucket"
        , type=str
        , help="GCS bucket containing the best hyperparameters file."
    )
    parser.add_argument(
        "--best-hyperparameters-path"
        , type=str
        , help="Path to JSON file containing the best hyperparameters."
    )
    # Hyperparameters
    parser.add_argument(
        "--batch-size"
        , default=8
        , type=int
        , help="Training and prediction batch size."
    )
    parser.add_argument(
        "--training-loops"
        , default=4
        , type=int
        , help="Number of training iterations."
    )
    parser.add_argument(
        "--steps-per-loop"
        , default=2
        , type=int
        , help="Number of driver steps per training iteration."
    )
    # MovieLens simulation environment parameters
    parser.add_argument(
        "--rank-k"
        , default=20
        , type=int
        , help="Rank for matrix factorization in the MovieLens environment; also the observation dimension."
    )
    parser.add_argument(
        "--num-actions"
        , default=20
        , type=int
        , help="Number of actions (movie items) to choose from."
    )
    # LinUCB agent parameters
    parser.add_argument(
        "--tikhonov-weight"
        , default=0.001
        , type=float
        , help="LinUCB Tikhonov regularization weight."
    )
    parser.add_argument(
        "--agent-alpha"
        , default=10.0
        , type=float
        , help="LinUCB exploration parameter that multiplies the confidence intervals."
    )

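    # Additional parameters
    parser.add_argument(
        "--bucket_name"
        , default="tmp"
        , type=str
        , help="Name of the GCS bucket that holds the data and the best hyperparameters file."
    )

    parser.add_argument(
        "--data_gcs_prefix"
        , default="data"
        , type=str
        , help="Prefix (folder) inside the bucket where the data is stored."
    )
    
    # NOTE: `--data_path` writes to the same `args.data_path` attribute as
    # `--data-path` above. argparse applies the default of whichever option was
    # registered first, so this default only documents the expected format.
    parser.add_argument(
        "--data_path"
        , default="gs://tmp/tmp"
        , type=str
        , help="GCS path to the training data."
    )
    
    parser.add_argument(
        "--project_number"
        , default="934903580331"
        , type=str
        , help="Google Cloud project number."
    )
    # distribute
    parser.add_argument(
        "--distribute"
        , default="single"
        , type=str
        , help="Distribution strategy passed to train_utils.get_train_strategy, e.g. 'single' or 'multiworker'."
    )
    # artifacts_dir (shares `args.artifacts_dir` with `--artifacts-dir` above)
    parser.add_argument(
        "--artifacts_dir"
        , default="gs://BUCKET/EXPERIMENT/RUN_NAME/artifacts"
        , type=str
        , help="Extra directory where model artifacts are saved."
    )
    parser.add_argument(
        "--root_dir"
        , default="gs://BUCKET/EXPERIMENT/RUN_NAME/root"
        , type=str
        , help="Root directory for writing logs, summaries, and checkpoints."
    )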
    
    return parser.parse_args(raw_args)

def execute_task(
    args: argparse.Namespace
    , best_hyperparameters_blob: Union[storage.Blob, None]
    , hypertune_client: Union[hypertune.HyperTune, None]
) -> None:
    """Executes training, or hyperparameter tuning, for the policy.

    Parses parameters and hyperparameters from the command line, reads best
    hyperparameters if applicable, constructs the logical modules for RL, and
    executes training or hyperparameter tuning. Tracks the training process
    and resources using TensorBoard Profiler if applicable.

    Args:
      args: An argparse.Namespace object of (hyper)parameter values.
      best_hyperparameters_blob: An object containing best hyperparameters in
        Google Cloud Storage.
      hypertune_client: Client for submitting hyperparameter tuning metrics.
    """
    
    # [Do Not Change] Set the root directory for training artifacts.
    # TODO - JT
    MODEL_DIR = os.environ["AIP_MODEL_DIR"] if not args.run_hyperparameter_tuning else ""
    root_dir = args.root_dir if not args.run_hyperparameter_tuning else ""
    logging.info(f'root_dir: {root_dir}')

    # Use best hyperparameters learned from a previous hyperparameter tuning job.
    logging.info(args.train_with_best_hyperparameters)
    if args.train_with_best_hyperparameters:
        logging.info(f'train_with_best_hyperparameters engaged...')
        best_hyperparameters = json.loads(
            best_hyperparameters_blob.download_as_string()
        )
        
        if "batch-size" in best_hyperparameters:
            args.batch_size = int(best_hyperparameters["batch-size"])
        if "training-loops" in best_hyperparameters:
            args.training_loops = int(best_hyperparameters["training-loops"])
        if "steps-per-loop" in best_hyperparameters:
            args.steps_per_loop = int(best_hyperparameters["steps-per-loop"])

    # Define RL environment.
    env = my_per_arm_py_env.MyMovieLensPerArmPyEnvironment(
        project_number = args.project_number
        , data_path = args.data_path
        , bucket_name = args.bucket_name
        , data_gcs_prefix = args.data_gcs_prefix
        , user_age_lookup_dict = data_config.USER_AGE_LOOKUP
        , user_occ_lookup_dict = data_config.USER_OCC_LOOKUP
        , movie_gen_lookup_dict = data_config.MOVIE_GEN_LOOKUP
        , num_users = data_config.MOVIELENS_NUM_USERS
        , num_movies = data_config.MOVIELENS_NUM_MOVIES
        , rank_k = args.rank_k
        , batch_size = args.batch_size
        , num_actions = args.num_actions
    )
    environment = tf_py_environment.TFPyEnvironment(env)
    
    strategy = train_utils.get_train_strategy(distribute_arg=args.distribute)
    logging.info(f'TF training strategy (execute task) = {strategy}')
    
    with strategy.scope():
        # Define RL agent/algorithm.
        agent = lin_ucb_agent.LinearUCBAgent(
            time_step_spec=environment.time_step_spec()
            , action_spec=environment.action_spec()
            , tikhonov_weight=args.tikhonov_weight
            , alpha=args.agent_alpha
            , dtype=tf.float32
            , accepts_per_arm_features=PER_ARM # TODO - streamline
        )
    logging.info("TimeStep Spec (for each batch):\n%s\n", agent.time_step_spec)
    logging.info("Action Spec (for each batch):\n%s\n", agent.action_spec)
    logging.info("Reward Spec (for each batch):\n%s\n", environment.reward_spec())

    # Define RL metric.
    optimal_reward_fn = functools.partial(
        environment_utilities.compute_optimal_reward_with_movielens_environment
        , environment=environment
    )
    
    regret_metric = tf_bandit_metrics.RegretMetric(optimal_reward_fn)
    metrics = [regret_metric]

    # Perform on-policy training with the simulation MovieLens environment.
    if args.profiler_dir is not None:
        tf.profiler.experimental.start(args.profiler_dir)
  
    metric_results = policy_util.train(
        agent=agent
        , environment=environment
        , training_loops=args.training_loops
        , steps_per_loop=args.steps_per_loop
        , additional_metrics=metrics
        , run_hyperparameter_tuning=args.run_hyperparameter_tuning
        , root_dir=None if args.run_hyperparameter_tuning else root_dir
        , artifacts_dir=None if args.run_hyperparameter_tuning else args.artifacts_dir
        , model_dir=MODEL_DIR
    )
    
    if args.profiler_dir is not None:
        tf.profiler.experimental.stop()

    # Report training metrics to Vertex AI for hyperparameter tuning
    if args.run_hyperparameter_tuning:
        hypertune_client.report_hyperparameter_tuning_metric(
            hyperparameter_metric_tag="final_average_return"
            , metric_value=metric_results["AverageReturnMetric"][-1]
            # , global_step=args.training_loops
        )

def main() -> None:
    """
    Entry point for training or hyperparameter tuning.
    """
    args = get_args(sys.argv[1:])
    # =============================================
    # set GCP clients
    # =============================================
    from google.cloud import aiplatform as vertex_ai
    from google.cloud import storage

    project_number = os.environ["CLOUD_ML_PROJECT_ID"]
    storage_client = storage.Client(project=project_number)
    
    vertex_ai.init(
        project=project_number,
        location='us-central1',
        # experiment=args.experiment_name
    )
    
    # =============================================
    # GPUs
    # =============================================

    # limiting GPU growth
    gpus = tf.config.list_physical_devices('GPU')
    if gpus:
        try:
            for gpu in gpus:
                tf.config.experimental.set_memory_growth(gpu, True)
            logging.info(f'detected: {len(gpus)} GPUs')
        except RuntimeError as e:
            # Memory growth must be set before GPUs have been initialized
            logging.info(e)

    # tf.debugging.set_log_device_placement(True)          # logs all tf ops and their device placement;
    os.environ['TF_GPU_THREAD_MODE'] = 'gpu_private'
    os.environ['TF_GPU_THREAD_COUNT'] = '8'                # TODO - parametrize | 1
    os.environ['TF_GPU_ALLOCATOR'] = 'cuda_malloc_async'
    
    # ====================================================
    # Set Device Strategy
    # ====================================================
    logging.info("Detecting devices....")
    logging.info('DEVICES'  + str(device_lib.list_local_devices()))
    
    logging.info("Setting device strategy...")
    
    strategy = train_utils.get_train_strategy(distribute_arg=args.distribute)
    logging.info(f'TF training strategy (main) = {strategy}')
    
    NUM_REPLICAS = strategy.num_replicas_in_sync
    logging.info(f'num_replicas_in_sync = {NUM_REPLICAS}')
    
    # Here the batch size scales up by number of workers since
    # `tf.data.Dataset.batch` expects the global batch size.
    GLOBAL_BATCH_SIZE = int(args.batch_size) * int(NUM_REPLICAS)
    logging.info(f'GLOBAL_BATCH_SIZE = {GLOBAL_BATCH_SIZE}')

    # type and task of machine from strategy
    logging.info(f'Setting task_type and task_id...')
    if args.distribute == 'multiworker':
        task_type, task_id = (
            strategy.cluster_resolver.task_type,
            strategy.cluster_resolver.task_id
        )
    else:
        task_type, task_id = 'chief', None
    
    logging.info(f'task_type = {task_type}')
    logging.info(f'task_id = {task_id}')
    
    # ====================================================
    # determine train job type and execute
    # ====================================================
    
    if args.train_with_best_hyperparameters:
        storage_client = storage.Client(args.project_id)
        bucket = storage_client.bucket(args.bucket_name)
        best_hyperparameters_blob = bucket.blob(args.best_hyperparameters_path)
    
    else:
        best_hyperparameters_blob = None
    
    hypertune_client = hypertune.HyperTune() if args.run_hyperparameter_tuning else None

    execute_task(
        args = args
        , best_hyperparameters_blob = best_hyperparameters_blob
        , hypertune_client = hypertune_client
    )

if __name__ == "__main__":
    
    logging.getLogger().setLevel(logging.INFO)
    logging.info("Python Version = %s", sys.version)
    logging.info("TensorFlow Version = %s", tf.__version__)
    # logging.info("TF_CONFIG = %s", os.environ.get("TF_CONFIG", "Not found"))
    # logging.info("DEVICES = %s", device_lib.list_local_devices())
    logging.info("Reinforcement learning task started...")
    
    main()
    
    logging.info("Reinforcement learning task completed.")

Overwriting src/per_arm_rl/task.py
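
`task.py` registers some options in both dash and underscore form (for example `--data-path` and `--data_path`). The standalone sketch below, which uses only the standard library and made-up paths, illustrates how `argparse` treats such a pair: both spellings write to the same `data_path` attribute, and when neither flag is passed, the default of the option registered first is the one that applies.

```python
import argparse

parser = argparse.ArgumentParser()
# Both option strings resolve to the same attribute name: `data_path`.
parser.add_argument("--data-path", type=str)                          # default: None
parser.add_argument("--data_path", type=str, default="gs://tmp/tmp")

no_flags = parser.parse_args([])
dash_flag = parser.parse_args(["--data-path", "gs://bucket/a"])
underscore_flag = parser.parse_args(["--data_path", "gs://bucket/b"])

print(no_flags.data_path)         # None: the first-registered default wins
print(dash_flag.data_path)        # gs://bucket/a
print(underscore_flag.data_path)  # gs://bucket/b
```

If the underscore default is the intended one, registering a single option with both spellings, as in `add_argument("--data-path", "--data_path", ...)`, would avoid the ambiguity.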


## Build train application

### Vertex Experiments

In [260]:
EXPERIMENT_NAME   = f'scale-perarm-hpt'

invoke_time       = time.strftime("%Y%m%d-%H%M%S")
RUN_NAME          = f'run-{invoke_time}'

LOG_DIR           = f"{BUCKET_URI}/{EXPERIMENT_NAME}/{RUN_NAME}/tb-logs"
ROOT_DIR          = f"{BUCKET_URI}/{EXPERIMENT_NAME}/{RUN_NAME}/root"       # Root directory for writing logs/summaries/checkpoints.
ARTIFACTS_DIR     = f"{BUCKET_URI}/{EXPERIMENT_NAME}/{RUN_NAME}/artifacts"  # Where the trained model will be saved and restored.

print(f"EXPERIMENT_NAME   : {EXPERIMENT_NAME}")
print(f"RUN_NAME          : {RUN_NAME}")
print(f"LOG_DIR           : {LOG_DIR}")
print(f"ROOT_DIR          : {ROOT_DIR}")
print(f"ARTIFACTS_DIR     : {ARTIFACTS_DIR}")

EXPERIMENT_NAME   : scale-perarm-hpt
RUN_NAME          : run-20230714-134124
LOG_DIR           : gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-134124/tb-logs
ROOT_DIR          : gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-134124/root
ARTIFACTS_DIR     : gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-134124/artifacts


### Create a Cloud Build YAML file

In [262]:
%%writefile cloudbuild.yaml

steps:
- name: 'gcr.io/cloud-builders/docker'
  args: ['build', '-t', '$_IMAGE_URI', '$_FILE_LOCATION', '-f', '$_FILE_LOCATION/Dockerfile_$_DOCKERNAME']
  env: ['AIP_STORAGE_URI=$_ARTIFACTS_DIR']
images:
- '$_IMAGE_URI'

Overwriting cloudbuild.yaml


### Write a Dockerfile
* Use the [cloudml-hypertune](https://github.com/GoogleCloudPlatform/cloudml-hypertune) Python package to report training metrics to Vertex AI for hyperparameter tuning
* Use the Google [Cloud Storage client library](https://cloud.google.com/storage/docs/reference/libraries) to read the best hyperparameters learned from a previous hyperparameter tuning job during training
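
Once the best-hyperparameters file has been downloaded, applying it to the parsed arguments could look like the sketch below. It assumes the file is a flat JSON object keyed by the dash-separated names that `task.py` checks (`batch-size`, `training-loops`, `steps-per-loop`). The helper name `apply_best_hyperparameters` and the sample values are hypothetical.

```python
import argparse
import json
from typing import Any, Dict

# JSON key -> attribute name on the parsed arguments.
INT_HYPERPARAMETERS = {
    "batch-size": "batch_size",
    "training-loops": "training_loops",
    "steps-per-loop": "steps_per_loop",
}


def apply_best_hyperparameters(
    args: argparse.Namespace, raw_json: str
) -> argparse.Namespace:
    """Overrides integer hyperparameters on `args` with values from `raw_json`."""
    best: Dict[str, Any] = json.loads(raw_json)
    for key, attr in INT_HYPERPARAMETERS.items():
        if key in best:
            setattr(args, attr, int(best[key]))
    return args


# Made-up defaults and file contents, for illustration only.
args = argparse.Namespace(batch_size=8, training_loops=4, steps_per_loop=2)
apply_best_hyperparameters(args, '{"batch-size": 16, "steps-per-loop": 4}')
print(args.batch_size, args.training_loops, args.steps_per_loop)
```

Driving the overrides from one mapping keeps the JSON keys and attribute names side by side, which makes a misspelled attribute easier to spot.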

In [263]:
DOCKERNAME = 'train_perarm'
# ! rm -rf Dockerfile_{DOCKERNAME}

In [264]:
%%writefile Dockerfile_{DOCKERNAME}

# Specifies base image and tag.
# FROM gcr.io/google-appengine/python
FROM python:3.10
ENV PYTHONUNBUFFERED True

WORKDIR /root

# Installs additional packages.
RUN pip3 install cloudml-hypertune
RUN pip3 install google-cloud-storage
RUN pip3 install google-cloud-aiplatform
RUN pip3 install tensorflow==2.12.0
RUN pip3 install tensorboard
RUN pip3 install tensorboard-plugin-profile
RUN pip3 install tensorboard-plugin-wit
RUN pip3 install tensorboard-data-server
RUN pip3 install tensorflow-io
RUN pip3 install tf-agents==0.17.0
RUN pip3 install matplotlib
RUN pip3 install urllib3

# Copies training code to the Docker image.
COPY src/per_arm_rl /root/src/per_arm_rl

# Sets up the entry point to invoke the task.
ENTRYPOINT ["python3", "-m", "src.per_arm_rl.task"]

Overwriting Dockerfile_train_perarm


#### Build the custom container with Cloud Build

In [265]:
HPTUNING_TRAINING_CONTAINER = "hptuning-training-custom-container"

# Docker definitions for training
IMAGE_URI = f'gcr.io/{PROJECT_ID}/{HPTUNING_TRAINING_CONTAINER}'
MACHINE_TYPE ='e2-highcpu-32'
FILE_LOCATION = './'

print(f"export DOCKERNAME    = {DOCKERNAME}")
print(f"export IMAGE_URI     = {IMAGE_URI}")
print(f"export FILE_LOCATION = {FILE_LOCATION}")
print(f"export MACHINE_TYPE  = {MACHINE_TYPE}")
print(f"export ARTIFACTS_DIR = {ARTIFACTS_DIR}")

export DOCKERNAME    = train_perarm
export IMAGE_URI     = gcr.io/hybrid-vertex/hptuning-training-custom-container
export FILE_LOCATION = ./
export MACHINE_TYPE  = e2-highcpu-32
export ARTIFACTS_DIR = gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-134124/artifacts
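
The build command in the next cell passes these values to Cloud Build as a comma-separated `--substitutions` string. The sketch below assembles that string from a dict; `format_substitutions` is a hypothetical helper and the project name is a placeholder. It relies on two Cloud Build conventions: user-defined substitution names start with an underscore, and commas separate the key/value pairs.

```python
def format_substitutions(values):
    """Joins key/value pairs into the KEY=VALUE,KEY=VALUE form used by --substitutions."""
    for key, value in values.items():
        if not key.startswith("_"):
            raise ValueError(f"User-defined substitution {key!r} must start with an underscore")
        if "," in str(value):
            raise ValueError(f"Value for {key!r} contains a comma, which separates pairs")
    return ",".join(f"{key}={value}" for key, value in values.items())


substitutions = format_substitutions(
    {
        "_DOCKERNAME": "train_perarm",
        "_IMAGE_URI": "gcr.io/example-project/hptuning-training-custom-container",
        "_FILE_LOCATION": "./",
    }
)
print(substitutions)
```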


In [266]:
! gcloud builds submit --config cloudbuild.yaml \
    --substitutions _DOCKERNAME=$DOCKERNAME,_IMAGE_URI=$IMAGE_URI,_FILE_LOCATION=$FILE_LOCATION,_ARTIFACTS_DIR=$ARTIFACTS_DIR \
    --timeout=2h \
    --machine-type=$MACHINE_TYPE

Creating temporary tarball archive of 58 file(s) totalling 57.2 MiB before compression.
Some files were not included in the source upload.

Check the gcloud log [/home/jupyter/.config/gcloud/logs/2023.07.14/13.41.31.114949.log] to see which files and the contents of the
default gcloudignore file used (see `$ gcloud topic gcloudignore` to learn
more).

Uploading tarball of [.] to [gs://hybrid-vertex_cloudbuild/source/1689342091.211716-cc5d9e55d5e84ea380ab6431bab7c0b6.tgz]
Created [https://cloudbuild.googleapis.com/v1/projects/hybrid-vertex/locations/global/builds/938c0a4b-1db6-47aa-a5b6-ae935f9c9d7d].
Logs are available at [ https://console.cloud.google.com/cloud-build/builds/938c0a4b-1db6-47aa-a5b6-ae935f9c9d7d?project=934903580331 ].
----------------------------- REMOTE BUILD OUTPUT ------------------------------
starting build "938c0a4b-1db6-47aa-a5b6-ae935f9c9d7d"

FETCHSOURCE
Fetching storage object: gs://hybrid-vertex_cloudbuild/source/1689342091.211716-cc5d9e55d5e84ea380ab6431bab

## Submit (hpt) tuning job
* Submit a hyperparameter tuning job with the custom container. For an alternative that uses Python packages instead of custom containers, see the example [here](https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning#create)
* Define the hyperparameter(s), max trial count, parallel trial count, parameter search algorithm, machine spec, accelerators, worker pool, etc.
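
Before choosing `max_trial_count`, it can help to count how many distinct combinations the search space contains. The sketch below enumerates the two discrete parameters tuned later in this notebook; the variable names are illustrative only.

```python
from itertools import product

# Discrete values mirrored from the study spec used later in this notebook.
search_space = {
    "batch-size": [8, 16],
    "steps-per-loop": [2, 4],
}

names = list(search_space)
combinations = [
    dict(zip(names, values)) for values in product(*search_space.values())
]

for combo in combinations:
    print(combo)
print(f"{len(combinations)} distinct combinations")
```

With four distinct combinations and `max_trial_count = 4`, a random search is not guaranteed to visit each one: in the run recorded below, trials 1 and 2 used the same values.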

In [220]:
# Execute hyperparameter tuning instead of regular training
RUN_HYPERPARAMETER_TUNING          = True
TRAIN_WITH_BEST_HYPERPARAMETERS    = False  # Do not train.

# Directory to store the best hyperparameter(s) in `BUCKET_NAME` and locally (temporarily)
HPTUNING_RESULT_DIR                = "hptuning"
HPTUNING_RESULT_FILE               = "result.json"
HPTUNING_RESULT_PATH               = f"{EXPERIMENT_NAME}/{RUN_NAME}/{HPTUNING_RESULT_DIR}/{HPTUNING_RESULT_FILE}"
HPTUNING_RESULT_URI                = f"{BUCKET_URI}/{HPTUNING_RESULT_PATH}"

# HPTUNING_RESULT_PATH               = os.path.join(HPTUNING_RESULT_DIR, "result.json")

print(f"HPTUNING_RESULT_DIR  : {HPTUNING_RESULT_DIR}")
print(f"HPTUNING_RESULT_FILE : {HPTUNING_RESULT_FILE}")
print(f"HPTUNING_RESULT_PATH : {HPTUNING_RESULT_PATH}")
print(f"HPTUNING_RESULT_URI  : {HPTUNING_RESULT_URI}")

HPTUNING_RESULT_DIR  : hptuning
HPTUNING_RESULT_FILE : result.json
HPTUNING_RESULT_PATH : scale-perarm-hpt/run-20230714-130012/hptuning/result.json
HPTUNING_RESULT_URI : gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-130012/hptuning/result.json


### Accelerators

In [221]:
# WORKER_MACHINE_TYPE = 'a2-highgpu-1g'
# REPLICA_COUNT = 1
# ACCELERATOR_TYPE = 'NVIDIA_TESLA_A100'
# PER_MACHINE_ACCELERATOR_COUNT = 1
# # REDUCTION_SERVER_COUNT = 0                                                      
# # REDUCTION_SERVER_MACHINE_TYPE = "n1-highcpu-16"
# DISTRIBUTE_STRATEGY = 'single'

WORKER_MACHINE_TYPE = 'n1-standard-16'
REPLICA_COUNT = 1
ACCELERATOR_TYPE = 'NVIDIA_TESLA_T4' # NVIDIA_TESLA_T4 NVIDIA_TESLA_V100
PER_MACHINE_ACCELERATOR_COUNT = 1
DISTRIBUTE_STRATEGY = 'single'

print(f"WORKER_MACHINE_TYPE           : {WORKER_MACHINE_TYPE}")
print(f"REPLICA_COUNT                 : {REPLICA_COUNT}")
print(f"ACCELERATOR_TYPE              : {ACCELERATOR_TYPE}")
print(f"PER_MACHINE_ACCELERATOR_COUNT : {PER_MACHINE_ACCELERATOR_COUNT}")
print(f"DISTRIBUTE_STRATEGY           : {DISTRIBUTE_STRATEGY}")

WORKER_MACHINE_TYPE           : n1-standard-16
REPLICA_COUNT                 : 1
ACCELERATOR_TYPE              : NVIDIA_TESLA_T4
PER_MACHINE_ACCELERATOR_COUNT : 1
DISTRIBUTE_STRATEGY           : single


### TODO - Tensorboard

### init Vertex SDK

In [222]:
aiplatform.init(
    project=PROJECT_ID
    , location=REGION
    , staging_bucket=BUCKET_NAME
)

### helper function: create hyperparameter tuning job

In [223]:
def create_hyperparameter_tuning_job_sample(
    project: str
    , display_name: str
    , image_uri: str
    , args: List[str]
    , max_trial_count: int
    , parallel_trial_count: int
    , location: str = "us-central1"
    , api_endpoint: str = "us-central1-aiplatform.googleapis.com"
) -> str:
    """
    Creates a hyperparameter tuning job using a custom container.

    Args:
        project: GCP project ID.
        display_name: GCP console display name for the hyperparameter tuning job in
            Vertex AI.
        image_uri: URI to the hyperparameter tuning container image in Container
            Registry.
        args: Arguments passed to the container.
        max_trial_count: Total number of trials to run.
        parallel_trial_count: Number of trials to run in parallel.
        location: Service location.
        api_endpoint: API endpoint, e.g. `<location>-aiplatform.googleapis.com`.

    Returns:
        A string of the hyperparameter tuning job ID.
    """
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.JobServiceClient(client_options=client_options)

    # ====================================================
    # study_spec
    # ====================================================
    # Metric based on which to evaluate which combination of hyperparameter(s) to choose
    metric = {
        "metric_id": "final_average_return"  # Metric you report to Vertex AI.
        , "goal": aiplatform.gapic.StudySpec.MetricSpec.GoalType.MAXIMIZE,
    }

    # ====================================================
    # Hyperparameter(s) to tune
    # ====================================================
    # training_loops = {
    #     "parameter_id": "training-loops"
    #     , "discrete_value_spec": {"values": [4, 16]}
    #     , "scale_type": aiplatform.gapic.StudySpec.ParameterSpec.ScaleType.UNIT_LINEAR_SCALE
    # }
    steps_per_loop = {
        "parameter_id": "steps-per-loop"
        , "discrete_value_spec": {"values": [2, 4]}
        , "scale_type": aiplatform.gapic.StudySpec.ParameterSpec.ScaleType.UNIT_LINEAR_SCALE
    }
    batch_size_hpt = {
        "parameter_id": "batch-size"
        , "discrete_value_spec": {"values": [8, 16]}
        , "scale_type": aiplatform.gapic.StudySpec.ParameterSpec.ScaleType.UNIT_LINEAR_SCALE
    }
    # num_actions_hpt = {
    #     "parameter_id": "num-actions"
    #     , "discrete_value_spec": {"values": [5, 10, 25, 35]}
    #     , "scale_type": aiplatform.gapic.StudySpec.ParameterSpec.ScaleType.UNIT_LINEAR_SCALE
    # }

    # ====================================================
    # worker_pool_spec
    # ====================================================
    # machine_spec = {
    #     "machine_type": "n1-standard-32"
    #     , "accelerator_type": aiplatform.gapic.AcceleratorType.ACCELERATOR_TYPE_UNSPECIFIED
    #     , "accelerator_count": None
    # }
    machine_spec = {
        "machine_type": WORKER_MACHINE_TYPE    # "n1-standard-16"
        , "accelerator_type": ACCELERATOR_TYPE # aiplatform.gapic.AcceleratorType.NVIDIA_TESLA_T4
        , "accelerator_count": PER_MACHINE_ACCELERATOR_COUNT
    }
    worker_pool_spec = {
        "machine_spec": machine_spec
        , "replica_count": REPLICA_COUNT
        , "container_spec": {
            "image_uri": image_uri
            , "args": args
        },
    }

    # ====================================================
    # hyperparameter_tuning_job
    # ====================================================
    hyperparameter_tuning_job = {
        "display_name": display_name
        , "max_trial_count": max_trial_count
        , "parallel_trial_count": parallel_trial_count
        , "study_spec": {
            "metrics": [metric]
            # , "parameters": [training_loops, steps_per_loop]
            , "parameters": [batch_size_hpt, steps_per_loop] # num_actions_hpt
            , "algorithm": aiplatform.gapic.StudySpec.Algorithm.RANDOM_SEARCH
        }
        , "trial_job_spec": {"worker_pool_specs": [worker_pool_spec]}
    }
    parent = f"projects/{project}/locations/{location}"

    # ====================================================
    # Create job via client
    # ====================================================
    response = client.create_hyperparameter_tuning_job(
        parent=parent
        , hyperparameter_tuning_job=hyperparameter_tuning_job
    )
    job_id = response.name.split("/")[-1]
    print("Job ID:", job_id)
    print("Job config:", response)
    
#     # ====================================================
#     # Create job via SDK
#     # ====================================================
#     metric_spec = {"final_average_return": "maximize"}
    
#     parameter_spec = {
#         "training-loops": hpt.DiscreteParameterSpec(values=[4, 16], scale="linear")
#         , "steps-per-loop": hpt.DiscreteParameterSpec(values=[1, 2], scale="linear")
#     }
#     my_custom_job = aiplatform.CustomJob(
#         display_name=display_name
#         , worker_pool_specs=worker_pool_spec
#         , staging_bucket=ROOT_DIR
#     )
    
#     hp_job = aiplatform.HyperparameterTuningJob(
#         display_name=display_name
#         , custom_job=my_custom_job
#         , metric_spec=metric_spec
#         , parameter_spec=parameter_spec
#         , max_trial_count=hyperparameter_tuning_job["max_trial_count"]
#         , parallel_trial_count=hyperparameter_tuning_job["parallel_trial_count"]
#     )

#     hp_job.run(sync=False)

    return job_id

### set training args

In [224]:
# Set hyperparameters.
BATCH_SIZE       = 8       # Training and prediction batch size.
TRAINING_LOOPS   = 20      # Number of training iterations.
STEPS_PER_LOOP   = 6       # Number of driver steps per training iteration.

# Set MovieLens simulation environment parameters.
RANK_K           = 20      # Rank for matrix factorization in the MovieLens environment; also the observation dimension.
NUM_ACTIONS      = 20      # Number of actions (movie items) to choose from.
PER_ARM          = False   # Use the non-per-arm version of the MovieLens environment.

# Set agent parameters.
TIKHONOV_WEIGHT  = 0.001   # LinUCB Tikhonov regularization weight.
AGENT_ALPHA      = 10.0    # LinUCB exploration parameter that multiplies the confidence intervals.

print(f"BATCH_SIZE       : {BATCH_SIZE}")
print(f"TRAINING_LOOPS   : {TRAINING_LOOPS}")
print(f"STEPS_PER_LOOP   : {STEPS_PER_LOOP}")
print(f"RANK_K           : {RANK_K}")
print(f"NUM_ACTIONS      : {NUM_ACTIONS}")
print(f"PER_ARM          : {PER_ARM}")
print(f"TIKHONOV_WEIGHT  : {TIKHONOV_WEIGHT}")
print(f"AGENT_ALPHA      : {AGENT_ALPHA}")

BATCH_SIZE       : 8
TRAINING_LOOPS   : 20
STEPS_PER_LOOP   : 6
RANK_K           : 20
NUM_ACTIONS      : 20
PER_ARM          : False
TIKHONOV_WEIGHT  : 0.001
AGENT_ALPHA      : 10.0
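
These values reach the container as command-line flags (see the `args` list in the next cell). The parser used by `src/per_arm_rl/task.py` is not shown in this notebook, so the sketch below is a hypothetical parser for a subset of the flags, with assumed types and defaults. It also illustrates one `argparse` behavior that matters during tuning: when a flag is repeated, the last occurrence wins.

```python
import argparse


def build_parser():
    """Builds a hypothetical parser for a subset of the task's flags."""
    parser = argparse.ArgumentParser(description="Hypothetical task argument parser.")
    parser.add_argument("--batch-size", type=int, default=8)
    parser.add_argument("--training-loops", type=int, default=20)
    parser.add_argument("--steps-per-loop", type=int, default=6)
    parser.add_argument("--tikhonov-weight", type=float, default=0.001)
    parser.add_argument("--agent-alpha", type=float, default=10.0)
    parser.add_argument("--artifacts_dir", type=str, default=None)
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--run-hyperparameter-tuning", action="store_true")
    parser.add_argument("--train-with-best-hyperparameters", action="store_true")
    return parser


example_args = [
    "--batch-size=8",
    "--steps-per-loop=2",
    "--artifacts_dir=gs://example-bucket/artifacts",
    "--run-hyperparameter-tuning",
    "--batch-size=16",  # A repeated flag: argparse keeps the last value.
]
parsed = build_parser().parse_args(example_args)
print(parsed.batch_size, parsed.steps_per_loop, parsed.run_hyperparameter_tuning)
```

If the tuning service supplies its trial value as an additional `--batch-size=...` flag after the static one (an assumption about ordering), the trial value would override the static `--batch-size` in this sketch.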


In [225]:
args = [
    f"--data-path={DATA_PATH}"               # TODO - remove duplicate arg
    , f"--bucket_name={BUCKET_NAME}"
    , f"--data_gcs_prefix={DATA_GCS_PREFIX}"
    , f"--data_path={DATA_PATH}"
    , f"--project_number={PROJECT_NUM}"
    , f"--batch-size={BATCH_SIZE}"
    , f"--rank-k={RANK_K}"
    , f"--num-actions={NUM_ACTIONS}"
    , f"--tikhonov-weight={TIKHONOV_WEIGHT}"
    , f"--agent-alpha={AGENT_ALPHA}"
    , f"--training-loops={TRAINING_LOOPS}"
    , f"--steps-per-loop={STEPS_PER_LOOP}"
    , f"--distribute={DISTRIBUTE_STRATEGY}"
    , f"--artifacts_dir={ARTIFACTS_DIR}"
    , f"--root_dir={ROOT_DIR}"
]

if RUN_HYPERPARAMETER_TUNING:
    args.append("--run-hyperparameter-tuning")
    
elif TRAIN_WITH_BEST_HYPERPARAMETERS:
    args.append("--train-with-best-hyperparameters")
    
from pprint import pprint
pprint(f"args: {args}")

("args: ['--data-path=gs://mabv1-hybrid-vertex-bucket/data', "
 "'--bucket_name=mabv1-hybrid-vertex-bucket', '--data_gcs_prefix=data', "
 "'--data_path=gs://mabv1-hybrid-vertex-bucket/data', "
 "'--project_number=934903580331', '--batch-size=8', '--rank-k=20', "
 "'--num-actions=20', '--tikhonov-weight=0.001', '--agent-alpha=10.0', "
 "'--training-loops=20', '--steps-per-loop=6', '--distribute=single', "
 "'--artifacts_dir=gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-130012/artifacts', "
 "'--run-hyperparameter-tuning']")


In [226]:
job_id = create_hyperparameter_tuning_job_sample(
    project=PROJECT_ID
    , display_name=f"mvl-hpt-job-{RUN_NAME}"
    , image_uri=f"gcr.io/{PROJECT_ID}/{HPTUNING_TRAINING_CONTAINER}:latest"
    , args=args
    , max_trial_count = 4
    , parallel_trial_count = 2
    , location=REGION
    , api_endpoint=f"{REGION}-aiplatform.googleapis.com"
)

job_id

Job ID: 9007622635437162496
Job config: name: "projects/934903580331/locations/us-central1/hyperparameterTuningJobs/9007622635437162496"
display_name: "mvl-hpt-job-run-20230714-130012"
study_spec {
  metrics {
    metric_id: "final_average_return"
    goal: MAXIMIZE
  }
  parameters {
    parameter_id: "batch-size"
    discrete_value_spec {
      values: 8.0
      values: 16.0
    }
    scale_type: UNIT_LINEAR_SCALE
  }
  parameters {
    parameter_id: "steps-per-loop"
    discrete_value_spec {
      values: 2.0
      values: 4.0
    }
    scale_type: UNIT_LINEAR_SCALE
  }
  algorithm: RANDOM_SEARCH
}
max_trial_count: 4
parallel_trial_count: 2
trial_job_spec {
  worker_pool_specs {
    machine_spec {
      machine_type: "n1-standard-16"
      accelerator_type: NVIDIA_TESLA_T4
      accelerator_count: 1
    }
    replica_count: 1
    disk_spec {
      boot_disk_type: "pd-ssd"
      boot_disk_size_gb: 100
    }
    container_spec {
      image_uri: "gcr.io/hybrid-vertex/hptuning-training

'9007622635437162496'

#### Check hyperparameter tuning job status
* Read more about managing jobs [here](https://cloud.google.com/vertex-ai/docs/training/using-hyperparameter-tuning#manage)

In [227]:
def get_hyperparameter_tuning_job_sample(
    project: str,
    hyperparameter_tuning_job_id: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
) -> aiplatform.HyperparameterTuningJob:
    """
    Gets the current status of a hyperparameter tuning job.

    Args:
        project: GCP project ID.
        hyperparameter_tuning_job_id: Hyperparameter tuning job ID.
        location: Service location.
        api_endpoint: API endpoint, e.g. `<location>-aiplatform.googleapis.com`.

    Returns:
        Details of the hyperparameter tuning job, such as its running status,
        results of its trials, etc.
    """
    # The AI Platform services require regional API endpoints.
    client_options = {"api_endpoint": api_endpoint}
    
    # Initialize client that will be used to create and send requests.
    # This client only needs to be created once, and can be reused for multiple requests.
    client = aiplatform.gapic.JobServiceClient(client_options=client_options)
    
    name = client.hyperparameter_tuning_job_path(
        project=project
        , location=location
        , hyperparameter_tuning_job=hyperparameter_tuning_job_id
    )
    
    response = client.get_hyperparameter_tuning_job(name=name)
    
    return response

In [228]:
trials = None

while True:
    response = get_hyperparameter_tuning_job_sample(
        project=PROJECT_ID
        , hyperparameter_tuning_job_id=job_id
        , location=REGION
        , api_endpoint=f"{REGION}-aiplatform.googleapis.com"
    )
    
    if response.state.name == 'JOB_STATE_SUCCEEDED':
        print("Job succeeded.\nJob Time:", response.update_time - response.create_time)
        trials = response.trials
        print("Trials:", trials)
        break
    elif response.state.name == "JOB_STATE_FAILED":
        print("Job failed.")
        break
    elif response.state.name == "JOB_STATE_CANCELLED":
        print("Job cancelled.")
        break
    else:
        print(f"Current job status: {response.state.name}.")
    time.sleep(60)

Job succeeded.
Job Time: 0:16:36.929684
Trials: [id: "1"
state: SUCCEEDED
parameters {
  parameter_id: "batch-size"
  value {
    number_value: 8.0
  }
}
parameters {
  parameter_id: "steps-per-loop"
  value {
    number_value: 4.0
  }
}
final_measurement {
  step_count: 1
  metrics {
    metric_id: "final_average_return"
    value: 1.2964732646942139
  }
}
start_time {
  seconds: 1689340036
  nanos: 175740133
}
end_time {
  seconds: 1689340194
}
, id: "2"
state: SUCCEEDED
parameters {
  parameter_id: "batch-size"
  value {
    number_value: 8.0
  }
}
parameters {
  parameter_id: "steps-per-loop"
  value {
    number_value: 4.0
  }
}
final_measurement {
  step_count: 1
  metrics {
    metric_id: "final_average_return"
    value: 1.2616862058639526
  }
}
start_time {
  seconds: 1689340036
  nanos: 175927645
}
end_time {
  seconds: 1689340195
}
, id: "3"
state: SUCCEEDED
parameters {
  parameter_id: "batch-size"
  value {
    number_value: 16.0
  }
}
parameters {
  parameter_id: "steps-p
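
The polling loop above runs until the job reaches one of three terminal states. A possible variation, sketched below, wraps the same idea in a reusable function with a timeout, so that a job stuck in any other state does not block the notebook indefinitely. `wait_for_terminal_state` and its parameters are hypothetical names, and the state getter is injected so the function can be exercised without calling Vertex AI.

```python
import time

# The three terminal states handled by the polling loop in this notebook.
TERMINAL_STATES = {"JOB_STATE_SUCCEEDED", "JOB_STATE_FAILED", "JOB_STATE_CANCELLED"}


def wait_for_terminal_state(
    get_state, poll_seconds=60.0, timeout_seconds=7200.0, sleep=time.sleep
):
    """Polls get_state() until it returns a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if waited >= timeout_seconds:
            raise TimeoutError(f"Job still in state {state} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds


# Dry run with a fake state sequence and a no-op sleep.
fake_states = iter(["JOB_STATE_PENDING", "JOB_STATE_RUNNING", "JOB_STATE_SUCCEEDED"])
final_state = wait_for_terminal_state(lambda: next(fake_states), sleep=lambda _: None)
print(final_state)
```

In this notebook, `get_state` could be a lambda that calls `get_hyperparameter_tuning_job_sample(...)` and returns `response.state.name`.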

#### Find the best combination(s) of hyperparameter(s) for each metric

In [229]:
if trials:
    # Dict mapping from metric names to the best metric values seen so far.
    # Starting from -inf assumes every metric is to be maximized.
    best_objective_values = dict.fromkeys(
        [metric.metric_id for metric in trials[0].final_measurement.metrics]
        , -np.inf
    )
    # Dict mapping from metric names to a list of the best combination(s) of
    # hyperparameter(s). Each combination is a dict mapping from hyperparameter
    # names to their values.
    best_params = defaultdict(list)
    for trial in trials:
        # `parameters` and `final_measurement.metrics` are `RepeatedComposite` objects.
        # Reference the structure above to extract the value of your interest.
        params = {
            param.parameter_id: param.value for param in trial.parameters
        }
        for metric in trial.final_measurement.metrics:
            if metric.value > best_objective_values[metric.metric_id]:
                # New best value: record it and reset the list of combinations.
                best_objective_values[metric.metric_id] = metric.value
                best_params[metric.metric_id] = [params]
            elif metric.value == best_objective_values[metric.metric_id]:
                # Tie: multiple hyperparameter combinations lead to the same performance.
                best_params[metric.metric_id].append(params)
    print("Best hyperparameter value(s):")
    for metric, params in best_params.items():
        # Combinations are dicts, which cannot be sorted; print them in trial order.
        print(f"Metric={metric}: {params}")
else:
    print("No hyperparameter tuning job trials found.")

Best hyperparameter value(s):
Metric=final_average_return: [{'batch-size': 16.0, 'steps-per-loop': 2.0}]


#### Convert a combination of best hyperparameter(s) for a metric of interest to JSON

In [230]:
# ! rm -rf $HPTUNING_RESULT_DIR
# ! mkdir $HPTUNING_RESULT_DIR
 
LOCAL_RESULTS_FILE = "result.json"  # {"batch-size": 8.0, "steps-per-loop": 2.0}

with open(LOCAL_RESULTS_FILE, "w") as f:
    json.dump(best_params["final_average_return"][0], f)

#### Upload the best hyperparameter(s) to GCS for use in training

In [231]:
!gsutil -q cp $LOCAL_RESULTS_FILE $HPTUNING_RESULT_URI

!gsutil ls $HPTUNING_RESULT_URI

gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-130012/hptuning/result.json
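
The tuning service reports discrete values as floats (`16.0` and `2.0` in the output above), so code that consumes `result.json` will likely need to cast them back to integers. The sketch below shows one way to do that for a file that has already been copied locally; the download from Cloud Storage is omitted. `load_best_hyperparameters` and `INTEGER_HYPERPARAMETERS` are hypothetical names, and the actual handling in `src/per_arm_rl/task.py` may differ.

```python
import json
import os
import tempfile

# Hyperparameters that the task expects as integers (assumption for this sketch).
INTEGER_HYPERPARAMETERS = {"batch-size", "steps-per-loop", "training-loops"}


def load_best_hyperparameters(path, defaults):
    """Overrides defaults with values from a result.json-style file."""
    with open(path) as f:
        best = json.load(f)
    merged = dict(defaults)
    for name, value in best.items():
        merged[name] = int(value) if name in INTEGER_HYPERPARAMETERS else value
    return merged


# Round trip through a temporary file using the values reported above.
with tempfile.TemporaryDirectory() as tmp_dir:
    result_path = os.path.join(tmp_dir, "result.json")
    with open(result_path, "w") as f:
        json.dump({"batch-size": 16.0, "steps-per-loop": 2.0}, f)
    hparams = load_best_hyperparameters(
        result_path,
        defaults={"batch-size": 8, "steps-per-loop": 6, "training-loops": 100},
    )

print(hparams)
```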


## Create custom prediction container

As with training, create a custom prediction container. This container handles the TF-Agents specific logic that is different from a regular TensorFlow Model. Specifically, it finds the predicted action using a trained policy. The associated source code is in `src/prediction/`.
See other options for Vertex AI predictions [here](https://cloud.google.com/vertex-ai/docs/predictions/getting-predictions).

#### Serve predictions:
- Use [`tensorflow.saved_model.load`](https://www.tensorflow.org/agents/api_docs/python/tf_agents/policies/PolicySaver#usage), instead of [`tf_agents.policies.policy_loader.load`](https://github.com/tensorflow/agents/blob/r0.8.0/tf_agents/policies/policy_loader.py#L26), to load the trained policy, because the latter produces an object of type [`SavedModelPyTFEagerPolicy`](https://github.com/tensorflow/agents/blob/402b8aa81ca1b578ec1f687725d4ccb4115386d2/tf_agents/policies/py_tf_eager_policy.py#L137) whose `action()` is not compatible for use here.
- Prediction requests contain only observation data, not reward. A prediction is a standalone request that requires no prior knowledge of the system state, and end users know only what they observe at the moment. The reward arrives after the action has been taken, so it is not available at request time. To handle a prediction request, create a [`TimeStep`](https://www.tensorflow.org/agents/api_docs/python/tf_agents/trajectories/TimeStep) object (consisting of `observation`, `reward`, `discount`, `step_type`) using the [`restart()`](https://www.tensorflow.org/agents/api_docs/python/tf_agents/trajectories/restart) function, which takes an `observation`. This function creates the *first* `TimeStep` in a trajectory, where the reward is 0, the discount is 1, and `step_type` marks the first time step. In other words, each prediction request forms the first `TimeStep` of a brand-new trajectory.
- For the prediction response, avoid using NumPy-typed values; instead, convert them to native Python values using methods such as [`tolist()`](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.tolist.html) as opposed to `list()`.
- There exists a prestart script in `src/prediction`. FastAPI executes this script before starting up the server. The `PORT` environment variable is set to equal `AIP_HTTP_PORT` in order to run FastAPI on the same port expected by Vertex AI.
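
To make the `restart()` bullet concrete, the sketch below builds a plain-Python stand-in for the batched `TimeStep` that the server creates for each request. It does not use TF-Agents: `PlainTimeStep` and `plain_restart` are hypothetical names that only mirror the field values described above (step type `FIRST`, reward 0, discount 1).

```python
from typing import List, NamedTuple

STEP_TYPE_FIRST = 0  # Value of StepType.FIRST in TF-Agents.


class PlainTimeStep(NamedTuple):
    """Plain-Python stand-in for a batched TimeStep, for illustration only."""

    step_type: List[int]
    reward: List[float]
    discount: List[float]
    observation: List[List[float]]


def plain_restart(observation):
    """Mimics the field values that restart() produces for a batch of observations."""
    batch_size = len(observation)
    return PlainTimeStep(
        step_type=[STEP_TYPE_FIRST] * batch_size,
        reward=[0.0] * batch_size,
        discount=[1.0] * batch_size,
        observation=observation,
    )


# One request instance: 8 observations, each of dimension 20.
time_step = plain_restart([[1.0] * 20 for _ in range(8)])
print(time_step.step_type, time_step.reward[0], time_step.discount[0])
```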

In [232]:
PRED_SUBFOLDER = 'prediction'

In [233]:
# Make the prediction subfolder
! rm -rf {REPO_DOCKER_PATH_PREFIX}/{PRED_SUBFOLDER}
! mkdir {REPO_DOCKER_PATH_PREFIX}/{PRED_SUBFOLDER}

In [234]:
%%writefile {REPO_DOCKER_PATH_PREFIX}/{PRED_SUBFOLDER}/main.py
# Copyright 2021 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Prediction server that uses a trained policy to give predicted actions."""
import os

from fastapi import FastAPI
from fastapi import Request

import tensorflow as tf
import tf_agents


app = FastAPI()
_model = tf.compat.v2.saved_model.load(os.environ["AIP_STORAGE_URI"])


@app.get(os.environ["AIP_HEALTH_ROUTE"], status_code=200)
def health():
    """
    Handles server health check requests.

    Returns:
      An empty dict.
    """
    return {}


@app.post(os.environ["AIP_PREDICT_ROUTE"])
async def predict(request: Request):
    """
    Handles prediction requests.

    Unpacks observations in prediction requests and queries the trained policy for
    predicted actions.

    Args:
      request: Incoming prediction requests that contain observations.

    Returns:
      A dict with the key `predictions` mapping to a list of predicted actions
      corresponding to each observation in the prediction request.
    """
    body = await request.json()
    instances = body["instances"]

    predictions = []
    for index, instance in enumerate(instances):
        # Unpack request body and reconstruct TimeStep. Rewards default to 0.
        batch_size = len(instance["observation"])
        
        time_step = tf_agents.trajectories.restart(
            observation=instance["observation"]
            , batch_size=tf.convert_to_tensor([batch_size])
        )
        policy_step = _model.action(time_step)

        predictions.append(
            {f"PolicyStep {index}": policy_step.action.numpy().tolist()}
        )

    return {
        "predictions": predictions
    }

Writing src/prediction/main.py


In [235]:
%%writefile {REPO_DOCKER_PATH_PREFIX}/{PRED_SUBFOLDER}/prestart.sh
#!/bin/bash
export PORT=$AIP_HTTP_PORT

Writing src/prediction/prestart.sh


#### Define dependencies

In [236]:
%%writefile pred_requirements.txt
tf-agents==0.17.0
tensorflow==2.12.0
numpy
six
typing-extensions
pillow

Overwriting pred_requirements.txt


#### Write a Dockerfile

Note: Keep the server directory named `app`; the base image used below expects the application code in `/app`.

In [237]:
DOCKERNAME = 'pred'

In [238]:
%%writefile Dockerfile_{DOCKERNAME}

FROM tiangolo/uvicorn-gunicorn-fastapi:python3.10

COPY src/prediction /app
COPY pred_requirements.txt /app/requirements.txt

RUN pip3 install -r /app/requirements.txt

Overwriting Dockerfile_pred


#### Build the prediction container with Cloud Build

In [239]:
PREDICTION_CONTAINER = "prediction-custom-container"

# Docker definitions for prediction
IMAGE_URI = f'gcr.io/{PROJECT_ID}/{PREDICTION_CONTAINER}'
MACHINE_TYPE ='e2-highcpu-32'
FILE_LOCATION = './'

print(f"export DOCKERNAME={DOCKERNAME}")
print(f"export IMAGE_URI={IMAGE_URI}")
print(f"export FILE_LOCATION={FILE_LOCATION}")
print(f"export MACHINE_TYPE={MACHINE_TYPE}")
print(f"export ARTIFACTS_DIR={ARTIFACTS_DIR}")

export DOCKERNAME=pred
export IMAGE_URI=gcr.io/hybrid-vertex/prediction-custom-container
export FILE_LOCATION=./
export MACHINE_TYPE=e2-highcpu-32
export ARTIFACTS_DIR=gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-130012/artifacts


In [240]:
# ! gcloud builds submit --config cloudbuild.yaml \
#     --substitutions _DOCKERNAME=$DOCKERNAME,_IMAGE_URI=$IMAGE_URI,_FILE_LOCATION=$FILE_LOCATION,_ARTIFACTS_DIR=$ARTIFACTS_DIR \
#     --timeout=2h \
#     --machine-type=$MACHINE_TYPE

## Submit custom container training job

- As noted earlier, the bucket must be in the same region as the Vertex AI service location, and it must not be multi-regional.
- Read the source code of `CustomContainerTrainingJob` [here](https://github.com/googleapis/python-aiplatform/blob/v0.8.0/google/cloud/aiplatform/training_jobs.py#L2153).
- As with local execution, you can use TensorBoard Profiler to track the training process and resources, and visualize the corresponding artifacts using the command: `%tensorboard --logdir $PROFILER_DIR`.

In [267]:
EXPERIMENT_NAME   = 'scale-perarm-hpt'

invoke_time       = time.strftime("%Y%m%d-%H%M%S")
RUN_NAME          = f'run-{invoke_time}'

LOG_DIR           = f"{BUCKET_URI}/{EXPERIMENT_NAME}/{RUN_NAME}/tb-logs"
ROOT_DIR          = f"{BUCKET_URI}/{EXPERIMENT_NAME}/{RUN_NAME}/root"       # Root directory for writing logs/summaries/checkpoints.
ARTIFACTS_DIR     = f"{BUCKET_URI}/{EXPERIMENT_NAME}/{RUN_NAME}/artifacts"  # Where the trained model will be saved and restored.

print(f"EXPERIMENT_NAME   : {EXPERIMENT_NAME}")
print(f"RUN_NAME          : {RUN_NAME}")
print(f"LOG_DIR           : {LOG_DIR}")
print(f"ROOT_DIR          : {ROOT_DIR}")
print(f"ARTIFACTS_DIR     : {ARTIFACTS_DIR}")

EXPERIMENT_NAME   : scale-perarm-hpt
RUN_NAME          : run-20230714-134801
LOG_DIR           : gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-134801/tb-logs
ROOT_DIR          : gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-134801/root
ARTIFACTS_DIR     : gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-134801/artifacts


In [268]:
aiplatform.init(project=PROJECT_ID, location=REGION, staging_bucket=BUCKET_NAME)

In [269]:
RUN_HYPERPARAMETER_TUNING       = False  # Execute regular training instead of hyperparameter tuning.
TRAIN_WITH_BEST_HYPERPARAMETERS = True   # @param {type:"bool"} Whether to use learned hyperparameters in training

In [270]:
# Set hyperparameters.
BATCH_SIZE       = 8         # Training and prediction batch size.
TRAINING_LOOPS   = 100       # Number of training iterations.
STEPS_PER_LOOP   = 2         # Number of driver steps per training iteration.

In [271]:
args = [
    f"--data-path={DATA_PATH}"               # TODO - remove duplicate arg
    , f"--bucket_name={BUCKET_NAME}"
    , f"--data_gcs_prefix={DATA_GCS_PREFIX}"
    , f"--data_path={DATA_PATH}"
    , f"--project_number={PROJECT_NUM}"
    , f"--batch-size={BATCH_SIZE}"
    , f"--rank-k={RANK_K}"
    , f"--num-actions={NUM_ACTIONS}"
    , f"--tikhonov-weight={TIKHONOV_WEIGHT}"
    , f"--agent-alpha={AGENT_ALPHA}"
    , f"--training-loops={TRAINING_LOOPS}"
    , f"--steps-per-loop={STEPS_PER_LOOP}"
    , f"--distribute={DISTRIBUTE_STRATEGY}"
    , f"--artifacts_dir={ARTIFACTS_DIR}"
    , f"--root_dir={ROOT_DIR}"
]

if RUN_HYPERPARAMETER_TUNING:
    args.append("--run-hyperparameter-tuning")
elif TRAIN_WITH_BEST_HYPERPARAMETERS:
    args.append("--train-with-best-hyperparameters")
    args.append(f"--best-hyperparameters-bucket={BUCKET_NAME}")
    args.append(f"--best-hyperparameters-path={HPTUNING_RESULT_PATH}")
    
from pprint import pprint
pprint(f"args: {args}")

("args: ['--data-path=gs://mabv1-hybrid-vertex-bucket/data', "
 "'--bucket_name=mabv1-hybrid-vertex-bucket', '--data_gcs_prefix=data', "
 "'--data_path=gs://mabv1-hybrid-vertex-bucket/data', "
 "'--project_number=934903580331', '--batch-size=8', '--rank-k=20', "
 "'--num-actions=20', '--tikhonov-weight=0.001', '--agent-alpha=10.0', "
 "'--training-loops=100', '--steps-per-loop=2', '--distribute=single', "
 "'--artifacts_dir=gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-134801/artifacts', "
 "'--root_dir=gs://mabv1-hybrid-vertex-bucket/scale-perarm-hpt/run-20230714-134801/root', "
 "'--train-with-best-hyperparameters', "
 "'--best-hyperparameters-bucket=mabv1-hybrid-vertex-bucket', "
 "'--best-hyperparameters-path=scale-perarm-hpt/run-20230714-130012/hptuning/result.json']")


In [272]:
job = aiplatform.CustomContainerTrainingJob(
    display_name=f"mvl-train-job-{RUN_NAME}"
    , container_uri=f"gcr.io/{PROJECT_ID}/{HPTUNING_TRAINING_CONTAINER}:latest"
    , command=["python3", "-m", "src.per_arm_rl.task"] + args  # Pass in training arguments, including hyperparameters.
    , model_serving_container_image_uri=f"gcr.io/{PROJECT_ID}/{PREDICTION_CONTAINER}:latest"
    , model_serving_container_predict_route="/predict"
    , model_serving_container_health_route="/health"
)

print("Training Spec:", job._managed_model)

Training Spec: container_spec {
  image_uri: "gcr.io/hybrid-vertex/prediction-custom-container:latest"
  predict_route: "/predict"
  health_route: "/health"
}



In [273]:
model = job.run(
    model_display_name = f"{PREFIX}-perarm-model"
    , replica_count = 1
    , machine_type = "n1-standard-16"
    , accelerator_type = "ACCELERATOR_TYPE_UNSPECIFIED"     # ACCELERATOR_TYPE
    # , tensorboard=TENSORBOARD                             # TODO
    , accelerator_count = 0
    , enable_web_access = True
    , restart_job_on_worker_restart = False
    , sync=False
)

In [275]:
print("Model display name:", model.display_name)
print("Model ID:", model.name)

Model display name: mabv1-perarm-model
Model ID: 7981005261328351232


### Deploy trained model to an Endpoint

In [276]:
endpoint = model.deploy(machine_type="n1-standard-4")

In [277]:
print("Endpoint display name:", endpoint.display_name)
print("Endpoint ID:", endpoint.name)

Endpoint display name: mabv1-perarm-model_endpoint
Endpoint ID: 3299374910211620864


### Predict on the Endpoint
- Put prediction input(s) into a list named `instances`. The observation should be of dimension (BATCH_SIZE, RANK_K). Read more about the MovieLens simulation environment observation [here](https://github.com/tensorflow/agents/blob/v0.8.0/tf_agents/bandits/environments/movielens_py_environment.py#L32-L138).
- Read more about the endpoint prediction API [here](https://cloud.google.com/sdk/gcloud/reference/ai/endpoints/predict).
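
A minimal sketch of assembling such a request follows. The prediction server in this notebook reads `instance["observation"]` from each instance, so the helper validates the (BATCH_SIZE, RANK_K) shape and wraps the observation in that format. `build_instances` is a hypothetical helper; the sizes mirror `BATCH_SIZE = 8` and `RANK_K = 20` from the cells above.

```python
import json


def build_instances(observation, batch_size, rank_k):
    """Validates the observation shape and wraps it in the request format."""
    if len(observation) != batch_size:
        raise ValueError(f"Expected {batch_size} rows, got {len(observation)}")
    for row in observation:
        if len(row) != rank_k:
            raise ValueError(f"Expected rows of length {rank_k}, got {len(row)}")
    # Cast to built-in floats so the payload is JSON serializable.
    return [{"observation": [[float(x) for x in row] for row in observation]}]


instances = build_instances([[1.0] * 20 for _ in range(8)], batch_size=8, rank_k=20)

# Serialize once locally to confirm the payload is valid JSON.
payload = json.dumps({"instances": instances})
print(len(instances), len(instances[0]["observation"]), len(instances[0]["observation"][0]))
```

The resulting `instances` list is what `endpoint.predict(instances=instances)` would receive.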

In [279]:
## TODO

# endpoint.predict(
#     instances=[
#         {"observation": [list(np.ones(20)) for _ in range(8)]},
#     ]
# )

## Cleaning up

To clean up all Google Cloud resources used in this project, you can [delete the Google Cloud
project](https://cloud.google.com/resource-manager/docs/creating-managing-projects#shutting_down_projects) you used for the tutorial.

Otherwise, you can delete the individual resources you created in this tutorial:

In [None]:
# # Delete endpoint resource
# ! gcloud ai endpoints delete {endpoint.name} --quiet --region $REGION

# # Delete model resource
# ! gcloud ai models delete {model.name} --quiet --region $REGION

# # Delete Cloud Storage objects that were created
# ! gsutil -m rm -r $ARTIFACTS_DIR