# Hyperparameter Tuning of Tennis Reinforcement Learning

This notebook determines the best set of hyperparameters to solve the Tennis environment as quickly as possible.

#### Import necessary modules

```python
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

import boto3
from IPython.display import Image
from sagemaker import get_execution_role
from sagemaker.estimator import Estimator
from sagemaker.session import Session
from sagemaker.tuner import IntegerParameter, CategoricalParameter, ContinuousParameter, HyperparameterTuner
import shutil
```


#### Set local parameters

```python
instance_type = 'ml.m5.large'  # SageMaker instance type used for each training job
n_timesteps = 3000
n_instances = 1                # number of instances per training job
```

#### Set the hyperparameters to optimize
The following hyperparameters are read from the command line in [train.py](container/src/train.py).

| Name        | Type  | Default | Description                        |
|-------------|-------|---------|------------------------------------|
| epochs      | int   |    2000 | total number of epochs to run      |
| max_t       | int   |    1000 | max number of time steps per epoch |
| fc1         | int   |     128 | size of 1st hidden layer           |
| fc2         | int   |      64 | size of 2nd hidden layer           |
| lr_actor    | float |   0.001 | initial learning rate for actor    |
| lr_critic   | float |   0.001 | initial learning rate for critic   |
| batch_size  | int   |     256 | mini batch size                    |
| buffer_size | int   |  100000 | replay buffer size                 |
| gamma       | float |     0.9 | discount factor                    |
| tau         | float |   0.001 | soft update of target parameters   |
| sigma       | float |    0.01 | Ornstein-Uhlenbeck (OU) noise standard deviation |

Any of these could be tuned, but we tune only a subset to keep the search space manageable.
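The table above implies a command-line interface for the training script. As a minimal sketch, assuming `train.py` parses these flags with `argparse` (the real script is not shown here and may read them differently, for example from the `hyperparameters.json` file SageMaker writes into custom containers), a parser mirroring the table's defaults could look like this. `build_parser` is a hypothetical name:

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the defaults listed in the table above."""
    parser = argparse.ArgumentParser(description='Tennis training script (sketch)')
    parser.add_argument('--epochs', type=int, default=2000, help='total number of epochs to run')
    parser.add_argument('--max_t', type=int, default=1000, help='max number of time steps per epoch')
    parser.add_argument('--fc1', type=int, default=128, help='size of 1st hidden layer')
    parser.add_argument('--fc2', type=int, default=64, help='size of 2nd hidden layer')
    parser.add_argument('--lr_actor', type=float, default=0.001, help='initial learning rate for actor')
    parser.add_argument('--lr_critic', type=float, default=0.001, help='initial learning rate for critic')
    parser.add_argument('--batch_size', type=int, default=256, help='mini batch size')
    parser.add_argument('--buffer_size', type=int, default=100000, help='replay buffer size')
    parser.add_argument('--gamma', type=float, default=0.9, help='discount factor')
    parser.add_argument('--tau', type=float, default=0.001, help='soft update of target parameters')
    parser.add_argument('--sigma', type=float, default=0.01, help='OU noise standard deviation')
    return parser


# Override two values; everything else keeps its default
args = build_parser().parse_args(['--fc1', '256', '--lr_actor', '0.0005'])
print(vars(args))
```

Each tuning job would then override only the flags being searched, leaving the rest at their defaults.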

The hyperparameter tuner allows each hyperparameter to be defined as one of the following types:
- `CategoricalParameter(list)` Categorical parameters need to take one value from a discrete set. 
- `ContinuousParameter(min, max)` Continuous parameters can take any real number value between the minimum and maximum value.
- `IntegerParameter(min, max)` Integer parameters can take any integer value between the minimum and maximum value.
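To make the three types concrete, here is a toy sketch in which each hypothetical class draws a uniformly random value from its range. This is not SageMaker's implementation; the real tuner uses a Bayesian search strategy by default rather than independent uniform sampling, and the ranges below are illustrative only:

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ToyCategorical:
    values: Sequence

    def sample(self, rng):
        # Pick exactly one value from the discrete set
        return rng.choice(list(self.values))


@dataclass
class ToyContinuous:
    min_value: float
    max_value: float

    def sample(self, rng):
        # Any real number in [min_value, max_value]
        return rng.uniform(self.min_value, self.max_value)


@dataclass
class ToyInteger:
    min_value: int
    max_value: int

    def sample(self, rng):
        # randint is inclusive on both ends
        return rng.randint(self.min_value, self.max_value)


def sample_config(ranges, seed=0):
    """Draw one configuration; the same seed always gives the same result."""
    rng = random.Random(seed)
    return {name: space.sample(rng) for name, space in ranges.items()}


ranges = {
    'fc1': ToyCategorical([64, 128, 256]),
    'lr_actor': ToyContinuous(1e-4, 1e-2),
    'max_t': ToyInteger(500, 1000),
}
print(sample_config(ranges))
```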

_Note: when possible, it's almost always best to specify a hyperparameter as the least restrictive type. For example, tuning the learning rate as a continuous value between 0.01 and 0.2 is likely to yield a better result than tuning it as a categorical parameter with values 0.01, 0.1, 0.15, or 0.2. Some parameters are kept categorical to preserve a power-of-2 structure and to limit the search space._
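The power-of-2 trade-off in the note can be sketched numerically. A categorical grid of power-of-2 layer and batch sizes keeps the number of distinct configurations small, whereas an integer range over the same span admits far more values. The helper names and exponent ranges below are hypothetical:

```python
import math


def powers_of_two(lo_exp, hi_exp):
    # Candidate sizes 2**lo_exp, ..., 2**hi_exp (inclusive) for a categorical grid
    return [2 ** k for k in range(lo_exp, hi_exp + 1)]


def grid_size(*choices):
    # Number of distinct configurations in a purely categorical search space
    return math.prod(len(c) for c in choices)


fc1_choices = powers_of_two(6, 9)    # [64, 128, 256, 512]
fc2_choices = powers_of_two(5, 8)    # [32, 64, 128, 256]
batch_choices = powers_of_two(6, 9)  # [64, 128, 256, 512]

print(grid_size(fc1_choices, fc2_choices, batch_choices))  # 64
# An integer range from 32 to 512 for fc1 alone would allow this many sizes:
print(len(range(32, 512 + 1)))  # 481
```

Restricting layer sizes to powers of 2 trades some resolution for a much smaller space; the learning rate, which has no such structure, is better left continuous.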

```python
hyperparameter_ranges = {
    'encoder_num_hidden': IntegerParameter(1, 10),
    'decoder_num_hidden': IntegerParameter(1, 10),
    'batch_size': IntegerParameter(1, 256),
    'learning_rate': ContinuousParameter(0.0001, 0.8),
    'epochs': IntegerParameter(1, 1000),
}
```


#### Set the objective
Next, we specify the objective metric to tune and its definition, which includes the regular expression (regex) needed to extract that metric from the training job's CloudWatch logs. In this case, the training script emits the total number of training episodes, which we use as the objective metric. We also set `objective_type` to `'Minimize'` so that the tuner searches for the hyperparameter setting that minimizes the objective metric. By default, `objective_type` is `'Maximize'`.

```python
objective_metric_name = 'time to solve'
objective_type = 'Minimize'
# Raw string so that \S reaches the regex engine instead of being parsed as a string escape
metric_definitions = [{'Name': objective_metric_name,
                       'Regex': r'(\S+) training objective.'}]
```
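Before launching jobs, it is worth checking the regex locally against a sample log line. The log text below is an assumed format; the exact line printed by `train.py` is not shown in this notebook. With this pattern, the metric value is whatever the single capture group `(\S+)` matches. `extract_metric` is a hypothetical helper:

```python
import re

metric_regex = r'(\S+) training objective.'


def extract_metric(log_text, pattern=metric_regex):
    """Return every value captured by the metric regex, converted to float."""
    return [float(m) for m in re.findall(pattern, log_text)]


sample_log = (
    "Episode 100\tAverage Score: 0.02\n"
    "412 training objective.\n"
)
print(extract_metric(sample_log))  # [412.0]
```

Note that the trailing `.` in the pattern matches any character, so `412 training objective!` would also match; escaping it as `\.` makes the match stricter if the script always ends the line with a period.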

#### Set up the execution role and container image

```python
role = get_execution_role()
account = boto3.client('sts').get_caller_identity()['Account']
region = boto3.Session().region_name
image_name = '{}.dkr.ecr.{}.amazonaws.com/rl-portfolio-optimization:latest'.format(account, region)
print(image_name)
```

```
662572584943.dkr.ecr.us-east-1.amazonaws.com/rl-portfolio-optimization:latest
```
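The image URI built above follows the Amazon ECR format `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` used in standard commercial regions (China regions use a different domain). A small hypothetical helper makes the pieces explicit; the account ID in the example is a placeholder:

```python
def ecr_image_uri(account, region, repository, tag='latest'):
    """Build an Amazon ECR image URI from its components (commercial regions)."""
    # AWS account IDs are 12-digit strings
    if not (len(account) == 12 and account.isdigit()):
        raise ValueError('account must be a 12-digit AWS account ID')
    return f'{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}'


print(ecr_image_uri('123456789012', 'us-east-1', 'rl-portfolio-optimization'))
```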


#### Create the base estimator

```python
estimator = Estimator(role=role,
                      instance_count=n_instances,
                      instance_type=instance_type,
                      image_uri=image_name)
```

#### Create the hyperparameter tuner object

```python
tuner = HyperparameterTuner(estimator,
                            objective_metric_name,
                            hyperparameter_ranges,
                            metric_definitions=metric_definitions,
                            max_jobs=10,
                            max_parallel_jobs=5,
                            objective_type=objective_type)
```
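With `max_jobs=10` and `max_parallel_jobs=5`, at most five training jobs run at once, so the search needs at least two rounds of jobs. Because the tuner's default strategy is Bayesian, later jobs can benefit from the results of earlier ones; more parallelism finishes sooner but gives each new job less information to learn from. The arithmetic, as a tiny sketch with hypothetical helper names and an assumed per-job duration:

```python
import math


def min_rounds(max_jobs, max_parallel_jobs):
    # Lower bound on sequential rounds, assuming every job takes the same time
    return math.ceil(max_jobs / max_parallel_jobs)


def min_wall_clock(max_jobs, max_parallel_jobs, minutes_per_job):
    # Best case only: ignores instance provisioning and any early stopping
    return min_rounds(max_jobs, max_parallel_jobs) * minutes_per_job


print(min_rounds(10, 5))          # 2
print(min_wall_clock(10, 5, 20))  # 40 (20 minutes per job is an assumed figure)
```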

#### Perform the hyperparameter tuning
After the hyperparameter tuning job is created, you can describe the tuning job to check its progress, or go to the SageMaker console -> `Training` -> `Hyperparameter tuning jobs` to monitor it.

```python
tuner.fit()
tuner.wait()
```

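To check progress from the notebook instead of the console, the response of the `DescribeHyperParameterTuningJob` API (returned by `boto3.client('sagemaker').describe_hyperparameter_tuning_job(HyperParameterTuningJobName=...)`, or by `tuner.describe()` in the SageMaker Python SDK v2) can be condensed into one line. The helper below is a hypothetical sketch that reads only the overall status, the per-status job counters, and the best job name; it is exercised here on a hand-written dictionary, not a real response:

```python
def summarize_tuning_job(description):
    """Condense a DescribeHyperParameterTuningJob response into a short status line."""
    status = description.get('HyperParameterTuningJobStatus', 'Unknown')
    counters = description.get('TrainingJobStatusCounters', {})
    # e.g. Completed=3, InProgress=5, ...
    parts = [f'{key}={counters[key]}' for key in sorted(counters)]
    summary = f"{status}: {', '.join(parts) if parts else 'no jobs yet'}"
    best = description.get('BestTrainingJob', {}).get('TrainingJobName')
    if best:
        summary += f' (best so far: {best})'
    return summary


example = {
    'HyperParameterTuningJobStatus': 'InProgress',
    'TrainingJobStatusCounters': {'Completed': 3, 'InProgress': 5, 'Stopped': 0},
    'BestTrainingJob': {'TrainingJobName': 'example-job-001'},
}
print(summarize_tuning_job(example))
```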


```python
best_parameters = tuner.best_estimator().hyperparameters()
```

```
2021-09-26 23:51:51 Starting - Preparing the instances for training
2021-09-26 23:51:51 Downloading - Downloading input data
2021-09-26 23:51:51 Training - Training image download completed. Training in progress.
2021-09-26 23:51:51 Uploading - Uploading generated training model
2021-09-26 23:51:51 Completed - Training job completed
```


```python
best_name = tuner.best_training_job()
print('\nBest model = {}.'.format(best_name))
for name in hyperparameter_ranges.keys():
    print('\t{} = {}'.format(name, best_parameters[name]))
```

```
Best model = rl-portfolio-optimiz-210926-2347-004-43ddb435.
	encoder_num_hidden = 10
	decoder_num_hidden = 1
	batch_size = 256
	learning_rate = 0.004326865191598946
	epochs = 1
```
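The values printed above are the ones stored on the best training job. Values read back through the SageMaker API may be strings (for example `'256'` rather than `256`), and the dictionary can also hold tuner-internal entries. A hypothetical helper that converts only the tuned names back to Python numbers, driven by a name-to-type mapping, might look like this:

```python
def cast_hyperparameters(raw, types):
    """Convert string-valued hyperparameters to the given Python types.

    Only names listed in `types` are returned; any other keys are dropped.
    Surrounding double quotes, which JSON-encoded values can carry, are stripped.
    """
    result = {}
    for name, typ in types.items():
        if name in raw:
            value = str(raw[name]).strip('"')
            result[name] = typ(value)
    return result


tuned_types = {'encoder_num_hidden': int, 'decoder_num_hidden': int,
               'batch_size': int, 'learning_rate': float, 'epochs': int}

# Hand-written stand-in for best_parameters, using the values printed above
raw = {'batch_size': '256', 'learning_rate': '0.004326865191598946',
       'epochs': '"1"', '_tuning_objective_metric': 'time to solve'}
print(cast_hyperparameters(raw, tuned_types))
```

In the notebook itself, `cast_hyperparameters(best_parameters, tuned_types)` would apply the same conversion to the real result.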


#### Compare the optimal hyperparameters to the manually optimized values
| Name        | Type  | Default | Description                        |
|-------------|-------|---------|------------------------------------|
| fc1         | int   |     128 | size of 1st hidden layer           |
| fc2         | int   |      64 | size of 2nd hidden layer           |
| lr_actor    | float |   0.001 | initial learning rate for actor    |
| lr_critic   | float |   0.001 | initial learning rate for critic   |
| batch_size  | int   |     256 | mini batch size                    |
| buffer_size | int   |  100000 | replay buffer size                 |
| gamma       | float |     0.9 | discount factor                    |
| tau         | float |   0.001 | soft update of target parameters   |
| sigma       | float |    0.01 | OU Noise standard deviation        |
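To line the tuned values up against the manual ones, a hypothetical helper can report both values for every name present in both dictionaries. Of the names tuned above, only `batch_size` also appears in this table, so names found in just one dictionary are listed separately rather than silently dropped. The values below are copied from the table and the tuning output above:

```python
def compare_hyperparameters(tuned, manual):
    """Return (shared_rows, tuned_only, manual_only) for two name -> value dicts."""
    shared = sorted(tuned.keys() & manual.keys())
    rows = [(name, manual[name], tuned[name]) for name in shared]
    tuned_only = sorted(tuned.keys() - manual.keys())
    manual_only = sorted(manual.keys() - tuned.keys())
    return rows, tuned_only, manual_only


manual = {'fc1': 128, 'fc2': 64, 'lr_actor': 0.001, 'lr_critic': 0.001,
          'batch_size': 256, 'buffer_size': 100000, 'gamma': 0.9,
          'tau': 0.001, 'sigma': 0.01}
tuned = {'encoder_num_hidden': 10, 'decoder_num_hidden': 1, 'batch_size': 256,
         'learning_rate': 0.004326865191598946, 'epochs': 1}

rows, tuned_only, manual_only = compare_hyperparameters(tuned, manual)
print(f"{'Name':<12}{'Manual':>10}{'Tuned':>10}")
for name, manual_value, tuned_value in rows:
    print(f'{name:<12}{manual_value:>10}{tuned_value:>10}')
print('Tuned only:', tuned_only)
print('Manual only:', manual_only)
```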