<i>Copyright (c) Microsoft Corporation. All rights reserved.<br>
Licensed under the MIT License.</i>
<br><br>
# Recommender Hyperparameter Tuning w/ AzureML

This notebook shows how to auto-tune hyperparameters of a recommender model by utilizing **Azure Machine Learning service**<sup>[a](#azureml-search), [b](#azure-subscription)</sup> ([AzureML](https://azure.microsoft.com/en-us/services/machine-learning-service/)).

We present the overall process of utilizing AzureML, specifically its [**Hyperdrive**](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py) component, for hyperparameter tuning by demonstrating the key steps:
1. Configure AzureML Workspace
2. Create Remote Compute Target (GPU cluster)
3. Prepare Data
4. Prepare Training Scripts
5. Setup and Run Hyperdrive Experiment
6. Model Import and Test

In this notebook, we use the [**Wide-and-Deep model**](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html) from the **TensorFlow high-level Estimator API (v1.12)** on a movie recommendation scenario. Wide-and-Deep learning jointly trains a wide linear model and a deep neural network (DNN) to combine the benefits of memorization and generalization for recommender systems.

For more details about the **Wide-and-Deep** model:
* [Wide-Deep Quickstart notebook](../00_quick_start/wide_deep_movielens.ipynb)
* [Original paper](https://arxiv.org/abs/1606.07792)
* [TensorFlow API doc](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNLinearCombinedRegressor)
  
Regarding **AzureML**, please refer to:
* [Quickstart notebook](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
* [Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters)
* [Tensorflow model tuning with Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-tensorflow)

> <span id="azureml-search">a. </span>To use AzureML, you will need an Azure subscription.  
<span id="azure-subscription">b. </span>When you web-search "Azure Machine Learning", you will most likely see mixed results for Azure Machine Learning (AzureML) and Azure Machine Learning **Studio**. Please note that they are different services: AzureML focuses on ML model management, tracking, and hyperparameter tuning, while [ML Studio](https://studio.azureml.net/) provides a high-level, GUI-based tool for an easy-to-use experience of designing and experimenting with ML.

In [29]:
import sys
sys.path.append("../../")

import itertools
import os
import shutil
import time

from IPython.display import clear_output
import numpy as np
import papermill as pm
import pandas as pd
import sklearn.preprocessing
import tensorflow as tf

import azureml as aml
import azureml.widgets as widgets
import azureml.train.hyperdrive as hd

from reco_utils.dataset.pandas_df_utils import user_item_pairs
from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_random_split
import reco_utils.evaluation.python_evaluation

print("Azure ML SDK Version:", aml.core.VERSION)
print("Tensorflow Version:", tf.__version__)

Azure ML SDK Version: 1.0.10
Tensorflow Version: 1.12.0


### 1. Configure AzureML Workspace
An **AzureML workspace** is the foundational block in the cloud that you use to experiment with, train, and deploy machine learning models via the AzureML service. In this notebook, we 1) create a workspace from the [**Azure portal**](https://portal.azure.com) and 2) configure it from this notebook.

You can find more details about the setup and configuration processes at the following links:
* [Quickstart with Azure portal](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started)
* [Quickstart with Python SDK](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
  
<br>
  
#### 1.1 Create a workspace
1. Sign in to the [Azure portal](https://portal.azure.com) by using the credentials for the Azure subscription you use.
2. Select the **Create a resource** menu, search for **Machine Learning service workspace**, and select the **Create** button.
3. In the **ML service workspace** pane, configure your workspace by entering the *workspace name* and *resource group* (or **create new** resource group if you don't have one already), and select **Create**. It can take a few moments to create the workspace.
  
<br>
  
#### 1.2 Configure
To configure this notebook to communicate with the workspace, type your Azure subscription id, resource group name, and workspace name into `<subscription-id>`, `<resource-group>`, and `<workspace-name>` in the notebook cell below. Alternatively, you can create a *.\aml_config\config.json* file with the following contents:
```
{
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>"
}
```
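
If you prefer to generate that file from code rather than by hand, a minimal sketch using only the standard library could look like the following. The helper name `write_aml_config` and its `base_dir` argument are hypothetical (they are not part of the AzureML SDK), and the placeholder values must be replaced with your own.

```python
import json
import os


def write_aml_config(subscription_id, resource_group, workspace_name, base_dir="."):
    """Write aml_config/config.json under base_dir and return its path.

    Hypothetical helper for illustration only; not part of the AzureML SDK.
    """
    config_dir = os.path.join(base_dir, "aml_config")
    os.makedirs(config_dir, exist_ok=True)
    config_path = os.path.join(config_dir, "config.json")
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config_path


# Example call with placeholder values:
# write_aml_config("<subscription-id>", "<resource-group>", "<workspace-name>")
```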


In [2]:
# AzureML workspace info. Note, will look up "aml_config\config.json" first, then fall back to use this
SUBSCRIPTION_ID = '<subscription-id>'
RESOURCE_GROUP  = '<resource-group>'
WORKSPACE_NAME  = '<workspace-name>'

# Remote compute (cluster) configuration. To reduce the cost further, set these to smaller values.
VM_SIZE = 'STANDARD_NC6'
VM_PRIORITY = 'lowpriority'
# Cluster nodes
MIN_NODES = 4
MAX_NODES = 8
# Hyperdrive experimentation configuration
MAX_TOTAL_RUNS = 100  # Number of runs (training-and-evaluation) to search the best hyperparameters. 
MAX_CONCURRENT_RUNS = 4

# Recommend top k items
TOP_K = 10
# Select Movielens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'
EPOCHS = 50
# Metrics to track
RANKING_METRICS = ['ndcg_at_k', 'precision_at_k']
RATING_METRICS = ['rmse', 'mae']
PRIMARY_METRIC = 'rmse'
# Data column names
USER_COL = 'UserId'
ITEM_COL = 'MovieId'
RATING_COL = 'Rating'
ITEM_FEAT_COL = 'Genres'

Now let's see if everything is ready!

In [10]:
# Connect to a workspace
try:
    ws = aml.core.Workspace.from_config()
except aml.exceptions.UserErrorException:
    try:
        ws = aml.core.Workspace(
            subscription_id=SUBSCRIPTION_ID,
            resource_group=RESOURCE_GROUP,
            workspace_name=WORKSPACE_NAME
        )
        ws.write_config()
    except aml.exceptions.AuthenticationException:
        ws = None

if ws is None:
    raise ValueError(
        """Cannot access the AzureML workspace w/ the config info provided.
        Please check if you entered the correct id, group name and workspace name"""
    )
else:
    print("AzureML workspace name: ", ws.name)
    clear_output()  # Comment out this if you want to see your workspace info.

### 2. Create Remote Compute Target

We create a GPU cluster as our **remote compute target**. If a cluster with the same name already exists in your workspace, the script will load it instead. You can see [this document](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets) to learn more about setting up a compute target on different locations.

This notebook selects the **STANDARD_NC6** virtual machine (VM) size and sets its priority to *lowpriority* to save cost.

Size | vCPU | Memory (GiB) | Temp storage (SSD, GiB) | GPU | GPU memory (GiB) | Max data disks | Max NICs
---|---|---|---|---|---|---|---
Standard_NC6 | <div align="center">6</div> | <div align="center">56</div> | <div align="center">340</div> | <div align="center">1</div> | <div align="center">12</div> | <div align="center">24</div> | <div align="center">1</div>


For more information about Azure virtual machine sizes, see [here](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu).

In [11]:
CLUSTER_NAME = 'gpu-cluster-nc6'

try:
    compute_target = aml.core.compute.ComputeTarget(workspace=ws, name=CLUSTER_NAME)
    print("Found existing compute target")
except aml.core.compute_target.ComputeTargetException:
    print("Creating a new compute target...")
    compute_config = aml.core.compute.AmlCompute.provisioning_configuration(
        vm_size=VM_SIZE,
        vm_priority=VM_PRIORITY,
        min_nodes=MIN_NODES,
        max_nodes=MAX_NODES
    )
    # create the cluster
    compute_target = aml.core.compute.ComputeTarget.create(ws, CLUSTER_NAME, compute_config)
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

# Use the 'status' property to get a detailed status for the current cluster. 
print(compute_target.status.serialize())

Found existing compute target
{'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-02-15T21:04:05.699000+00:00', 'creationTime': '2019-02-04T16:59:49.711395+00:00', 'currentNodeCount': 4, 'errors': None, 'modifiedTime': '2019-02-04T17:00:42.840716+00:00', 'nodeStateCounts': {'idleNodeCount': 1, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 3, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 4, 'maxNodeCount': 8, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 4, 'vmPriority': 'LowPriority', 'vmSize': 'STANDARD_NC6'}


### 3. Prepare Data
For demonstration purposes, we use the 100k MovieLens dataset. First, download the data and convert the format (multi-hot encode *genres*) to make it work for our model. More details about this step are described in our [Wide-Deep Quickstart notebook](../00_quick_start/wide_deep_movielens.ipynb).

In [12]:
data = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=[USER_COL, ITEM_COL, RATING_COL],
    genres_col='Genres_string'
)

# Encode 'genres' into int array (multi-hot representation) to use as item features
genres_encoder = sklearn.preprocessing.MultiLabelBinarizer()
data[ITEM_FEAT_COL] = genres_encoder.fit_transform(
    data['Genres_string'].apply(lambda s: s.split("|"))
).tolist()
data.drop('Genres_string', axis=1, inplace=True)

data.head()

Unnamed: 0,UserId,MovieId,Rating,Genres
0,196,242,3.0,"[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
1,63,242,3.0,"[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
2,226,242,5.0,"[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
3,154,242,3.0,"[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
4,306,242,5.0,"[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
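
To make the multi-hot representation concrete, here is a small plain-Python sketch of the same idea on made-up genre strings. It is not the scikit-learn implementation; the function name `multi_hot_encode` is hypothetical, and the columns are placed in sorted label order, which is also what `MultiLabelBinarizer` does by default.

```python
def multi_hot_encode(genre_strings, sep="|"):
    """Return (sorted class list, list of 0/1 vectors), one vector per input string."""
    label_sets = [set(s.split(sep)) for s in genre_strings]
    # Collect every distinct label and fix the column order by sorting
    classes = sorted(set().union(*label_sets))
    vectors = [[1 if c in labels else 0 for c in classes] for labels in label_sets]
    return classes, vectors


# Toy genre strings (illustrative only, not taken from the dataset)
classes, vectors = multi_hot_encode(["Comedy", "Action|Comedy", "Drama"])
print(classes)  # ['Action', 'Comedy', 'Drama']
print(vectors)  # [[0, 1, 0], [1, 1, 0], [0, 0, 1]]
```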


The dataset is split into train, validation, and test sets. The train and validation sets will be used for hyperparameter tuning, and the test set will be used for the final evaluation of the model after we import the best model from AzureML workspace.

Here, we don't do a three-way split directly by passing `ratio=[0.56, 0.19, 0.25]`. Instead, we first split the data into train and test sets with the same `seed` we've been using in other notebooks, so that the test set is identical across them. Then, we further split the train set into train and validation sets.

In [13]:
# Use the same seed to make the train and test sets identical across other notebooks in the repo.
train, test = python_random_split(data, ratio=0.75, seed=42)
# Further split the train set into train and validation set.
train, valid = python_random_split(train)

print(len(train), len(valid), len(test))

56250 18750 25000
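
The printed sizes follow from applying a 0.75 ratio twice. The sketch below reproduces that arithmetic, assuming the second `python_random_split` call uses a default ratio of 0.75 (an assumption that is consistent with the sizes printed above); the function name `two_stage_split_sizes` is hypothetical.

```python
def two_stage_split_sizes(n_rows, first_ratio=0.75, second_ratio=0.75):
    """Sizes of (train, valid, test) after a train/test split followed by a train/valid split."""
    n_train_full = int(round(n_rows * first_ratio))
    n_test = n_rows - n_train_full
    n_train = int(round(n_train_full * second_ratio))
    n_valid = n_train_full - n_train
    return n_train, n_valid, n_test


# The three printed sizes above sum to 100000 rows
print(two_stage_split_sizes(100000))  # (56250, 18750, 25000)
```

The effective fractions are therefore 0.5625, 0.1875, and 0.25, which round to the `[0.56, 0.19, 0.25]` ratio mentioned earlier.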


Now, upload the train and validation sets to the AzureML workspace. Our Hyperdrive experiment will use them.

In [14]:
DATA_DIR = 'aml_data'

os.makedirs(DATA_DIR, exist_ok=True)

TRAIN_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_train.pkl"
train.to_pickle(os.path.join(DATA_DIR, TRAIN_FILE_NAME))
VALID_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_valid.pkl"
valid.to_pickle(os.path.join(DATA_DIR, VALID_FILE_NAME))

# Note, all the files under DATA_DIR will be uploaded to the data store
ds = ws.get_default_datastore()
ds.upload(
    src_dir=DATA_DIR,
    target_path='data',
    overwrite=True,
    show_progress=True
)

Uploading aml_data/movielens_100k_train.pkl
Uploading aml_data/movielens_100k_valid.pkl
Uploaded aml_data/movielens_100k_valid.pkl, 1 files out of an estimated total of 2
Uploaded aml_data/movielens_100k_train.pkl, 2 files out of an estimated total of 2


$AZUREML_DATAREFERENCE_15ad7315d3d84f6b83fa383208c416f7

### 4. Prepare Training Scripts
The next step is to prepare the scripts that AzureML Hyperdrive will use to train and evaluate models with the selected hyperparameters. We re-use our [Wide-Deep Quickstart notebook](../00_quick_start/wide_deep_movielens.ipynb) for that. To run the model notebook from the Hyperdrive Run, all we need is to prepare an [entry script](../../reco_utils/aml/wide_deep.py) which parses the hyperparameter arguments, passes them to the notebook, and records the results of the notebook to the AzureML Run logs by using `papermill`. Hyperdrive uses the logs to track the performance of each hyperparameter set and finds the best-performing one.

Here is a code snippet from the [entry script](../../reco_utils/aml/wide_deep.py):
```
import argparse
import papermill as pm
from azureml.core import Run
run = Run.get_context()
...
parser = argparse.ArgumentParser()
...
parser.add_argument('--dnn-optimizer', type=str, dest='dnn_optimizer', ...
parser.add_argument('--dnn-optimizer-lr', type=float, dest='dnn_optimizer_lr', ...
...
pm.execute_notebook(
    "../../notebooks/00_quick_start/wide_deep_movielens.ipynb",
    OUTPUT_NOTEBOOK,
    parameters=params,
    kernel_name='python3',
)
...
```
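
As a rough illustration of the "parse arguments, pass them to the notebook" step, the following self-contained sketch parses two of the arguments shown above and turns them into a parameter dictionary. The default values and the upper-casing of the names are assumptions made for this example; the actual entry script may map arguments to notebook parameters differently.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse command-line hyperparameters into a dict keyed by upper-cased names."""
    parser = argparse.ArgumentParser()
    # Defaults below are hypothetical placeholders, not the entry script's values
    parser.add_argument('--dnn-optimizer', type=str, dest='dnn_optimizer', default='Adagrad')
    parser.add_argument('--dnn-optimizer-lr', type=float, dest='dnn_optimizer_lr', default=0.01)
    args = parser.parse_args(argv)
    # Assumed naming convention: notebook parameters are the upper-cased dest names
    return {name.upper(): value for name, value in vars(args).items()}


params = parse_hyperparameters(['--dnn-optimizer', 'Adam', '--dnn-optimizer-lr', '0.005'])
print(params)  # {'DNN_OPTIMIZER': 'Adam', 'DNN_OPTIMIZER_LR': 0.005}
```

A dictionary of this shape is what would be handed to `pm.execute_notebook(..., parameters=params)`.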

In [15]:
# Prepare all the necessary scripts which will be loaded to our Hyperdrive Experiment Run
SCRIPT_DIR = 'aml_script'

# Clean-up scripts if already exists
shutil.rmtree(SCRIPT_DIR, ignore_errors=True)

# Copy scripts to SCRIPT_DIR temporarily
shutil.copytree(os.path.join('..', '..', 'reco_utils'), os.path.join(SCRIPT_DIR, 'reco_utils'))

# We re-use our model notebook for training and testing models.
model_notebook_dir = os.path.join('notebooks', '00_quick_start')
dest_model_notebook_dir = os.path.join(SCRIPT_DIR, model_notebook_dir)
os.makedirs(dest_model_notebook_dir , exist_ok=True)
shutil.copy(
    os.path.join('..', '..', model_notebook_dir, 'wide_deep_movielens.ipynb'),
    dest_model_notebook_dir
)

# This is our entry script for Hyperdrive Run
ENTRY_SCRIPT_NAME = 'reco_utils/aml/wide_deep.py'

### 5. Setup and Run Hyperdrive Experiment
[Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters) creates a machine learning Experiment [Run](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run?view=azure-ml-py) on the workspace and utilizes child-runs to search for the best set of hyperparameters.

<br>

#### 5.1 Create Experiment 
[Experiment](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment(class)?view=azure-ml-py) is the main entry point into experimenting with AzureML. To create a new Experiment or get an existing one, we pass our experiment name.

In [16]:
# Create an experiment to track the runs in the workspace
EXP_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_wide_deep_model"
exp = aml.core.Experiment(workspace=ws, name=EXP_NAME)

#### 5.2 Define Search Space 
Now we define the search space of hyperparameters. For example, if you want to test different batch sizes of {64, 128, 256}, you can use `azureml.train.hyperdrive.choice(64, 128, 256)`. To search from a continuous space, use `uniform(start, end)`. For more options, see [Hyperdrive parameter expressions](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.parameter_expressions?view=azure-ml-py).
In this notebook, we fix model type as `wide_deep` and the number of epochs to 50.

In the search space, we set different linear and DNN optimizers, structures, learning rates and regularization rates. Details about the hyperparameters can be found from our [Wide-Deep Quickstart notebook](../00_quick_start/wide_deep_movielens.ipynb).

> Hyperdrive provides three different parameter sampling methods: `RandomParameterSampling`, `GridParameterSampling`, and `BayesianParameterSampling`. Details about each method can be found from [Azure doc](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters). Here, we use the Bayesian sampling.
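
To build intuition for what a search space describes, the sketch below draws a few configurations from a tiny space using plain Python. It only mimics the idea behind the `choice` and `uniform` expressions with simple *random* sampling; it is neither Hyperdrive's implementation nor the Bayesian sampling used in this notebook, and the tuple-based space format is made up for this example.

```python
import random

# Hypothetical plain-Python description of a small search space
search_space = {
    '--batch-size': ('choice', [64, 128, 256]),
    '--dnn-optimizer': ('choice', ['Adagrad', 'Adam']),
    '--dnn-optimizer-lr': ('uniform', (0.0001, 0.1)),
}


def sample_configuration(space, rng):
    """Draw one hyperparameter configuration uniformly at random from the space."""
    config = {}
    for name, (kind, spec) in space.items():
        if kind == 'choice':
            config[name] = rng.choice(spec)
        elif kind == 'uniform':
            low, high = spec
            config[name] = rng.uniform(low, high)
        else:
            raise ValueError("Unknown parameter expression: {}".format(kind))
    return config


rng = random.Random(0)  # fixed seed so repeated runs draw the same samples
for _ in range(3):
    print(sample_configuration(search_space, rng))
```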

In [17]:
# Fixed parameters
script_params = {
    '--datastore': ds.as_mount(),
    '--train-datapath': "data/" + TRAIN_FILE_NAME,
    '--test-datapath': "data/" + VALID_FILE_NAME,
    '--top-k': TOP_K,
    '--user-col': USER_COL,
    '--item-col': ITEM_COL,
    '--item-feat-col': ITEM_FEAT_COL,
    '--rating-col': RATING_COL,
    '--ranking-metrics': RANKING_METRICS,
    '--rating-metrics': RATING_METRICS,
    '--epochs': EPOCHS,
    '--model-type': 'wide_deep'
}

# Hyperparameter search space
params = {
    '--batch-size': hd.choice(64, 128, 256),
    # Linear model hyperparameters
    '--linear-optimizer': hd.choice('Ftrl'),  # 'SGD' and 'Momentum' easily got exploded loss in regression problems.
    '--linear-optimizer-lr': hd.uniform(0.0001, 0.1),
    '--linear-l1-reg': hd.uniform(0.0, 0.1),
    # Deep model hyperparameters
    '--dnn-optimizer': hd.choice('Adagrad', 'Adam'),
    '--dnn-optimizer-lr': hd.uniform(0.0001, 0.1),
    '--dnn-user-embedding-dim': hd.choice(4, 8, 16, 32, 64),
    '--dnn-item-embedding-dim': hd.choice(4, 8, 16, 32, 64),
    '--dnn-hidden-layer-1': hd.choice(0, 32, 64, 128, 256, 512, 1024),  # 0: not using this layer
    '--dnn-hidden-layer-2': hd.choice(0, 32, 64, 128, 256, 512, 1024),
    '--dnn-hidden-layer-3': hd.choice(0, 32, 64, 128, 256, 512, 1024),
    '--dnn-hidden-layer-4': hd.choice(32, 64, 128, 256, 512, 1024),
    '--dnn-batch-norm': hd.choice(0, 1),
    '--dnn-dropout': hd.choice(0.0, 0.1, 0.2, 0.3, 0.4)
}

**AzureML Estimator** is the building block for training. An Estimator encapsulates the training code and parameters, the compute resources, and the runtime environment for a particular training scenario. (Note that this is not TensorFlow's Estimator.)

We create one for our experimentation with the dependencies our model requires as follows:
```
conda_packages=['pandas', 'scikit-learn', 'tensorflow-gpu=1.12'],
pip_packages=['ipykernel', 'papermill']
```

In the Hyperdrive Run Config, we set our primary metric name and its goal (our hyperparameter search criterion), the hyperparameter sampling method, and the total number of child-runs. The bigger the search space, the more runs we will need for good results.
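
To get a feel for how large the search space is, the following sketch multiplies the number of options of each discrete `choice` hyperparameter defined in the search space above. The counts are copied from that cell; the three continuous `uniform` parameters are left out, so the real space is larger still.

```python
# Number of options for each discrete `choice` hyperparameter in the search space
n_choices = {
    '--batch-size': 3,
    '--linear-optimizer': 1,
    '--dnn-optimizer': 2,
    '--dnn-user-embedding-dim': 5,
    '--dnn-item-embedding-dim': 5,
    '--dnn-hidden-layer-1': 7,
    '--dnn-hidden-layer-2': 7,
    '--dnn-hidden-layer-3': 7,
    '--dnn-hidden-layer-4': 6,
    '--dnn-batch-norm': 2,
    '--dnn-dropout': 5,
}

n_combinations = 1
for n in n_choices.values():
    n_combinations *= n

print(n_combinations)  # 3087000
```

With `MAX_TOTAL_RUNS = 100`, the experiment therefore explores only a tiny fraction of the discrete grid, which is why the choice of sampling method matters.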

In [18]:
est = aml.train.estimator.Estimator(
    source_directory=SCRIPT_DIR,
    entry_script=ENTRY_SCRIPT_NAME,
    script_params=script_params,
    compute_target=compute_target,
    use_gpu=True,
    conda_packages=['pandas', 'scikit-learn', 'tensorflow-gpu=1.12'],
    pip_packages=['ipykernel', 'papermill']
)

hd_run_config = hd.HyperDriveRunConfig(
    estimator=est, 
    hyperparameter_sampling=hd.BayesianParameterSampling(params),
    primary_metric_name=PRIMARY_METRIC,
    primary_metric_goal=hd.PrimaryMetricGoal.MINIMIZE, 
    max_total_runs=MAX_TOTAL_RUNS,
    max_concurrent_runs=MAX_CONCURRENT_RUNS
)

#### 5.3 Run Experiment

Now we submit the Run to our experiment. You can see the experiment progress from this notebook by using `azureml.widgets.RunDetails(hd_run).show()` or check from the Azure portal with the url link you can get by running `hd_run.get_portal_url()`.

<img src="https://recodatasets.blob.core.windows.net/images/aml_0.png?sanitize=true"/>
<img src="https://recodatasets.blob.core.windows.net/images/aml_1.png?sanitize=true"/>
<center><i>AzureML Hyperdrive Widget</i></center>

To load an existing Hyperdrive Run instead of starting a new one, use `hd_run = hd.HyperDriveRun(exp, <user-run-id>, hyperdrive_run_config=hd_run_config)`. You can also cancel the Run with `hd_run.cancel()`.

In [22]:
hd_run = exp.submit(config=hd_run_config)
widgets.RunDetails(hd_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

Once all the child-runs are finished, we can get the best run and the metrics.

In [23]:
# Get best run and printout metrics
best_run = hd_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()

In [24]:
print("* Best Run Id:", best_run.id)
print("\n* Best hyperparameters:")
print("Model type =", best_run_metrics['MODEL_TYPE'])
print("Batch size =", best_run_metrics['BATCH_SIZE'])
print("Linear optimizer =", best_run_metrics['LINEAR_OPTIMIZER'])
print("\tLearning rate = {0:.4f}".format(best_run_metrics['LINEAR_OPTIMIZER_LR']))
print("\tL1 regularization = {0:.4f}".format(best_run_metrics['LINEAR_L1_REG']))
print("DNN optimizer =", best_run_metrics['DNN_OPTIMIZER'])
print("\tUser embedding dimension =", best_run_metrics['DNN_USER_DIM'])
print("\tItem embedding dimension =", best_run_metrics['DNN_ITEM_DIM'])
hidden_units = []
for i in range(1, 5):
    hidden_nodes = best_run_metrics['DNN_HIDDEN_LAYER_{}'.format(i)]
    if hidden_nodes > 0:
        hidden_units.append(hidden_nodes)
print("\tHidden units =", hidden_units)
print("\tLearning rate = {0:.4f}".format(best_run_metrics['DNN_OPTIMIZER_LR']))
print("\tDropout rate = {0:.4f}".format(best_run_metrics['DNN_DROPOUT']))
print("\tBatch normalization =", best_run_metrics['DNN_BATCH_NORM'])
# Metrics evaluated on validation set
print("\n* Performance metrics:")
print("Top", TOP_K)
for m in RANKING_METRICS:
    print("\t{0} = {1:.4f}".format(m, best_run_metrics[m]))
for m in RATING_METRICS:
    print("\t{0} = {1:.4f}".format(m, best_run_metrics[m]))


* Best Run Id: movielens_100k_wide_deep_model_1550249121534_87

* Best hyperparameters:
Model type = wide_deep
Batch size = 64.0
Linear optimizer = Ftrl
	Learning rate = 0.0029
	L1 regularization = 0.0000
DNN optimizer = Adagrad
	User embedding dimension = 4.0
	Item embedding dimension = 4.0
	Hidden units = [128.0, 256.0, 32.0]
	Learning rate = 0.1000
	Dropout rate = 0.4000
	Batch normalization = 1.0

* Performance metrics:
Top 10
	ndcg_at_k = 0.0280
	precision_at_k = 0.0284
	rmse = 0.9410
	mae = 0.7427


### 6. Model Import and Test

[Wide-Deep Quickstart notebook](../00_quick_start/wide_deep_movielens.ipynb), which we've used in our Hyperdrive Experiment, exports the trained model to the output folder (the output path is recorded at `best_run_metrics['saved_model_dir']`). We can download a model from the best run and test it. 
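
The download logic in the next cell boils down to a prefix filter over the run's file names. The sketch below isolates that step without any AzureML calls. It assumes the recorded path is stored as the string form of a bytes object (for example `"b'outputs/model/123'"`), which is what the `[2:-1]` slice in the cell suggests; the function name and the toy file names are hypothetical.

```python
import os


def local_download_targets(recorded_dir, run_file_names, local_dir):
    """Map each run file under the saved-model directory to a local output path."""
    # Assumption: recorded_dir looks like "b'<path>'", so strip the leading b' and trailing '
    model_dir = os.path.normpath(recorded_dir[2:-1]) + '/'
    return {
        name: os.path.join(local_dir, name[len(model_dir):])
        for name in run_file_names
        if name.startswith(model_dir)
    }


# Toy file listing (illustrative only)
files = [
    'outputs/model/123/saved_model.pb',
    'outputs/model/123/variables/variables.index',
    'logs/driver.txt',
]
for remote, local in local_download_targets("b'outputs/model/123'", files, 'aml_model').items():
    print(remote, '->', local)
```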

In [25]:
MODEL_DIR = 'aml_model'

os.makedirs(MODEL_DIR, exist_ok=True)
model_file_dir = os.path.normpath(best_run_metrics['saved_model_dir'][2:-1]) + '/'
print(model_file_dir)
for f in best_run.get_file_names():
    if f.startswith(model_file_dir):
        output_file_path = os.path.join(MODEL_DIR, f[len(model_file_dir):])
        print("Downloading {}..".format(f))
        best_run.download_file(name=f, output_file_path=output_file_path)
    
saved_model = tf.contrib.estimator.SavedModelEstimator(MODEL_DIR)

outputs/model/1550262605/
Downloading outputs/model/1550262605/saved_model.pb..
Downloading outputs/model/1550262605/variables/variables.data-00000-of-00002..
Downloading outputs/model/1550262605/variables/variables.data-00001-of-00002..
Downloading outputs/model/1550262605/variables/variables.index..
INFO:tensorflow:Using default config.
INFO:tensorflow:Using config: {'_model_dir': '/tmp/tmpd4k3p8fs', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': allow_soft_placement: true
graph_options {
  rewrite_options {
    meta_optimizer_iterations: ONE
  }
}
, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f220e098208>, '_task_type': '

In [33]:
cols = {
    'col_user': USER_COL,
    'col_item': ITEM_COL,
    'col_rating': RATING_COL,
    'col_prediction': 'prediction'
}

tf.logging.set_verbosity(tf.logging.ERROR)

# Prediction input function for TensorFlow SavedModel
def predict_input_fn(df):
    def input_fn():
        examples = [None] * len(df)
        for index, test_sample in df.iterrows():
            example = tf.train.Example()

            example.features.feature[USER_COL].int64_list.value.extend([test_sample[USER_COL]])
            example.features.feature[ITEM_COL].int64_list.value.extend([test_sample[ITEM_COL]])
            example.features.feature[ITEM_FEAT_COL].float_list.value.extend(test_sample[ITEM_FEAT_COL])

            examples[index] = example.SerializeToString()
        return {'inputs': tf.constant(examples)}
    return input_fn

In [34]:
# Rating prediction set
X_test = test.drop(RATING_COL, axis=1)
X_test.reset_index(drop=True, inplace=True)

# Rating prediction
predictions = list(itertools.islice(
    saved_model.predict(predict_input_fn(X_test)),
    len(X_test)
))

prediction_df = X_test.copy()
prediction_df['prediction'] = [p['outputs'][0] for p in predictions]
print(prediction_df['prediction'].describe(), "\n")
for m in RATING_METRICS:
    fn = getattr(reco_utils.evaluation.python_evaluation, m)
    result = fn(test, prediction_df, **cols)
    print(m, "=", result)

count    25000.000000
mean         3.481071
std          0.661059
min         -0.752411
25%          3.038352
50%          3.597327
75%          3.969836
max          5.707587
Name: prediction, dtype: float64 

rmse = 0.9509258361717235
mae = 0.7490980247072131
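
For reference, the two rating metrics reported above can be written in a few lines of plain Python. This is only a sketch of the standard definitions on toy numbers, not the `reco_utils` implementation, which additionally aligns true and predicted ratings by user and item.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy ratings: one perfect prediction and one that is off by 2
print(rmse([3.0, 5.0], [3.0, 3.0]))  # sqrt(2), about 1.4142
print(mae([3.0, 5.0], [3.0, 3.0]))   # 1.0
```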


In [35]:
# Unique items
if ITEM_FEAT_COL is None:
    items = data.drop_duplicates(ITEM_COL)[[ITEM_COL]].reset_index(drop=True)
else:
    items = data.drop_duplicates(ITEM_COL)[[ITEM_COL, ITEM_FEAT_COL]].reset_index(drop=True)
# Unique users
users = data.drop_duplicates(USER_COL)[[USER_COL]].reset_index(drop=True)

# Ranking prediction set
ranking_pool = user_item_pairs(
    user_df=users,
    item_df=items,
    user_col=USER_COL,
    item_col=ITEM_COL,
    user_item_filter_df=pd.concat([train, valid]),  # remove seen items
    shuffle=True
)
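
Conceptually, the ranking pool is the cross product of all users and all items, minus the user-item pairs already seen during training and validation. The plain-Python sketch below shows that idea on toy IDs; it is not the `user_item_pairs` implementation (which works on DataFrames and can shuffle the result), and the function name is hypothetical.

```python
from itertools import product


def unseen_user_item_pairs(users, items, seen_pairs):
    """Return every (user, item) combination that does not appear in seen_pairs."""
    seen = set(seen_pairs)
    return [(u, i) for u, i in product(users, items) if (u, i) not in seen]


# Toy example: 2 users x 3 items, with 2 pairs already seen
pool = unseen_user_item_pairs([1, 2], [10, 20, 30], [(1, 10), (2, 30)])
print(pool)  # [(1, 20), (1, 30), (2, 10), (2, 20)]
```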

In [36]:
predictions = []
# To prevent creating a tensor proto whose content is larger than 2GB (which will raise an error),
# divide ranking_pool into 10 chunks, predict each, and concat back. 
for pool in np.array_split(ranking_pool, 10):
    pool.reset_index(drop=True, inplace=True)
    # Rating prediction
    pred = list(itertools.islice(
        saved_model.predict(predict_input_fn(pool)),
        len(pool)
    ))
    predictions.extend([p['outputs'][0] for p in pred])
    
ranking_pool['prediction'] = predictions

for m in RANKING_METRICS:
    fn = getattr(reco_utils.evaluation.python_evaluation, m)
    result = fn(test, ranking_pool, **{**cols, 'k': TOP_K})
    print(m, "=", result)

ndcg_at_k = 0.032210326246892636
precision_at_k = 0.03450106157112528


#### <span id="google-wide-deep-baseline">Wide-and-Deep Baseline Comparison</span>
To see whether Hyperdrive found good hyperparameters, we compare against a model with known hyperparameters from [TensorFlow's wide-deep learning example](https://github.com/tensorflow/models/blob/master/official/wide_deep/movielens_main.py), which uses only the DNN part of the wide-and-deep model for MovieLens data.

> Note, this is not an 'apples to apples' comparison. For example, TensorFlow's MovieLens example uses *rating-timestamp* as a numeric feature, but we did not use it here because we think the timestamps are not relevant to the movies' ratings. This comparison is meant to show how Hyperdrive can help find comparable hyperparameters without requiring an exhaustive search through a huge space.

In [39]:
OUTPUT_NOTEBOOK = "output.ipynb"
OUTPUT_MODEL_DIR = "known_hyperparam_model_checkpoints"
params = {
    'MOVIELENS_DATA_SIZE': MOVIELENS_DATA_SIZE,
    'TOP_K': TOP_K,
    'MODEL_TYPE': 'deep',
    'EPOCHS': EPOCHS,
    'BATCH_SIZE': 256,
    'DNN_OPTIMIZER': 'Adam',
    'DNN_OPTIMIZER_LR': 0.001,
    'DNN_HIDDEN_LAYER_1': 256,
    'DNN_HIDDEN_LAYER_2': 256,
    'DNN_HIDDEN_LAYER_3': 256,
    'DNN_HIDDEN_LAYER_4': 128,
    'DNN_USER_DIM': 16,
    'DNN_ITEM_DIM': 64,
    'DNN_DROPOUT': 0.3,
    'DNN_BATCH_NORM': 0,
    'MODEL_DIR': OUTPUT_MODEL_DIR,
    'EVALUATE_WHILE_TRAINING': False,
    'EXPORT_DIR_BASE': OUTPUT_MODEL_DIR,
    'RANKING_METRICS': RANKING_METRICS,
    'RATING_METRICS': RATING_METRICS,
}

start_time = time.time()
pm.execute_notebook(
    "../00_quick_start/wide_deep_movielens.ipynb",
    OUTPUT_NOTEBOOK,
    parameters=params,
    kernel_name='python3'
)
end_time = time.time()
print("Training and evaluation of Wide-and-Deep model took", end_time-start_time, "secs.")

nb = pm.read_notebook(OUTPUT_NOTEBOOK)
for m in RANKING_METRICS:
    print(m, "=", nb.data[m])
for m in RATING_METRICS:
    print(m, "=", nb.data[m])
    
os.remove(OUTPUT_NOTEBOOK)
shutil.rmtree(OUTPUT_MODEL_DIR, ignore_errors=True)

HBox(children=(IntProgress(value=0, max=31), HTML(value='')))


Training and evaluation of Wide-and-Deep model took 167.64200687408447 secs.
ndcg_at_k = 0.013301650439178084
precision_at_k = 0.014755838641188962
rmse = 0.9965956623715038
mae = 0.7935867070198059


### Concluding Remark
We showed how to tune hyperparameters by utilizing Azure Machine Learning service. Complex and powerful models like the Wide-and-Deep model often have many hyperparameters that affect recommendation accuracy, and it is not practical to tune such a model without a GPU cluster. For example, training and evaluating a model took around 3 minutes on the 100k MovieLens data on a single *Standard NC6* VM, as we tested in the [above cell](#google-wide-deep-baseline). When we used the 1M MovieLens data, it took about 47 minutes. If we wanted to investigate 100 different combinations of hyperparameters **manually**, it would take about **78 hours** on that VM, and we might still wonder whether we had tested good candidates. With AzureML, as we have shown in this notebook, we can easily set up a GPU cluster sized to fit our problem, utilize Bayesian sampling to navigate the huge search space efficiently, and tweak the experiment with different criteria and algorithms for further research.
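
The 78-hour figure is simple arithmetic, reproduced in the sketch below together with an idealized estimate for four concurrent runs (the `MAX_CONCURRENT_RUNS` value used in this notebook). The parallel estimate assumes perfect scaling and ignores scheduling overhead and low-priority node preemption, so treat it as a lower bound rather than a measurement.

```python
minutes_per_run = 47   # approximate time per run on 1M MovieLens, from the text above
n_runs = 100           # number of hyperparameter combinations to try
concurrent_runs = 4    # MAX_CONCURRENT_RUNS in this notebook

sequential_hours = minutes_per_run * n_runs / 60
# Idealized: assumes runs are perfectly parallel with no overhead
parallel_hours = sequential_hours / concurrent_runs

print(round(sequential_hours, 1))  # 78.3
print(round(parallel_hours, 1))    # 19.6
```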

#### Cleanup

In [40]:
shutil.rmtree(SCRIPT_DIR, ignore_errors=True)
shutil.rmtree(DATA_DIR, ignore_errors=True)
shutil.rmtree(MODEL_DIR, ignore_errors=True)