<i>Copyright (c) Microsoft Corporation. All rights reserved.<br>
Licensed under the MIT License.</i>
<br>
# Wide-and-Deep Model Hyperparameter Tuning with AzureML

This notebook shows how to auto-tune hyperparameters of a recommender model by utilizing **Azure Machine Learning service** ([AzureML](https://azure.microsoft.com/en-us/services/machine-learning-service/))<sup><a href="#azureml-search">a</a>, <a href="#azure-subscription">b</a></sup>.

We present an overall process of utilizing AzureML, specifically [**Hyperdrive**](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive?view=azure-ml-py) component, for the hyperparameter tuning by demonstrating key steps:
1. Configure AzureML Workspace
2. Create Remote Compute Target (GPU cluster)
3. Prepare Data
4. Prepare Training Scripts
5. Setup and Run Hyperdrive Experiment
6. Model Import, Re-train and Test

In this notebook, we use [**Wide-and-Deep model**](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html) from **TensorFlow high-level Estimator API (v1.12 or higher)** on the movie recommendation scenario. Wide-and-Deep learning jointly trains wide linear model and deep neural networks (DNN) to combine the benefits of memorization and generalization for recommender systems.

For more details about the **Wide-and-Deep** model:
* [Wide-and-Deep Quickstart notebook](../00_quick_start/wide_deep_movielens.ipynb)
* [Original paper](https://arxiv.org/abs/1606.07792)
* [TensorFlow API doc](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNLinearCombinedRegressor)
  
Regarding **AuzreML**, please refer:
* [Quickstart notebook](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
* [Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters)
* [Tensorflow model tuning with Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-tensorflow)

---
<sub><span id="azureml-search">a. To use AzureML, you will need an Azure subscription.</span><br>
<span id="azure-subscription">b. When you web-search "Azure Machine Learning", you will most likely to see mixed results of Azure Machine Learning (AzureML) and Azure Machine Learning **Studio**. Please note they are different services where AzureML's focuses are on ML model management, tracking and hyperparameter tuning, while the [ML Studio](https://studio.azureml.net/)'s is to provide a high-level tool for 'easy-to-use' experience of ML designing and experimentation based on GUI.</span></sub>

In [1]:
%reload_ext autoreload
%autoreload 2

In [2]:
import sys

import itertools
import os
import shutil
from tempfile import TemporaryDirectory

from IPython.display import clear_output
import numpy as np
import papermill as pm
import pandas as pd
import sklearn.preprocessing
import tensorflow as tf

import azureml as aml
import azureml.widgets as widgets
import azureml.train.hyperdrive as hd

from reco_utils.common.timer import Timer
from reco_utils.common.constants import SEED
from reco_utils.common.tf_utils import pandas_input_fn_for_saved_model
from reco_utils.dataset import movielens
from reco_utils.dataset.pandas_df_utils import user_item_pairs
from reco_utils.dataset.python_splitters import python_random_split
import reco_utils.evaluation.python_evaluation as evaluator

print("Azure ML SDK Version:", aml.core.VERSION)
print("Tensorflow Version:", tf.__version__)

# Temp dir to cache temporal files while running this notebook
tmp_dir = TemporaryDirectory()

Azure ML SDK Version: 1.0.10
Tensorflow Version: 1.12.0


### 1. Create and Configure AzureML Workspace
**AzureML workspace** is a foundational block in the cloud that you use to experiment, train, and deploy machine learning models via AzureML service. In this notebook, we 1) create a workspace from [**Azure portal**](https://portal.azure.com) and 2) configure from this notebook.

You can find more details about the setup and configure processes from the following links:
* [Quickstart with Azure portal](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started)
* [Quickstart with Python SDK](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)

There are several ways to create an Azure Machine Learning service workspace.
* Option 1: Use Azure portal
    1. Sign in to the [Azure portal](https://portal.azure.com) by using the credentials for the Azure subscription you use.
    2. Select **Create a resource** menu, search for **Machine Learning service workspace**, and select **Create** button.
    3. In the **ML service workspace** pane, configure your workspace with entering the *workspace name* and *resource group* (or **create new** resource group if you don't have one already), and select **Create**. It can take a few moments to create the workspace.
    4. Download **config.json** file from the portal's AzureML workspace page and place it to `<this-notebook-folder>/aml_config/config.json`
* Option 2: Use [AzureML SDK](https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py#workspace) - Run following cell
    * To find the full list of supported region, use Azure CLI from [your machine](https://docs.microsoft.com/en-us/cli/azure/install-azure-cli?view=azure-cli-latest) or [cloud shell](https://azure.microsoft.com/en-us/features/cloud-shell/) to run: `az account list-locations`
    * To locate your tenant id, use Azure CLI to run: `az account show`

In [3]:
# AzureML workspace information. Set them to create a workspace.
SUBSCRIPTION_ID = None  #'<subscription-id>'
RESOURCE_GROUP = None   #'<resource-group>'
WORKSPACE_NAME = None   #'<workspace-name>'
LOCATION = None         #'<region-to-deploy-the-workspace>'
TENANT_ID = None        #'<tenant-id>'

# Remote compute (cluster) configuration. If you want to save the cost more, set these to small.
VM_SIZE = 'STANDARD_NC6'
VM_PRIORITY = 'lowpriority'
# Cluster nodes
MIN_NODES = 0
MAX_NODES = 8
# Hyperdrive experimentation configuration
MAX_TOTAL_RUNS = 100  # Number of runs (training-and-evaluation) to search the best hyperparameters. 
MAX_CONCURRENT_RUNS = 8

# Recommend top k items
TOP_K = 10
# Select MovieLens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'
STEPS = 50000
# Metrics to track
RANKING_METRICS = [evaluator.ndcg_at_k.__name__, evaluator.precision_at_k.__name__]
RATING_METRICS = [evaluator.rmse.__name__, evaluator.mae.__name__]
PRIMARY_METRIC = evaluator.rmse.__name__
# Data column names
USER_COL = 'UserId'
ITEM_COL = 'MovieId'
RATING_COL = 'Rating'
ITEM_FEAT_COL = 'Genres'


In [4]:
if TENANT_ID:
    auth = aml.core.authentication.InteractiveLoginAuthentication(
        tenant_id=TENANT_ID
    )
else:
    auth = None  

if SUBSCRIPTION_ID and RESOURCE_GROUP and WORKSPACE_NAME and LOCATION:
    try:
        # Try to get existing workspace by given information
        ws = aml.core.Workspace(
            workspace_name=WORKSPACE_NAME,
            subscription_id=SUBSCRIPTION_ID,
            resource_group=RESOURCE_GROUP,
        )
        print("Found existing AzureML workspace.")
    except aml.exceptions.AuthenticationException:
        # Create a new workspace
        print("Creating new AzureML workspace.")
        ws = aml.core.Workspace.create(
            name=WORKSPACE_NAME,
            subscription_id=SUBSCRIPTION_ID,
            resource_group=RESOURCE_GROUP,
            create_resource_group=True,
            location=LOCATION,
            auth=auth,
        )
    ws.write_config()
# If you are using an already-configured workspace config.json file
else:
    ws = aml.core.Workspace.from_config(auth=auth)

Found the config file in: /data/home/jumin/git/reco/notebooks/04_model_select_and_optimize/aml_config/config.json


Falling back to use azure cli credentials. This fall back to use azure cli credentials will be removed in the next release. 
Make sure your code doesn't require 'az login' to have happened before using azureml-sdk, except the case when you are specifying AzureCliAuthentication in azureml-sdk.


To verify your workspace, run:

In [5]:
print("AzureML workspace name: ", ws.name)

AzureML workspace name:  junminaml


### 2. Create Remote Compute Target

We create a GPU cluster as our **remote compute target**. If a cluster with the same name is already exist in your workspace, the script will load it instead. You can see [this document](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets) to learn more about setting up a compute target on different locations.

This notebook selects **STANDARD_NC6** virtual machine (VM) and sets it's priority as *lowpriority* to save the cost.

Size | vCPU | Memory (GiB) | Temp storage (SSD, GiB) | GPU | GPU memory (GiB) | Max data disks | Max NICs
---|---|---|---|---|---|---|---
Standard_NC6 | <div align="center">6</div> | <div align="center">56</div> | <div align="center">340</div> | <div align="center">1</div> | <div align="center">8</div> | <div align="center">24</div> | <div align="center">1</div>


For more information about Azure virtual machine sizes, see [here](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu).

In [6]:
CLUSTER_NAME = 'gpu-cluster-nc6'

try:
    compute_target = aml.core.compute.ComputeTarget(workspace=ws, name=CLUSTER_NAME)
    print("Found existing compute target")
except aml.core.compute_target.ComputeTargetException:
    print("Creating a new compute target...")
    compute_config = aml.core.compute.AmlCompute.provisioning_configuration(
        vm_size=VM_SIZE,
        vm_priority=VM_PRIORITY,
        min_nodes=MIN_NODES,
        max_nodes=MAX_NODES
    )
    # create the cluster
    compute_target = aml.core.compute.ComputeTarget.create(ws, CLUSTER_NAME, compute_config)
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

# Use the 'status' property to get a detailed status for the current cluster. 
print(compute_target.status.serialize())

Found existing compute target
{'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-06-28T16:58:16.459000+00:00', 'creationTime': '2019-06-18T21:09:39.101231+00:00', 'currentNodeCount': 0, 'errors': None, 'modifiedTime': '2019-06-18T21:09:55.347615+00:00', 'nodeStateCounts': {'idleNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 0, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 8, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 0, 'vmPriority': 'LowPriority', 'vmSize': 'STANDARD_NC6'}


### 3. Prepare Data
For demonstration purpose, we use 100k MovieLens dataset. First, download the data and convert the format (multi-hot encode *genres*) to make it work for our model. More details about this step is described in our [Wide-Deep Quickstart notebook](../00_quick_start/wide_deep_movielens.ipynb).

In [7]:
data = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=[USER_COL, ITEM_COL, RATING_COL],
    genres_col=ITEM_FEAT_COL
)

# Encode 'genres' into int array (multi-hot representation) to use as item features
genres_encoder = sklearn.preprocessing.MultiLabelBinarizer()
data[ITEM_FEAT_COL] = genres_encoder.fit_transform(
    data[ITEM_FEAT_COL].apply(lambda s: s.split("|"))
).tolist()

data.head()

100%|██████████| 4.81k/4.81k [00:00<00:00, 18.1kKB/s]


Unnamed: 0,UserId,MovieId,Rating,Genres
0,196,242,3.0,"[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
1,63,242,3.0,"[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
2,226,242,5.0,"[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
3,154,242,3.0,"[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."
4,306,242,5.0,"[0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ..."


The dataset is split into train, validation, and test sets. The train and validation sets will be used for hyperparameter tuning, and the test set will be used for the final evaluation of the model after we import the best model from AzureML workspace.

Here, we don't use multiple-split directly by passing `ratio=[0.56, 0.19, 0.25]`. Instead, we first split the data into train and test sets with the same `seed` we've been using in other notebooks to make the train set identical across them. Then, we further split the train set into train and validation sets.

In [11]:
# Use the same seed to make the train and test sets identical across other notebooks in the repo.
train, test = python_random_split(data, ratio=0.75, seed=SEED)
# Further split the train set into train and validation set.
train, valid = python_random_split(train, seed=SEED)

print(
    "Number of samples:\n"
    "- Training   = {}\n"
    "- Validation = {}\n"
    "- Testing    = {}".format(len(train), len(valid), len(test))
)

Number of samples:
- Training   = 56250
- Validation = 18750
- Testing    = 25000


Now, upload the train and validation sets to the AzureML workspace. Our Hyperdrivce experiment will use them.

In [12]:
DATA_DIR = os.path.join(tmp_dir.name, 'aml_data') 

os.makedirs(DATA_DIR, exist_ok=True)

TRAIN_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_train.pkl"
train.to_pickle(os.path.join(DATA_DIR, TRAIN_FILE_NAME))
VALID_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_valid.pkl"
valid.to_pickle(os.path.join(DATA_DIR, VALID_FILE_NAME))

# Note, all the files under DATA_DIR will be uploaded to the data store
ds = ws.get_default_datastore()
ds.upload(
    src_dir=DATA_DIR,
    target_path='data',
    overwrite=True,
    show_progress=True
)

Uploading /tmp/tmpwby7dwh4/aml_data/movielens_100k_train.pkl
Uploading /tmp/tmpwby7dwh4/aml_data/movielens_100k_valid.pkl
Uploaded /tmp/tmpwby7dwh4/aml_data/movielens_100k_valid.pkl, 1 files out of an estimated total of 2
Uploaded /tmp/tmpwby7dwh4/aml_data/movielens_100k_train.pkl, 2 files out of an estimated total of 2


$AZUREML_DATAREFERENCE_ec1d8219afb44a36adf66ff9ece918f4

### 4. Prepare Training Scripts
Next step is to prepare scripts that AzureML Hyperdrive will use to train and evaluate models with selected hyperparameters. We re-use our [Wide-Deep Quickstart notebook](../00_quick_start/wide_deep_movielens.ipynb) for that. To run the model notebook from the Hyperdrive Run, all we need is to prepare an [entry script](../../reco_utils/azureml/wide_deep.py) which parses the hyperparameter arguments, passes them to the notebook, and records the results of the notebook to AzureML Run logs by using `papermill`. Hyperdrive uses the logs to track the performance of each hyperparameter-set and finds the best performed one.  

Here is a code snippet from the entry script:
```
...
from azureml.core import Run
run = Run.get_context()
...
NOTEBOOK_NAME = os.path.join(
    "notebooks",
    "00_quick_start",
    "wide_deep_movielens.ipynb"
)
...
parser = argparse.ArgumentParser()
...
parser.add_argument('--dnn-optimizer', type=str, dest='dnn_optimizer', ...
parser.add_argument('--dnn-optimizer-lr', type=float, dest='dnn_optimizer_lr', ...
...
pm.execute_notebook(
    NOTEBOOK_NAME,
    OUTPUT_NOTEBOOK,
    parameters=params,
    kernel_name='python3',
)
...
```

In [13]:
# Prepare all the necessary scripts which will be loaded to our Hyperdrive Experiment Run
SCRIPT_DIR = os.path.join(tmp_dir.name, 'aml_script')

# Copy scripts to SCRIPT_DIR temporarly
shutil.copytree(os.path.join('..', '..', 'reco_utils'), os.path.join(SCRIPT_DIR, 'reco_utils'))

# We re-use our model notebook for training and testing models.
model_notebook_dir = os.path.join('notebooks', '00_quick_start')
dest_model_notebook_dir = os.path.join(SCRIPT_DIR, model_notebook_dir)
os.makedirs(dest_model_notebook_dir , exist_ok=True)
shutil.copy(
    os.path.join('..', '..', model_notebook_dir, 'wide_deep_movielens.ipynb'),
    dest_model_notebook_dir
)

# copy training scripts
shutil.copytree('train_scripts', os.path.join(SCRIPT_DIR, 'train_scripts'))

# This is our entry script for Hyperdrive Run
ENTRY_SCRIPT_NAME = 'train_scripts/wide_deep_training.py'

### 5. Setup and Run Hyperdrive Experiment

#### 5.1 Define Search Space 
We define the search space of hyperparameters. For example, if you want to test different batch sizes of {64, 128, 256}, you can use `azureml.train.hyperdrive.choice(64, 128, 256)`. To search from a continuous space, use `uniform(start, end)`. For more options, see [Hyperdrive parameter expressions](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.parameter_expressions?view=azure-ml-py).
In this notebook, we fix the number of training steps to 50000.

In the search space, we set different linear and DNN optimizers, structures, learning rates and regularization rates. Details about the hyperparameters can be found from our [Wide-Deep Quickstart notebook](../00_quick_start/wide_deep_movielens.ipynb).

In [14]:
# Script parameters. New AzureML API only accepts string values.
script_params = {
    '--datastore': ds.as_mount(),
    '--train-datapath': "data/" + TRAIN_FILE_NAME,
    '--test-datapath': "data/" + VALID_FILE_NAME,
    '--top-k': str(TOP_K),
    '--user-col': USER_COL,
    '--item-col': ITEM_COL,
    '--item-feat-col': ITEM_FEAT_COL,
    '--rating-col': RATING_COL,
    '--ranking-metrics': RANKING_METRICS,
    '--rating-metrics': RATING_METRICS,
    '--steps': str(STEPS),
}

# Hyperparameter search space
params = {
    '--model-type': hd.choice('wide', 'deep', 'wide_deep'),
    '--batch-size': hd.choice(8, 16, 32, 64, 128),
    # Linear model hyperparameters
    '--linear-optimizer': hd.choice('adadelta', 'adagrad', 'adam', 'ftrl', 'momentum', 'sgd'),
    '--linear-optimizer-lr': hd.uniform(1e-6, 0.1),
    '--linear-l1-reg': hd.uniform(0.0, 1.0),
    '--linear-l2-reg': hd.uniform(0.0, 1.0),
    '--linear-momentum': hd.uniform(0.0, 1.0),
    # Deep model hyperparameters
    '--dnn-optimizer': hd.choice('adadelta', 'adagrad', 'adam', 'ftrl', 'momentum', 'sgd'),
    '--dnn-optimizer-lr': hd.uniform(1e-6, 0.1),
    '--dnn-l1-reg': hd.uniform(0.0, 1.0),
    '--dnn-l2-reg': hd.uniform(0.0, 1.0),
    '--dnn-momentum': hd.uniform(0.0, 1.0),
    '--dnn-user-embedding-dim': hd.choice(4, 8, 16, 32),
    '--dnn-item-embedding-dim': hd.choice(4, 8, 16, 32),
    '--dnn-hidden-layer-1': hd.choice(0, 64, 128, 256, 512),  # 0: not using this layer
    '--dnn-hidden-layer-2': hd.choice(0, 64, 128, 256, 512),
    '--dnn-hidden-layer-3': hd.choice(0, 64, 128, 256, 512),
    '--dnn-hidden-layer-4': hd.choice(64, 128, 256, 512, 1024),
    '--dnn-batch-norm': hd.choice(0, 1),
    '--dnn-dropout': hd.uniform(0.0, 0.8)
}


#### 5.2 Create Hyperdrive Experiment 
[Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters) creates a machine learning experiment [**Run**](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.run?view=azure-ml-py) on the workspace and utilizes child-runs to search the best set of hyperparameters. [Experiment](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.core.experiment(class)?view=azure-ml-py) is the main entry point into experimenting with AzureML. To create new Experiment or get the existing one, we pass our experimentation name.

**AzureML Estimator** is the building block for training. An Estimator encapsulates the training code and parameters, the compute resources and runtime environment for a particular training scenario (Note, this is not TensorFlow's Estimator). In the following cell, we create the Estimator with additional dependencies of our model scripts.

In [15]:
est = aml.train.estimator.Estimator(
    source_directory=SCRIPT_DIR,
    entry_script=ENTRY_SCRIPT_NAME,
    script_params=script_params,
    compute_target=compute_target,
    use_gpu=True,
    conda_packages=['pandas', 'scikit-learn', 'numba', 'matplotlib'],
    pip_packages=['ipykernel', 'papermill==0.18.2', 'tensorflow-gpu==1.12']
)

We set our primary metric with the goal (hyperparameter search criteria), hyperparameter sampling method, and number of total child-runs to the Hyperdrive Run Config. The bigger the search space, the more number of runs we will need for better results.

Hyperdrive provides three different parameter sampling methods: `RandomParameterSampling`, `GridParameterSampling`, and `BayesianParameterSampling`. Details about each method can be found from [Azure doc](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters). Here, we use the Bayesian sampling.

In [16]:
hd_run_config = hd.HyperDriveRunConfig(
    estimator=est, 
    hyperparameter_sampling=hd.BayesianParameterSampling(params),
    primary_metric_name=PRIMARY_METRIC,
    primary_metric_goal=hd.PrimaryMetricGoal.MINIMIZE, 
    max_total_runs=MAX_TOTAL_RUNS,
    max_concurrent_runs=MAX_CONCURRENT_RUNS
)

#### 5.3 Run Experiment

Now we submit the Run to our experiment. You can see the experiment progress from this notebook by using `azureml.widgets.RunDetails(hd_run).show()` or check from the Azure portal with the url link you can get by running `hd_run.get_portal_url()`.

<img src="https://recodatasets.z20.web.core.windows.net/images/aml_0.png?sanitize=true" width="600"/>
<img src="https://recodatasets.z20.web.core.windows.net/images/aml_1.png?sanitize=true" width="600"/>
<center><i>AzureML Hyperdrive Widget</i></center>

To load an existing Hyperdrive Run instead of start new one, use `hd_run = hd.HyperDriveRun(exp, <user-run-id>, hyperdrive_run_config=hd_run_config)`. You also can cancel the Run with `hd_run.cancel()`.

In [17]:
EXP_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_wide_deep_model"
exp = aml.core.Experiment(workspace=ws, name=EXP_NAME)

In [None]:
# Create an experiment run. Skip this to load an existing run instead
hd_run = exp.submit(config=hd_run_config)

# To load an existing run: 
# hd_run = hd.HyperDriveRun(
#     experiment=exp,
#     run_id=<run-id-to-load>,
#     hyperdrive_run_config=hd_run_config
# )

hd_run.get_details()

In [20]:
# Get the list of runs from the experiment:
list(exp.get_runs())

[Run(Experiment: movielens_100k_wide_deep_model,
 Id: movielens_100k_wide_deep_model_1561733572398,
 Type: hyperdrive,
 Status: Completed), Run(Experiment: movielens_100k_wide_deep_model,
 Id: movielens_100k_wide_deep_model_1561703444608,
 Type: hyperdrive,
 Status: Canceled), Run(Experiment: movielens_100k_wide_deep_model,
 Id: movielens_100k_wide_deep_model_1560996258088,
 Type: hyperdrive,
 Status: Completed), Run(Experiment: movielens_100k_wide_deep_model,
 Id: movielens_100k_wide_deep_model_1560994940938,
 Type: hyperdrive,
 Status: Canceled), Run(Experiment: movielens_100k_wide_deep_model,
 Id: movielens_100k_wide_deep_model_1560993611286,
 Type: hyperdrive,
 Status: Canceled), Run(Experiment: movielens_100k_wide_deep_model,
 Id: movielens_100k_wide_deep_model_1560892511300,
 Type: hyperdrive,
 Status: Canceled), Run(Experiment: movielens_100k_wide_deep_model,
 Id: movielens_100k_wide_deep_model_1562013419918,
 Type: hyperdrive,
 Status: Running)]

In [None]:
# Note, widgets don't work on JupyterLab
widgets.RunDetails(hd_run).show()

Once all the child-runs are finished, we can get the best run and the metrics.
> Note, if you run Hyperdrive experiment again, you will see the best metrics and corresponding hyperparameters are not the same. It is because of 1) the random initialization of the model and 2) Hyperdrive sampling (when you use RandomSampling). You will get different results as well if you use different training and validation sets.

In [17]:
# Get best run and printout metrics
best_run = hd_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()

In [18]:
print("* Best Run Id:", best_run.id)

print("\n* Best hyperparameters:")
model_type = best_run_metrics['MODEL_TYPE']
print("Model type =", model_type)
print("Batch size =", best_run_metrics['BATCH_SIZE'])

if model_type in ('wide', 'wide_deep'):
    linear_opt = best_run_metrics['LINEAR_OPTIMIZER']
    print("Linear optimizer =", linear_opt)
    print("\tLearning rate = {0:.4f}".format(best_run_metrics['LINEAR_OPTIMIZER_LR']))
    if linear_opt == 'ftrl':
        print("\tL1 regularization = {0:.4f}".format(best_run_metrics['LINEAR_L1_REG']))
        print("\tL2 regularization = {0:.4f}".format(best_run_metrics['LINEAR_L2_REG']))
    elif linear_opt == 'momentum' or linear_opt == 'rmsprop':
        print("\tMomentum = {0:.4f}".format(best_run_metrics['LINEAR_MOMENTUM']))

if model_type in ('deep', 'wide_deep'):
    dnn_opt = best_run_metrics['DNN_OPTIMIZER']
    print("DNN optimizer =", dnn_opt)
    print("\tUser embedding dimension =", best_run_metrics['DNN_USER_DIM'])
    print("\tItem embedding dimension =", best_run_metrics['DNN_ITEM_DIM'])
    print("\tHidden units =", [
        best_run_metrics['DNN_HIDDEN_LAYER_{}'.format(i)] for i in range(1, 5)
    ])
    print("\tLearning rate = {0:.4f}".format(best_run_metrics['DNN_OPTIMIZER_LR']))
    if dnn_opt == 'ftrl':
        print("\tL1 regularization = {0:.4f}".format(best_run_metrics['DNN_L1_REG']))
        print("\tL2 regularization = {0:.4f}".format(best_run_metrics['DNN_L2_REG']))
    elif dnn_opt == 'momentum' or linear_opt == 'rmsprop':
        print("\tMomentum = {0:.4f}".format(best_run_metrics['DNN_MOMENTUM']))
    print("\tDropout rate = {0:.4f}".format(best_run_metrics['DNN_DROPOUT']))
    print("\tBatch normalization =", 1==best_run_metrics['DNN_BATCH_NORM'])
    
# Metrics evaluated on validation set
print("\n* Performance metrics:")
for m in RANKING_METRICS:
    print("\t{0} (top-{1}) = {2:.4f}".format(m, TOP_K, best_run_metrics[m]))
for m in RATING_METRICS:
    print("\t{0} = {1:.4f}".format(m, best_run_metrics[m]))    

* Best Run Id: movielens_100k_wide_deep_model_1561733572398_41

* Best hyperparameters:
Model type = wide_deep
Batch size = 32.0
Linear optimizer = adagrad
	Learning rate = 0.0621
DNN optimizer = adadelta
	User embedding dimension = 32.0
	Item embedding dimension = 16.0
	Hidden units = [0.0, 64.0, 128.0, 512.0]
	Learning rate = 0.1000
	Dropout rate = 0.8000
	Batch normalization = True

* Performance metrics:
	ndcg_at_k (top-10) = 0.0555
	precision_at_k (top-10) = 0.0534
	rmse = 0.9552
	mae = 0.7568


### 6. Model Import and Test

[Wide-Deep Quickstart notebook](../00_quick_start/wide_deep_movielens.ipynb), which we've used in our Hyperdrive Experiment, exports the trained model to the output folder (the output path is recorded at `best_run_metrics['saved_model_dir']`). We can download a model from the best run and test it. 

In [43]:
MODEL_DIR = os.path.join(tmp_dir.name, 'aml_model')
os.makedirs(MODEL_DIR, exist_ok=True)

model_file_dir = best_run_metrics['saved_model_dir'] + '/'
print(model_file_dir)

for f in best_run.get_file_names():
    if f.startswith(model_file_dir):
        output_file_path = os.path.join(MODEL_DIR, f.split(model_file_dir)[1])
        print("Downloading {}..".format(f))
        best_run.download_file(name=f, output_file_path=output_file_path)
    
saved_model = tf.contrib.estimator.SavedModelEstimator(MODEL_DIR)

outputs/model/1561737321/
Downloading outputs/model/1561737321/saved_model.pb..
Downloading outputs/model/1561737321/variables/variables.data-00000-of-00002..
Downloading outputs/model/1561737321/variables/variables.data-00001-of-00002..
Downloading outputs/model/1561737321/variables/variables.index..


In [44]:
cols = {
    'col_user': USER_COL,
    'col_item': ITEM_COL,
    'col_rating': RATING_COL,
    'col_prediction': 'prediction'
}

tf.logging.set_verbosity(tf.logging.ERROR)

In [46]:
# Rating prediction set
X_test = test.drop(RATING_COL, axis=1)
X_test.reset_index(drop=True, inplace=True)

# Rating prediction
predictions = list(itertools.islice(
    saved_model.predict(
        pandas_input_fn_for_saved_model(
            df=X_test,
            feat_name_type={
                USER_COL: int,
                ITEM_COL: int,
                ITEM_FEAT_COL: list
            }
        )
    ),
    len(X_test)
))

prediction_df = X_test.copy()
prediction_df['prediction'] = [p['outputs'][0] for p in predictions]
print(prediction_df['prediction'].describe(), "\n")
for m in RATING_METRICS:
    result = evaluator.metrics[m](test, prediction_df, **cols)
    print(m, "=", result)

count    25000.000000
mean         3.525522
std          0.635910
min          0.140751
25%          3.129608
50%          3.576132
75%          3.973043
max          5.629328
Name: prediction, dtype: float64 

rmse = 0.956280219325999
mae = 0.7553600390541554


In [32]:
# Unique items
if ITEM_FEAT_COL is None:
    items = data.drop_duplicates(ITEM_COL)[[ITEM_COL]].reset_index(drop=True)
else:
    items = data.drop_duplicates(ITEM_COL)[[ITEM_COL, ITEM_FEAT_COL]].reset_index(drop=True)
# Unique users
users = data.drop_duplicates(USER_COL)[[USER_COL]].reset_index(drop=True)

# Ranking prediction set
ranking_pool = user_item_pairs(
    user_df=users,
    item_df=items,
    user_col=USER_COL,
    item_col=ITEM_COL,
    user_item_filter_df=pd.concat([train, valid]),  # remove seen items
    shuffle=True
)

In [33]:
predictions = []
# If we put all ranking_pool into a tensor, we get error (since the content limit is 2GB).
# We divide ranking_pool into 5 chunks, make prediction, and concat the results. 
for pool in np.array_split(ranking_pool, 5):
    pool.reset_index(drop=True, inplace=True)
    # Rating prediction
    pred = list(itertools.islice(
        saved_model.predict(
            pandas_input_fn_for_saved_model(
                df=X_test,
                feat_name_type={
                    USER_COL: int,
                    ITEM_COL: int,
                    ITEM_FEAT_COL: list
                }
            )
        ),
        len(pool)
    ))
    predictions.extend([p['outputs'][0] for p in pred])
    
ranking_pool['prediction'] = predictions

for m in RANKING_METRICS:
    result = evaluator.metrics[m](test, ranking_pool, **{**cols, 'k': TOP_K})
    print(m, "=", result)

ndcg_at_k = 0.018009288572177713
precision_at_k = 0.01792152704135737


#### Wide-and-Deep Baseline Comparison
To see if Hyperdrive found good hyperparameters, we simply compare with the model with known hyperparameters from [TensorFlow's wide-deep learning example](https://github.com/tensorflow/models/blob/master/official/wide_deep/movielens_main.py) which uses only the DNN part from the wide-and-deep model for MovieLens data.

> Note, this is not 'apples to apples' comparison. For example, TensorFlow's movielens example uses *rating-timestamp* as a numeric feature, but we did not use that here because we think the timestamps are not relevant to the movies' ratings. This comparison is more like to show how Hyperdrive can help to find comparable hyperparameters without requiring exhaustive efforts in going over a huge search-space. 

In [47]:
OUTPUT_NOTEBOOK = os.path.join(tmp_dir.name, "output.ipynb")
OUTPUT_MODEL_DIR = os.path.join(tmp_dir.name, "known_hyperparam_model_checkpoints")

params = {
    'MOVIELENS_DATA_SIZE': MOVIELENS_DATA_SIZE,
    'TOP_K': TOP_K,
    'MODEL_TYPE': 'deep',
    'STEPS': STEPS,
    'BATCH_SIZE': 256,
    'DNN_OPTIMIZER': 'Adam',
    'DNN_OPTIMIZER_LR': 0.001,
    'DNN_HIDDEN_LAYER_1': 256,
    'DNN_HIDDEN_LAYER_2': 256,
    'DNN_HIDDEN_LAYER_3': 256,
    'DNN_HIDDEN_LAYER_4': 128,
    'DNN_USER_DIM': 16,
    'DNN_ITEM_DIM': 64,
    'DNN_DROPOUT': 0.3,
    'DNN_BATCH_NORM': 0,
    'MODEL_DIR': OUTPUT_MODEL_DIR,
    'EVALUATE_WHILE_TRAINING': False,
    'EXPORT_DIR_BASE': OUTPUT_MODEL_DIR,
    'RANKING_METRICS': RANKING_METRICS,
    'RATING_METRICS': RATING_METRICS,
}

with Timer() as train_time:
    pm.execute_notebook(
        "../00_quick_start/wide_deep_movielens.ipynb",
        OUTPUT_NOTEBOOK,
        parameters=params,
        kernel_name='python3'
    )
print("Training and evaluation of Wide-and-Deep model took", train_time.interval, "secs.")

nb = pm.read_notebook(OUTPUT_NOTEBOOK)
for m in RANKING_METRICS:
    print(m, "=", nb.data[m])
for m in RATING_METRICS:
    print(m, "=", nb.data[m])

HBox(children=(IntProgress(value=0, max=34), HTML(value='')))


Training and evaluation of Wide-and-Deep model took 357.3825697898865 secs.
ndcg_at_k = 0.013269362558705873
precision_at_k = 0.015482502651113467
rmse = 1.0421873135289017
mae = 0.8238318599748612




### Concluding Remark
We showed how to tune hyperparameters by utilizing Azure Machine Learning service. Complex and powerful models like Wide-and-Deep model often have many number of hyperparameters that affect on the recommendation accuracy, and it is not practical to tune the model without using a GPU cluster. For example, a training and evaluation of a model took around 3 minutes on 100k MovieLens data on a single *Standard NC6* VM as we tested from the [above cell](#Wide-and-Deep-Baseline-Comparison). When we used 1M MovieLens, it took about 47 minutes. If we want to investigate through 100 different combinations of hyperparameters **manually**, it will take **78 hours** on the VM and we may still wonder if we had tested good candidates of hyperparameters. With AzureML, as we shown in this notebook, we can easily setup different size of GPU cluster fits to our problem and utilize Bayesian sampling to navigate through the huge search space efficiently, and tweak the experiment with different criteria and algorithms for further research.

#### Cleanup

In [4]:
tmp_dir.cleanup()