<i>Copyright (c) Microsoft Corporation. All rights reserved.<br>
Licensed under the MIT License.</i>
<br><br>
# Recommender Hyperparameter Tuning with AzureML

In this notebook, we show how to tune the hyperparameters of a recommender model with the **Azure Machine Learning service**\* ([AML or AzureML](https://azure.microsoft.com/en-us/services/machine-learning-service/)) in the context of movie recommendation. Note that you need an Azure subscription to use AML.

Here, we use the [**wide-and-deep model**](https://ai.googleblog.com/2016/06/wide-deep-learning-better-together-with.html) from the TensorFlow high-level Estimator API.

We present the overall process of using AML by demonstrating the key steps without going into too much detail. Instead, this notebook links to resources that cover those details.
  
<br>  

For more details about the **wide-and-deep** model:
* [Wide-Deep Quickstart notebook](../00_quick_start/wide_deep_model_movielens.ipynb)
* [Original paper](https://arxiv.org/abs/1606.07792)
* [TensorFlow API doc](https://www.tensorflow.org/api_docs/python/tf/estimator/DNNLinearCombinedRegressor)
  
Regarding **AzureML**, please refer to:
* [Quickstart notebook](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
* [Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters)
* [TensorFlow model tuning with Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-tensorflow)

> \* When you web-search "Azure Machine Learning", you will most likely see mixed results for Azure Machine Learning (which we call AML) and Azure Machine Learning **Studio**. Note that they are different services: AML focuses on ML model management, tracking, and hyperparameter tuning, while [ML Studio](https://studio.azureml.net/) provides a high-level, GUI-based tool for easy ML design and experimentation.

### Prerequisite
To run this example, you'll need to install [`azureml-sdk`](https://pypi.org/project/azureml-sdk/).
If you are using a [Data Science Virtual Machine (DSVM)](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#dsvm) or [Azure Notebook](https://notebooks.azure.com/), `azureml-sdk` is already installed, so you can skip the installation.

To install the AML Python SDK\*, run
```
pip install --upgrade azureml-sdk[notebooks]
```

More information about setting up an AML environment can be found at [this link](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment).

> \* AML also provides a Databricks SDK, `azureml-sdk[databricks]`, but it does not support hyperparameter tuning on Databricks for now.

### AML Workspace Configuration
An AML workspace is the foundational block in the cloud that you use to experiment with, train, and deploy machine learning models. We 1) set up a workspace from the Azure portal and 2) create a config file manually. The instructions here are based on the AML documents [Quickstart with Azure portal](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) and [Quickstart with Python SDK](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python), where you can find more details about the setup process with screenshots.
  
<br>
  
#### Create a workspace
1. Sign in to the [Azure portal](https://portal.azure.com) with the credentials for your Azure subscription.
2. Select the **Create a resource** menu, search for **Machine Learning service workspace**, and select the **Create** button.
3. In the **ML service workspace** pane, configure your workspace by entering the *workspace name* and *resource group* (or **create new** resource group if you don't have one already), and select **Create**. It can take a few moments to create the workspace.
  
<br>
  
#### Make a configuration file
To configure this notebook to communicate with your workspace easily, create a *.\aml_config\config.json* file with the following contents:
```
{
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>"
}
```
replacing `<subscription-id>`, `<resource-group>`, and `<workspace-name>` with the strings of your subscription id, resource group, and workspace name, respectively.
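
If you prefer to generate the file from code rather than by hand, the following is a minimal sketch using only the Python standard library. The helper name `write_workspace_config` is our own (it is not part of the AML SDK), and the identifiers you pass in are placeholders that you must replace with your own values.

```python
import json
import os


def write_workspace_config(subscription_id, resource_group, workspace_name,
                           config_dir="aml_config"):
    """Write the three workspace identifiers to <config_dir>/config.json."""
    os.makedirs(config_dir, exist_ok=True)
    config_path = os.path.join(config_dir, "config.json")
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config_path


# Example call (placeholders, not real identifiers):
# write_workspace_config("<subscription-id>", "<resource-group>", "<workspace-name>")
```

The keys match the JSON shown above, so the resulting file has the same shape as the manually created one.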

Now let's see if everything is ready!

In [1]:
import sys
sys.path.append("../../")

import os
import shutil
import itertools

import pandas as pd
import sklearn.preprocessing

import azureml as aml
import azureml.widgets
import azureml.train.dnn
import azureml.train.hyperdrive as hd

from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_random_split

print("Azure ML SDK Version:", aml.core.VERSION)

Azure ML SDK Version: 1.0.2


In [2]:
# Connect to a workspace
ws = aml.core.Workspace.from_config()
print("AML workspace name: ", ws.name)

Found the config file in: C:\Users\jumin\git\Recommenders\notebooks\04_model_select_and_optimize\aml_config\config.json
AML workspace name:  junmin-aml-workspace


In the following cells, we
1. Create a *remote compute target* (gpu-cluster) if it does not exist already,
2. Mount a *data store* and upload the training set, and
3. Run a hyperparameter tuning experiment.

### Create a Remote Compute Target

We create a GPU cluster for our remote compute target. The script loads the cluster if it already exists. See [this document](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets) to learn more about setting up a *compute target*.

> Note that we create a low-priority cluster to reduce cost; such nodes are cheaper but can be preempted.

In [3]:
CLUSTER_NAME = 'gpu-cluster-16'

try:
    compute_target = aml.core.compute.ComputeTarget(workspace=ws, name=CLUSTER_NAME)
    print("Found existing compute target")
except aml.core.compute_target.ComputeTargetException:
    print("Creating a new compute target...")
    compute_config = aml.core.compute.AmlCompute.provisioning_configuration(
        vm_size='STANDARD_NC6',
        vm_priority='lowpriority',
        min_nodes=4,
        max_nodes=16
    )
    # create the cluster
    compute_target = aml.core.compute.ComputeTarget.create(ws, CLUSTER_NAME, compute_config)
    compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

# Use the 'status' property to get a detailed status for the current cluster. 
print(compute_target.status.serialize())

Found existing compute target
{'allocationState': 'Resizing', 'allocationStateTransitionTime': '2019-01-25T01:43:31.874000+00:00', 'creationTime': '2019-01-25T01:07:26.028313+00:00', 'currentNodeCount': 5, 'errors': None, 'modifiedTime': '2019-01-25T01:07:44.576905+00:00', 'nodeStateCounts': {'idleNodeCount': 4, 'leavingNodeCount': 1, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 0, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 4, 'maxNodeCount': 16, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 4, 'vmPriority': 'LowPriority', 'vmSize': 'STANDARD_NC6'}
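
The serialized status is verbose, so it can help to condense it. The sketch below reads only keys that appear in the output above; the helper name `summarize_cluster_status` and the notion of a "settled" cluster (not resizing, and holding exactly the targeted number of nodes) are our own assumptions, not SDK concepts.

```python
def summarize_cluster_status(status):
    """Condense a serialized AmlCompute status dictionary into a few fields."""
    counts = status["nodeStateCounts"]
    return {
        "current_nodes": status["currentNodeCount"],
        "target_nodes": status["targetNodeCount"],
        "idle_nodes": counts["idleNodeCount"],
        "busy_nodes": counts["runningNodeCount"] + counts["preparingNodeCount"],
        # Assumption: the cluster is settled once it has stopped resizing
        # and holds exactly the targeted number of nodes.
        "settled": (
            status["allocationState"] != "Resizing"
            and status["currentNodeCount"] == status["targetNodeCount"]
        ),
    }


# Trimmed copy of the status printed above
example_status = {
    "allocationState": "Resizing",
    "currentNodeCount": 5,
    "targetNodeCount": 4,
    "nodeStateCounts": {
        "idleNodeCount": 4, "leavingNodeCount": 1, "preemptedNodeCount": 0,
        "preparingNodeCount": 0, "runningNodeCount": 0, "unusableNodeCount": 0,
    },
}
print(summarize_cluster_status(example_status))
```

For the status shown above, the cluster is still scaling down from 5 nodes to its minimum of 4, so it is not yet settled.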


### Prepare Dataset
1. Download data and split into training, evaluation, and testing sets
2. Upload training and evaluation sets to the workspace's default **blob storage**

In [4]:
# Recommend top k items
TOP_K = 10

# Select Movielens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '1m'

USER_COL = 'UserId'
ITEM_COL = 'MovieId'
RATING_COL = 'Rating'
ITEM_FEAT_COL = 'Genres'

In [5]:
data = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=[USER_COL, ITEM_COL, RATING_COL],
    genres_col='Genres_string'
)
data.head()

Unnamed: 0,UserId,MovieId,Rating,Genres_string
0,1,1193,5.0,Drama
1,2,1193,5.0,Drama
2,12,1193,4.0,Drama
3,15,1193,4.0,Drama
4,17,1193,5.0,Drama


In [6]:
# Encode 'genres' into int array (multi-hot representation) to use as item features
genres_encoder = sklearn.preprocessing.MultiLabelBinarizer()
data[ITEM_FEAT_COL] = genres_encoder.fit_transform(
    data['Genres_string'].apply(lambda s: s.split("|"))
).tolist()
print("Genres:", genres_encoder.classes_)

Genres: ['Action' 'Adventure' 'Animation' "Children's" 'Comedy' 'Crime'
 'Documentary' 'Drama' 'Fantasy' 'Film-Noir' 'Horror' 'Musical' 'Mystery'
 'Romance' 'Sci-Fi' 'Thriller' 'War' 'Western']
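
To make the multi-hot representation concrete, here is a small pure-Python sketch of the same idea on three toy genre strings. It is only an illustration of what the encoding looks like; the notebook itself relies on `MultiLabelBinarizer`, and the function name `multi_hot_encode` is our own.

```python
def multi_hot_encode(genre_strings, sep="|"):
    """Return (sorted class list, one 0/1 vector per input string)."""
    rows = [s.split(sep) for s in genre_strings]
    classes = sorted({genre for row in rows for genre in row})
    index = {genre: i for i, genre in enumerate(classes)}
    encoded = []
    for row in rows:
        vector = [0] * len(classes)
        for genre in row:
            vector[index[genre]] = 1  # mark every genre the movie belongs to
        encoded.append(vector)
    return classes, encoded


classes, encoded = multi_hot_encode(["Drama", "Action|Comedy", "Comedy|Drama"])
print(classes)  # ['Action', 'Comedy', 'Drama']
print(encoded)  # [[0, 0, 1], [1, 1, 0], [0, 1, 1]]
```

With the 18 MovieLens genres listed above, each movie is represented by a vector of length 18 in the same way.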


In [7]:
# The evaluation set used for hyperparameter tuning should be kept separate from the test set.
# In this example we don't test the model, so the held-out split is discarded.
train, _ = python_random_split(
    data.drop('Genres_string', axis=1),
    ratio=0.75,
    seed=123
)
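
As a rough illustration of what a seeded random split with `ratio=0.75` does, here is a pure-Python sketch. It is not the implementation of `python_random_split`; the function name `random_split` and the shuffling approach are our own assumptions, and the exact rows selected will differ from the library's.

```python
import random


def random_split(rows, ratio=0.75, seed=123):
    """Shuffle rows with a fixed seed and cut them into two parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = int(round(len(shuffled) * ratio))
    return shuffled[:cut], shuffled[cut:]


train_rows, heldout_rows = random_split(range(100), ratio=0.75, seed=123)
print(len(train_rows), len(heldout_rows))  # 75 25
```

Fixing the seed matters here because every hyperparameter configuration should be compared on the same training data.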

In [8]:
DATA_DIR = 'aml_data'
os.makedirs(DATA_DIR, exist_ok=True)

TRAIN_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_train.pkl"
train.to_pickle(os.path.join(DATA_DIR, TRAIN_FILE_NAME))

# Note, all the files under DATA_DIR will be uploaded to the data store
ds = ws.get_default_datastore()
ds.upload(
    src_dir=DATA_DIR,
    target_path='data',
    overwrite=True,
    show_progress=True
)

$AZUREML_DATAREFERENCE_fa74a8677e8446998d53dc2fe855cb4a

We also prepare a training script [wide_deep_training.py](../../reco_utils/aml/wide_deep_training.py) for the hyperparameter tuning, which will log our target metrics such as [RMSE](https://en.wikipedia.org/wiki/Root-mean-square_deviation) and/or [NDCG](https://en.wikipedia.org/wiki/Discounted_cumulative_gain) to AML experiment so that we can track the metrics and optimize the primary metric via **hyperdrive**.
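
For readers unfamiliar with these metrics, the sketch below computes RMSE, MAE, and NDCG@k in plain Python. These are textbook definitions written for illustration (NDCG here uses linear gain and a log2 discount); the evaluation code used by the training script may differ in its details, and the function names are our own.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def ndcg_at_k(relevances, k):
    """NDCG@k, where `relevances` holds the true relevance of each item
    in the order the model ranked them (best-ranked first)."""
    def dcg(scores):
        return sum(rel / math.log2(rank + 1)
                   for rank, rel in enumerate(scores[:k], start=1))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


print(rmse([1, 1], [2, 2]))          # 1.0
print(mae([1, 3], [2, 1]))           # 1.5
print(ndcg_at_k([3, 2, 1], k=3))     # 1.0 (items already in ideal order)
```

Lower is better for RMSE and MAE, while higher is better for NDCG, which is why the primary metric goal has to be set accordingly.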

In [19]:
SCRIPT_DIR = 'aml_script'
dest_dir = os.path.join(SCRIPT_DIR, 'reco_utils')
try:
    shutil.copytree(os.path.join('..', '..', 'reco_utils'), dest_dir)
except FileExistsError:
    pass

ENTRY_SCRIPT_NAME = 'reco_utils/aml/wide_deep_training.py'

Now we define a search space for the hyperparameters. All the parameter values will be passed to our training script.

AML hyperdrive provides `RandomParameterSampling`, `GridParameterSampling`, and `BayesianParameterSampling`. Details about each approach are beyond the scope of this notebook and you can find them from [Azure doc](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters). Here, we use the Bayesian sampling.
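
To give a feel for the difference between the first two strategies, here is a toy sketch in plain Python. It does not use or reproduce the hyperdrive classes; the search-space encoding and the helper names `sample_random` and `enumerate_grid` are hypothetical. Bayesian sampling is not sketched because it additionally needs a model of how past configurations performed.

```python
import itertools
import random

# Toy search space: each entry is (kind, specification)
search_space = {
    "batch_size": ("choice", [32, 64, 128, 256]),
    "learning_rate": ("uniform", (0.0005, 0.1)),
}


def sample_random(space, rng):
    """Random sampling: draw every hyperparameter independently."""
    config = {}
    for name, (kind, spec) in space.items():
        if kind == "choice":
            config[name] = rng.choice(spec)
        elif kind == "uniform":
            low, high = spec
            config[name] = rng.uniform(low, high)
        else:
            raise ValueError("unsupported distribution: " + kind)
    return config


def enumerate_grid(space):
    """Grid sampling: try every combination, so all values must be discrete."""
    names = list(space)
    for name in names:
        if space[name][0] != "choice":
            raise ValueError("grid search needs discrete choices for " + name)
    for combo in itertools.product(*(space[name][1] for name in names)):
        yield dict(zip(names, combo))


print(sample_random(search_space, random.Random(0)))
print(len(list(enumerate_grid({"a": ("choice", [1, 2]), "b": ("choice", ["x", "y", "z"])}))))  # 6
```

The grid grows multiplicatively with each added hyperparameter, which is one reason random or Bayesian sampling is usually preferred for larger search spaces.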

In [20]:
EXP_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_wide_deep_model"
METRICS = ['mae', 'ndcg@10']  # List the primary metric first
NUM_EPOCHS = int(20000000 / len(train))

script_params = {
    '--datastore': ds.as_mount(),
    '--train-datapath': "data/" + TRAIN_FILE_NAME,
    '--user-col': USER_COL,
    '--item-col': ITEM_COL,
    '--item-feat-col': ITEM_FEAT_COL,
    '--rating-col': RATING_COL,
    '--metrics': METRICS,
    '--model-type': 'wide_deep',
    '--epochs': NUM_EPOCHS,
}

# hyperparameters search space
hyper_params = {
    '--batch-size': hd.choice(32, 64, 128, 256),
    # Wide model hyperparameters
    '--linear-optimizer': hd.choice('Ftrl', 'SGD'),
    '--linear-optimizer-lr': hd.uniform(0.0005, 0.1),
    '--l1-reg': hd.uniform(0.0, 0.1),
    # Deep model hyperparameters
    '--dnn-optimizer': hd.choice('Adagrad', 'Adam'),
    '--dnn-optimizer-lr': hd.uniform(0.0005, 0.1),
    '--dnn-user-embedding-dim': hd.choice(8, 32, 128),
    '--dnn-item-embedding-dim': hd.choice(4, 16, 64),
    '--dnn-hidden-layer-1': hd.choice(0, 32, 64, 128, 256, 512, 1024),
    '--dnn-hidden-layer-2': hd.choice(0, 32, 64, 128, 256, 512, 1024),
    '--dnn-hidden-layer-3': hd.choice(0, 32, 64, 128, 256, 512, 1024),
    '--dnn-hidden-layer-4': hd.choice(32, 64, 128, 256, 512, 1024),
    '--dnn-batch-norm': hd.choice(0, 1),  # False or True. Bayesian sampling only accepts int, str, or float
    '--dropout': hd.uniform(0.0, 0.5),
}

# Note, BayesianParameterSampling only supports choice, uniform, and quniform
ps = hd.BayesianParameterSampling(hyper_params)
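
The search space above offers `0` as a choice for the first three hidden layers but not for the fourth. One plausible reading, and this is only our assumption about how the training script treats these values, is that a size of `0` drops that layer, so the search also covers networks with one to four hidden layers. A hypothetical helper illustrating that interpretation:

```python
def build_hidden_units(*layer_sizes):
    """Keep only the layers with a positive number of units.

    Assumption: a size of 0 means "skip this layer".
    """
    return [size for size in layer_sizes if size > 0]


print(build_hidden_units(256, 0, 32, 256))  # [256, 32, 256]
print(build_hidden_units(0, 0, 0, 64))      # [64]
```

Because the fourth layer never takes the value `0`, at least one hidden layer always remains under this interpretation.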

We use `azureml.train.dnn.TensorFlow`, a custom AML `Estimator` class which utilizes a preset docker image in the cluster (see [here](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-train-tensorflow) for more information).

Once you submit the experiment, you can see the progress from the notebook by using `azureml.widgets.RunDetails`. You can directly check the details from the Azure portal as well. To get the link, run `run.get_portal_url()`.

Bayesian sampling does not support early termination, but with `RandomParameterSampling` you can use an early termination policy such as:
```
policy = hd.BanditPolicy(evaluation_interval=1, slack_factor=0.1, delay_evaluation=3)
```
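
To illustrate the idea behind a bandit policy, the sketch below re-implements the decision rule in plain Python. It is a simplified illustration, not the service's implementation: the function name is hypothetical, the maximize rule follows the documented example (terminate when a run falls below `best / (1 + slack_factor)`), and the mirrored rule for a minimize goal is our assumption.

```python
def bandit_should_terminate(metric, best_metric, slack_factor, interval,
                            evaluation_interval=1, delay_evaluation=0,
                            maximize=True):
    """Return True if a run should be stopped at the given interval."""
    # The policy is only applied at multiples of evaluation_interval
    # that are at or beyond delay_evaluation.
    if interval < delay_evaluation or interval % evaluation_interval != 0:
        return False
    if maximize:
        return metric < best_metric / (1 + slack_factor)
    # Assumption: for a minimize goal the allowed slack is mirrored.
    return metric > best_metric * (1 + slack_factor)


# Best run so far reports 0.8; with slack_factor=0.2 the cutoff is about 0.667
print(bandit_should_terminate(0.65, 0.8, 0.2, interval=10))  # True
print(bandit_should_terminate(0.70, 0.8, 0.2, interval=10))  # False
```

A smaller `slack_factor` terminates runs more aggressively, while `delay_evaluation` gives every run a few intervals to warm up before it can be stopped.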

> Since we are tuning hyperparameters, we create a `HyperDriveRunConfig` and pass it to the experiment object. If you already know which hyperparameters to use and still want to utilize AML for other purposes (e.g. model management), you can set the hyperparameter values directly in `script_params` and run the experiment with `run = exp.submit(est)` instead.

In [21]:
est = azureml.train.dnn.TensorFlow(
    source_directory=SCRIPT_DIR,
    entry_script=ENTRY_SCRIPT_NAME,
    script_params=script_params,
    compute_target=compute_target,
    use_gpu=True,
    conda_packages=['pandas', 'scikit-learn'],
)

hd_config = hd.HyperDriveRunConfig(
    estimator=est, 
    hyperparameter_sampling=ps,
    primary_metric_name=METRICS[0],
    primary_metric_goal=hd.PrimaryMetricGoal.MINIMIZE, 
    max_total_runs=100,
    max_concurrent_runs=8
)

# Create an experiment to track the runs in the workspace
exp = aml.core.Experiment(workspace=ws, name=EXP_NAME)
run = exp.submit(config=hd_config)

In [24]:
azureml.widgets.RunDetails(run).show()
run.wait_for_completion(show_output=True)

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'NOTSE…

In [38]:
# Get the best run, its metrics, and the arguments it was launched with
best_run = run.get_best_run_by_primary_metric()

best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['Arguments']

In [40]:
print(parameter_values)

['--datastore', '$AZUREML_DATAREFERENCE_workspaceblobstore', '--train-datapath', 'data/movielens_1m_train.pkl', '--user-col', 'UserId', '--item-col', 'MovieId', '--item-feat-col', 'Genres', '--rating-col', 'Rating', '--metrics', 'mae', 'ndcg@10', '--model-type', 'wide_deep', '--epochs', '50', '--batch-size', '32', '--linear-optimizer', 'SGD', '--linear-optimizer-lr', '0.0355815733156391', '--l1-reg', '0.0365637215217751', '--dnn-optimizer', 'Adagrad', '--dnn-optimizer-lr', '0.0646799132499025', '--dnn-user-embedding-dim', '128', '--dnn-item-embedding-dim', '64', '--dnn-hidden-layer-1', '256', '--dnn-hidden-layer-2', '0', '--dnn-hidden-layer-3', '32', '--dnn-hidden-layer-4', '256', '--dnn-batch-norm', '0', '--dropout', '0.159306271639725']
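
The arguments come back as a flat list of strings, which is awkward to read. A small sketch for turning such a list into a dictionary follows; the helper name `parse_run_arguments` is our own, and it assumes every flag starts with `--` and may be followed by zero or more values (as `--metrics` is above).

```python
def parse_run_arguments(arguments):
    """Group a flat ['--flag', 'value', ...] list into a dictionary."""
    parsed = {}
    current_flag = None
    for token in arguments:
        if token.startswith("--"):
            current_flag = token[2:]
            parsed[current_flag] = []
        elif current_flag is not None:
            parsed[current_flag].append(token)
    # Flags with exactly one value map to that value; others keep their list
    return {flag: values[0] if len(values) == 1 else values
            for flag, values in parsed.items()}


example = ['--metrics', 'mae', 'ndcg@10', '--epochs', '50', '--batch-size', '32']
print(parse_run_arguments(example))
# {'metrics': ['mae', 'ndcg@10'], 'epochs': '50', 'batch-size': '32'}
```

All values remain strings, so convert them (for example with `int` or `float`) before reusing them as hyperparameters.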


In [41]:
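# Clean up the local copies. Each directory is removed in its own try block
# so that a failure on SCRIPT_DIR does not skip DATA_DIR.
for tmp_dir in (SCRIPT_DIR, DATA_DIR):
    try:
        shutil.rmtree(tmp_dir)
    except (PermissionError, FileNotFoundError):
        pass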

### References

* [tf_playground: wide_and_deep_export_r1.3.ipynb](https://github.com/MtDersvan/tf_playground/blob/master/wide_and_deep_tutorial/wide_and_deep_export_r1.3.ipynb)
* [Fine-tune natural language processing models using Azure Machine Learning service](https://azure.microsoft.com/en-us/blog/fine-tune-natural-language-processing-models-using-azure-machine-learning-service/)
* [Training, hyperparameter tune, and deploy with TensorFlow](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb)
