<i>Copyright (c) Microsoft Corporation. All rights reserved.<br>
Licensed under the MIT License.</i>
<br><br>
# SVD Hyperparameter Tuning with Azure Machine Learning

In this notebook, we show how to tune the hyperparameters of a matrix factorization algorithm by utilizing **Azure Machine Learning service** ([AzureML](https://azure.microsoft.com/en-us/services/machine-learning-service/)) in the context of movie recommendations. To use AzureML you will need an Azure subscription. We use the SVD algorithm from the Surprise library.

We present the overall process of using AzureML (abbreviated AML below) by demonstrating the key steps without going into too much detail.

For more details about the **SVD** algorithm:
* [Surprise SVD deep-dive notebook](../02_model/surprise_svd_deep_dive.ipynb)
* [Original paper](http://papers.nips.cc/paper/3208-probabilistic-matrix-factorization.pdf)
* [Surprise homepage](https://surprise.readthedocs.io/en/stable/)
  
Regarding **AzureML**, please refer to:
* [Quickstart notebook](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python)
* [Hyperdrive](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters)

### Prerequisite
To run this example, you will need to install [`azureml-sdk`](https://pypi.org/project/azureml-sdk/).
If you are using a [Data Science Virtual Machine (DSVM)](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment#dsvm) or [Azure Notebook](https://notebooks.azure.com/), `azureml-sdk` is already installed.

To install the AML Python SDK, run
```
pip install --upgrade azureml-sdk[notebooks]
```

More info about setting up an AML environment can be found at [this link](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-configure-environment).

### AML Workspace Configuration
An AML workspace is the foundational block in the cloud that you use to experiment, train, and deploy machine learning models. In this notebook, we
1. set up a workspace from the Azure portal and
2. create a config file manually.

The instructions here are based on AML documents about [Quickstart with Azure portal](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-get-started) and [Quickstart with Python SDK](https://docs.microsoft.com/en-us/azure/machine-learning/service/quickstart-create-workspace-with-python) where you can find more details with screenshots about the setup process.
  
#### Create a workspace
1. Sign in to the [Azure portal](https://portal.azure.com) by using the credentials for the Azure subscription you use.
2. Select the **Create a resource** menu, search for **Machine Learning service workspace**, and select the **Create** button.
3. In the **ML service workspace** pane, configure your workspace by entering the *workspace name* and *resource group* (or **create new** resource group if you don't have one already), and select **Create**. It can take a few moments to create the workspace.
  
#### Make a configuration file
To configure this notebook to communicate with your workspace easily, create a *./aml_config/config.json* file with the following contents:
```
{
    "subscription_id": "<subscription-id>",
    "resource_group": "<resource-group>",
    "workspace_name": "<workspace-name>"
}
```
replacing `<subscription-id>`, `<resource-group>`, and `<workspace-name>` with the strings of your subscription id, resource group, and workspace name, respectively.
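
If you prefer to create the file from Python rather than by hand, a minimal sketch could look like the following. The helper name `write_aml_config` and its `config_dir` parameter are hypothetical and are not part of the AML SDK; the function only writes the three keys shown above.

```python
import json
import os


def write_aml_config(subscription_id, resource_group, workspace_name,
                     config_dir="aml_config"):
    """Write the workspace settings to <config_dir>/config.json and return the path.

    Hypothetical helper for illustration; not part of azureml-sdk.
    """
    os.makedirs(config_dir, exist_ok=True)
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    config_path = os.path.join(config_dir, "config.json")
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config_path
```

Calling `write_aml_config("<subscription-id>", "<resource-group>", "<workspace-name>")` from the notebook directory, with your own values in place of the placeholders, should produce the *./aml_config/config.json* file described above.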

Now let's see if everything is ready!

In [33]:
import sys
sys.path.append("../../")
import time
import os
import shutil
import surprise
import papermill as pm
import pandas as pd
from reco_utils.dataset import movielens
from reco_utils.dataset.python_splitters import python_random_split

print("System version: {}".format(sys.version))
print("Surprise version: {}".format(surprise.__version__))

import azureml as aml
import azureml.widgets
import azureml.train.hyperdrive as hd

print("Azure ML SDK Version:", aml.core.VERSION)

System version: 3.6.7 |Anaconda, Inc.| (default, Dec 10 2018, 20:35:02) [MSC v.1915 64 bit (AMD64)]
Surprise version: 1.0.6
Azure ML SDK Version: 1.0.10


In [34]:
# Connect to a workspace
ws = aml.core.Workspace.from_config()
print("AML workspace name: ", ws.name)

Found the config file in: C:\Users\anargyri\git\Recommenders\notebooks\04_model_select_and_optimize\aml_config\config.json
AML workspace name:  anargyri


In the following cells, we
1. Create a *remote compute target* (cpu_cluster) if it does not exist already,
2. Mount a *data store* and upload the training set, and
3. Run a hyperparameter tuning experiment.

### Create a Remote Compute Target

We create an `AmlCompute` cluster as our remote compute target. The script will load the cluster if it already exists. You can look at [this document](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-set-up-training-targets) to learn more about setting up a *compute target*.

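> Note: to save costs, you can create a low-priority cluster by passing `vm_priority='lowpriority'` to `AmlCompute.provisioning_configuration`. The code below does not set this argument, so a newly created cluster uses dedicated nodes.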

In [55]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "cpuclustersvd"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',
                                                           max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

cpu_cluster.wait_for_completion(show_output=True)

# Use the 'status' property to get a detailed status for the current cluster. 
print(cpu_cluster.status.serialize())

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished
Minimum number of nodes requested have been provisioned
{'allocationState': 'Steady', 'allocationStateTransitionTime': '2019-02-06T15:21:08.513000+00:00', 'creationTime': '2019-02-06T15:20:44.924560+00:00', 'currentNodeCount': 0, 'errors': None, 'modifiedTime': '2019-02-06T15:21:35.653142+00:00', 'nodeStateCounts': {'idleNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0, 'preparingNodeCount': 0, 'runningNodeCount': 0, 'unusableNodeCount': 0}, 'provisioningState': 'Succeeded', 'provisioningStateTransitionTime': None, 'scaleSettings': {'minNodeCount': 0, 'maxNodeCount': 4, 'nodeIdleTimeBeforeScaleDown': 'PT120S'}, 'targetNodeCount': 0, 'vmPriority': 'Dedicated', 'vmSize': 'STANDARD_D2_V2'}


Set up the configuration of the remote cluster and conda dependencies from the repository yaml file. 

In [36]:
from azureml.core.runconfig import RunConfiguration
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.runconfig import DEFAULT_CPU_IMAGE

# Create a new runconfig object
run_amlcompute = RunConfiguration()

# Use the cpu_cluster you created above. 
run_amlcompute.target = cpu_cluster

# Enable Docker
run_amlcompute.environment.docker.enabled = True

# Set Docker base image to the default CPU-based image
run_amlcompute.environment.docker.base_image = DEFAULT_CPU_IMAGE

# Use conda_dependencies.yml to create a conda environment in the Docker image for execution
run_amlcompute.environment.python.user_managed_dependencies = False

# Auto-prepare the Docker image when used for execution (if it is not already prepared)
run_amlcompute.auto_prepare_environment = True

# Specify CondaDependencies obj, add necessary packages
run_amlcompute.environment.python.conda_dependencies = CondaDependencies(conda_dependencies_file_path=
                                                                         '../../scripts/conda_bare.yaml')

### Prepare Dataset
1. Download data and split into training and testing sets
2. Upload the training set to the default **blob storage** of the workspace.

In [38]:
# Select Movielens data size: 100k, 1m, 10m, or 20m
MOVIELENS_DATA_SIZE = '100k'

In [39]:
data = movielens.load_pandas_df(
    size=MOVIELENS_DATA_SIZE,
    header=["userID", "itemID", "rating"]
)

data.head()

|   | userID | itemID | rating |
|---|--------|--------|--------|
| 0 | 196    | 242    | 3.0    |
| 1 | 186    | 302    | 3.0    |
| 2 | 22     | 377    | 1.0    |
| 3 | 244    | 51     | 2.0    |
| 4 | 166    | 346    | 1.0    |


In [40]:
train, test = python_random_split(data, 0.75)

In [45]:
DATA_DIR = 'aml_data'
os.makedirs(DATA_DIR, exist_ok=True)

TRAIN_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_train.pkl"
train.to_pickle(os.path.join(DATA_DIR, TRAIN_FILE_NAME))

TEST_FILE_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_test.pkl"
test.to_pickle(os.path.join(DATA_DIR, TEST_FILE_NAME))

# Note, all the files under DATA_DIR will be uploaded to the data store
ds = ws.get_default_datastore()
ds.upload(
    src_dir=DATA_DIR,
    target_path='data',
    overwrite=True,
    show_progress=True
)

Target already exists. Skipping upload for data\movielens_100k_test.pkl


Uploading aml_data\movielens_100k_train.pkl


Target already exists. Skipping upload for data\movielens_1m_train.pkl


Uploaded aml_data\movielens_100k_train.pkl, 1 files out of an estimated total of 2


$AZUREML_DATAREFERENCE_8e5a3c56aac94b62bb8a6ff8d8a82084

We also prepare a training script, [svd_training.py](../../reco_utils/aml/svd_training.py), for the hyperparameter tuning. It logs our target metrics, such as [RMSE](https://en.wikipedia.org/wiki/Root-mean-square_deviation) and/or [NDCG](https://en.wikipedia.org/wiki/Discounted_cumulative_gain), to the AML experiment so that we can track the metrics and optimize the primary metric via **hyperdrive**.
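
To make these metrics concrete, here is a minimal pure-Python sketch of RMSE and of NDCG@k for a single user. The function names `rmse` and `ndcg_at_k` are illustrative only; the implementations used by the training script may differ in details such as how relevance is defined or how scores are averaged across users.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length rating sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def ndcg_at_k(ranked_relevance, k):
    """NDCG@k for one user.

    ranked_relevance[i] is the relevance of the item recommended at rank i
    (rank 0 is the top recommendation).
    """
    def dcg(relevances):
        # Gains are discounted logarithmically by position.
        return sum(rel / math.log2(rank + 2)
                   for rank, rel in enumerate(relevances[:k]))

    # The ideal ordering places the most relevant items first.
    ideal = dcg(sorted(ranked_relevance, reverse=True))
    return dcg(ranked_relevance) / ideal if ideal > 0 else 0.0
```

RMSE measures rating-prediction accuracy (lower is better), while NDCG@k measures ranking quality of the top k recommendations (higher is better), which is why the optimization goal must match the chosen primary metric.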

In [115]:
SCRIPT_DIR = 'aml_script'

# Clean up the script directory if it already exists
shutil.rmtree(SCRIPT_DIR, ignore_errors=True)

# Copy scripts to SCRIPT_DIR temporarily
shutil.copytree(os.path.join('..', '..', 'reco_utils'), os.path.join(SCRIPT_DIR, 'reco_utils'))

ENTRY_SCRIPT_NAME = 'reco_utils/aml/svd_training.py'

Now we define a search space for the hyperparameters. All the parameter values will be passed to our training script.

AML hyperdrive provides `RandomParameterSampling`, `GridParameterSampling`, and `BayesianParameterSampling`. Details about each approach are beyond the scope of this notebook and can be found in the [Azure documentation](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-tune-hyperparameters). Here, we use Bayesian sampling.
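
As a rough intuition for the first two strategies, the sketch below draws configurations from a toy search space in plain Python. It is not how hyperdrive is implemented: the tuple-based `search_space` format and the helpers `sample_random` and `grid_configs` are assumptions made for illustration. Bayesian sampling, which picks new configurations based on the results of earlier runs, is not sketched here.

```python
import itertools
import random

# Hypothetical, simplified search space:
# ("choice", options) or ("uniform", low, high)
search_space = {
    "n_factors": ("choice", [10, 50, 100]),
    "lr_bu": ("uniform", 1e-6, 0.1),
}


def sample_random(space, rng):
    """Draw one configuration, sampling every hyperparameter independently."""
    config = {}
    for name, spec in space.items():
        if spec[0] == "choice":
            config[name] = rng.choice(spec[1])
        else:  # "uniform"
            config[name] = rng.uniform(spec[1], spec[2])
    return config


def grid_configs(space):
    """Enumerate every combination of a space made only of "choice" parameters."""
    names = list(space)
    for name in names:
        if space[name][0] != "choice":
            raise ValueError("grid search needs discrete choices, got {}".format(name))
    for values in itertools.product(*(space[name][1] for name in names)):
        yield dict(zip(names, values))


rng = random.Random(0)
print(sample_random(search_space, rng))
```

Random sampling handles continuous ranges naturally, whereas a grid needs every parameter to be discrete and grows multiplicatively with the number of options per parameter.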

In [126]:
EXP_NAME = "movielens_" + MOVIELENS_DATA_SIZE + "_svd_model"
PRIMARY_METRIC = 'precision@10'
METRICS = ['precision@10', 'rmse', 'ndcg@10']  
RANDOM_STATE = 0
VERBOSE = True
NUM_EPOCHS = 30
BIASED = True

script_params = {
    '--datastore': ds.as_mount(),
    '--train-datapath': "data/" + TRAIN_FILE_NAME,
    '--test-datapath': "data/" + TEST_FILE_NAME,
    '--surprise-reader': 'ml-100k',
    '--metrics': METRICS,
    '--random-state': RANDOM_STATE,
    '--epochs': NUM_EPOCHS,
}

if BIASED:
    script_params['--biased'] = ''
if VERBOSE:
    script_params['--verbose'] = ''

# hyperparameters search space
# We do not set 'lr_all' and 'reg_all' because they will be overridden by the other lr_ and reg_ parameters

hyper_params = {
    'n_factors': hd.choice(10, 50, 100, 150, 200),
    'init_mean': hd.uniform(-0.5, 0.5),
    'init_std_dev': hd.uniform(0.01, 0.2),
    'lr_bu': hd.uniform(1e-6, 0.1), 
    'lr_bi': hd.uniform(1e-6, 0.1), 
    'lr_pu': hd.uniform(1e-6, 0.1), 
    'lr_qi': hd.uniform(1e-6, 0.1), 
    'reg_bu': hd.uniform(1e-6, 1),
    'reg_bi': hd.uniform(1e-6, 1), 
    'reg_pu': hd.uniform(1e-6, 1), 
    'reg_qi': hd.uniform(1e-6, 1)
}

# Note, BayesianParameterSampling only supports choice, uniform, and quniform
ps = hd.BayesianParameterSampling(hyper_params)

Once you submit the experiment, you can see the progress from the notebook by using `azureml.widgets.RunDetails`. You can directly check the details from the Azure portal as well. To get the link, run `run.get_portal_url()`.

With `RandomParameterSampling`, you can also use an early termination policy (Bayesian sampling does not support early termination policies):
```
policy = hd.BanditPolicy(evaluation_interval=1, slack_factor=0.1, delay_evaluation=3)
```
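
To illustrate what `slack_factor` means when the primary metric is maximized, as it is in this notebook, the following sketch computes the cut-off below which a run would be stopped. The function names are hypothetical and the logic is a simplification based on our reading of the Azure documentation, not the service's actual code; it ignores `evaluation_interval` and `delay_evaluation`, which control when the check is applied.

```python
def bandit_threshold(best_metric, slack_factor):
    """Lowest metric value a run may report and still continue (maximization case)."""
    return best_metric / (1.0 + slack_factor)


def should_terminate(run_metric, best_metric, slack_factor):
    """Return True if the run falls outside the allowed slack of the best run."""
    return run_metric < bandit_threshold(best_metric, slack_factor)


# With slack_factor=0.1, a run must stay within about 91% of the best run's metric.
print(bandit_threshold(best_metric=0.12, slack_factor=0.1))
print(should_terminate(run_metric=0.10, best_metric=0.12, slack_factor=0.1))
```

A smaller `slack_factor` terminates runs more aggressively, which saves compute but risks stopping configurations that would have improved later.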

> Since we will do hyperparameter tuning, we create a `HyperDriveRunConfig` and pass it to the experiment object. If you already know what hyperparameters to use and still want to utilize AML for other purposes (e.g. model management), you can set the hyperparameter values directly to `script_params` and run the experiment, `run = exp.submit(est)`, instead.  

In [127]:
est = azureml.train.estimator.Estimator(
    source_directory=SCRIPT_DIR,
    entry_script=ENTRY_SCRIPT_NAME,
    script_params=script_params,
    compute_target=cpu_cluster,
    conda_packages=['pandas'],
    pip_packages=['scikit-learn', 'scikit-surprise']
)

hd_config = hd.HyperDriveRunConfig(
    estimator=est, 
    hyperparameter_sampling=ps,
    primary_metric_name=PRIMARY_METRIC,
    primary_metric_goal=hd.PrimaryMetricGoal.MAXIMIZE, 
    max_total_runs=100,
    max_concurrent_runs=8
)

In [128]:
# Create an experiment to track the runs in the workspace
exp = aml.core.Experiment(workspace=ws, name=EXP_NAME)
run = exp.submit(config=hd_config)

azureml.widgets.RunDetails(run).show()
run.wait_for_completion(show_output=True)

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

RunId: movielens_100k_svd_model_1550082199576
Performing interactive authentication. Please follow the instructions on the terminal.


TypeError: unsupported operand type(s) for -=: 'Retry' and 'int'

Note, we have launched a browser for you to login. For old experience with device code, use "az login --use-device-code"
You have logged in. Now let us find all the subscriptions to which you have access...


Interactive authentication successfully completed.


In [133]:
# Get the best run and print out its metrics
best_run = run.get_best_run_by_primary_metric()

best_run_metrics = best_run.get_metrics()
parameter_values = best_run.get_details()['runDefinition']['Arguments']

In [136]:
best_run_metrics

{'Number of epochs': 30,
 'rmse': 1.006665487997617,
 'ndcg@10': 0.13216803529015922,
 'precision@10': 0.12322375397667021}

In [134]:
print(parameter_values)

['--datastore', '$AZUREML_DATAREFERENCE_workspaceblobstore', '--train-datapath', 'data/movielens_100k_train.pkl', '--test-datapath', 'data/movielens_100k_test.pkl', '--surprise-reader', 'ml-100k', '--metrics', 'precision@10', 'rmse', 'ndcg@10', '--random-state', '0', '--epochs', '30', '--biased', '--verbose', '--n_factors', '10', '--init_mean', '0.0347687232816559', '--init_std_dev', '0.161076796256845', '--lr_bu', '0.0428243511820616', '--lr_bi', '0.000104948235321715', '--lr_pu', '0.00917009869249142', '--lr_qi', '0.052073456117451', '--reg_bu', '0.352419183435925', '--reg_bi', '0.817700085250959', '--reg_pu', '0.135437072159942', '--reg_qi', '0.83400014088431']
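
The flat argument list above is hard to read. As a convenience, a small helper like the one below could group it into a dictionary; `arguments_to_dict` is a hypothetical function written for this illustration, and all values are kept as strings exactly as they appear in the list.

```python
def arguments_to_dict(arguments):
    """Group a flat ['--name', 'value', ...] list into {name: value}."""
    grouped = {}
    current = None
    for token in arguments:
        if token.startswith("--"):
            # A new option starts; collect its values until the next option.
            current = token[2:]
            grouped[current] = []
        elif current is not None:
            grouped[current].append(token)

    result = {}
    for name, values in grouped.items():
        if not values:
            result[name] = True      # bare flag such as --biased
        elif len(values) == 1:
            result[name] = values[0]
        else:
            result[name] = values    # multi-valued option such as --metrics
    return result


# Shortened example using a few entries from the output above
example = ['--metrics', 'precision@10', 'rmse', 'ndcg@10',
           '--epochs', '30', '--biased', '--n_factors', '10']
print(arguments_to_dict(example))
```

In the notebook, `arguments_to_dict(parameter_values)` would give the tuned values keyed by name, for example under `'n_factors'` and `'lr_bu'`.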


In [135]:
try:
    shutil.rmtree(SCRIPT_DIR)
    shutil.rmtree(DATA_DIR)
except (PermissionError, FileNotFoundError):
    pass

### References

* [wide_and_deep_export_r1.3.ipynb](https://github.com/MtDersvan/tf_playground/blob/master/wide_and_deep_tutorial/wide_and_deep_export_r1.3.ipynb)
* [Fine-tune natural language processing models using Azure Machine Learning service](https://azure.microsoft.com/en-us/blog/fine-tune-natural-language-processing-models-using-azure-machine-learning-service/)
* [Training, hyperparameter tune, and deploy with TensorFlow](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-tensorflow/train-hyperparameter-tune-deploy-with-tensorflow.ipynb)
