# Hyperparameter Tuning using HyperDrive

## Azure ML imports

In [1]:
import logging
import os
import csv
import pkg_resources
import joblib

import numpy as np
import pandas as pd

from matplotlib import pyplot as plt

from sklearn import datasets
from sklearn.metrics import confusion_matrix

import azureml.core
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException
from azureml.core.dataset import Dataset
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.hyperdrive.parameter_expressions import uniform, choice
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.sklearn import SKLearn
from azureml.widgets import RunDetails

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.27.0


## Initialize workspace

Initialize a workspace object from persisted configuration.

In [2]:
ws = Workspace.from_config()

print(
    'Workspace name: ' + ws.name, 
    'Azure region: ' + ws.location, 
    'Subscription id: ' + ws.subscription_id, 
    'Resource group: ' + ws.resource_group, sep = '\n'
    )

Workspace name: udacity-ml-capstone-ws
Azure region: eastus
Subscription id: b329467a-d1f8-4c9b-b3dc-95cdc7bff7fa
Resource group: udacity-ml-capstone-rg


## Create an Azure HyperDrive experiment

Let's create an experiment named `heart-failure-hd-exp` and a folder to hold the training scripts. The script runs will be recorded under the experiment in Azure.

The best practice is to keep each step's scripts and their dependent files in a separate folder and to specify that folder as the step's `source_directory`. This reduces the size of the snapshot created for the step, because only that folder is snapshotted. Since a change to any file in `source_directory` triggers a re-upload of the snapshot, keeping the folder small also lets the step be reused when nothing in its `source_directory` has changed.

In [3]:
# Choose a name for the run history container in the workspace
experiment_name = 'heart-failure-hd-exp'
project_folder = './heart-failure-hd-proj'

experiment = Experiment(ws, experiment_name)
experiment

Name,Workspace,Report Page,Docs Page
heart-failure-hd-exp,udacity-ml-capstone-ws,Link to Azure Machine Learning studio,Link to Documentation


### Create or attach an AmlCompute cluster

You will need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/service/concept-azure-machine-learning-architecture#compute-target) for your HyperDrive run.

In [4]:
# Choose a name for your CPU cluster
compute_cluster_name = "compute-cluster"

# Verify that cluster does not exist already
try:
    compute_target = ComputeTarget(workspace=ws, name=compute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(
        vm_size='STANDARD_D2_V2',# for GPU, use "STANDARD_NC6"
        #vm_priority = 'lowpriority', # optional
        min_nodes=0,
        max_nodes=5)
    compute_target = ComputeTarget.create(ws, compute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

# For a more detailed view of current AmlCompute status, use get_status()
print(compute_target.get_status().serialize())

Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned
{'currentNodeCount': 4, 'targetNodeCount': 4, 'nodeStateCounts': {'preparingNodeCount': 0, 'runningNodeCount': 0, 'idleNodeCount': 4, 'unusableNodeCount': 0, 'leavingNodeCount': 0, 'preemptedNodeCount': 0}, 'allocationState': 'Steady', 'allocationStateTransitionTime': '2021-05-11T16:21:15.704000+00:00', 'errors': [{'error': {'code': 'ClusterCoreQuotaReached', 'message': 'Operation results in exceeding quota limits of Total Cluster Dedicated Regional vCPUs. Maximum allowed: 10, Current in use: 10, Additional requested: 2. Click here to view and request for quota: https://portal.azure.com/#resource/subscriptions/b329467a-d1f8-4c9b-b3dc-95cdc7bff7fa/resourceGroups/udacity-ml-capstone-rg/providers/Microsoft.MachineLearningServices/workspaces/udacity-ml-capstone-ws/quotaUsage'}}], 'creationTime': '2021-05-11T12:45:11.596800+00:00', 'modifiedTime': '2021-

## Dataset

The data is loaded into the workspace using `TabularDatasetFactory` in the `train.py` script.

## HyperDrive Configuration

For the HyperDrive experiment, we chose the [`LogisticRegression`](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) classifier from scikit-learn.

The script `train.py` handles data collection, cleansing and splitting, as well as model training and testing. HyperDrive performs the hyperparameter sampling and applies the early stopping policy.

### Data collection, cleansing and splitting

The dataset is loaded using `TabularDatasetFactory`. The cleansing process drops rows with empty values and one-hot encodes categorical columns (our dataset does not have any). The dataset is then split into training and test sets: 70% of the data is used for training and 30% for testing.
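
The cleansing and splitting steps described above can be sketched as follows. This is a minimal illustration, not the actual contents of `train.py`: the helper name `clean_and_split`, the toy column names, and the random seed are all assumptions, and a small in-memory DataFrame stands in for the dataset that `TabularDatasetFactory` would load.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def clean_and_split(df, label_col, test_size=0.3, seed=42):
    """Drop incomplete rows, one-hot encode categoricals, split 70/30.

    Hypothetical helper for illustration; the real train.py may differ.
    """
    df = df.dropna()                                   # drop rows with empty values
    x = pd.get_dummies(df.drop(columns=[label_col]))   # no-op if all columns are numeric
    y = df[label_col]
    return train_test_split(x, y, test_size=test_size, random_state=seed)


# Toy data with assumed column names; one row has a missing value.
toy = pd.DataFrame({
    "age": [60, 55, None, 70, 45, 65, 50, 80, 62, 58, 49],
    "ejection_fraction": [38, 20, 35, 25, 60, 30, 40, 20, 45, 35, 50],
    "label": [1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0],
})

x_train, x_test, y_train, y_test = clean_and_split(toy, "label")
print(len(x_train), len(x_test))
```

Fixing `random_state` keeps the split reproducible across child runs, so differences in accuracy come from the hyperparameters rather than from a different partition of the data.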

### Hyperparameter sampling

The project uses two hyperparameters:

- `--C`: inverse of the regularization strength (smaller values mean stronger regularization)
- `--max_iter`: maximum number of iterations the solver may take to converge
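
HyperDrive passes each sampled value to the training script as a command-line argument. A minimal sketch of how a script could parse the two flags is shown here; the function name `parse_args` and the default values are assumptions for illustration, not taken from the actual `train.py`.

```python
import argparse


def parse_args(argv=None):
    """Parse the two hyperparameters that HyperDrive samples."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength (assumed default)")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations (assumed default)")
    return parser.parse_args(argv)


# Simulate the argument list that HyperDrive would pass to a child run.
args = parse_args(["--C", "79.30667986749886", "--max_iter", "200"])
print(args.C, args.max_iter)
```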

I use [random parameter sampling](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.randomparametersampling?view=azure-ml-py). Random sampling supports both discrete and continuous hyperparameters, and it supports early termination of low-performance runs. In random sampling, hyperparameter values are randomly selected from the defined search space. This makes it a good approach for an initial exploration of the search space and for discovering promising hyperparameter combinations.
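
To make the idea concrete, the sketch below mimics random sampling over this notebook's search space in plain Python. It only illustrates the concept and is not how HyperDrive is implemented; the helper `sample_config` and the seed are hypothetical.

```python
import random

# Search space matching the one used in this notebook:
# --C is continuous in [0.001, 100], --max_iter is one of five discrete values.
MAX_ITER_CHOICES = [10, 50, 100, 150, 200]


def sample_config(rng):
    """Draw one hyperparameter combination at random (illustrative only)."""
    return {
        "--C": rng.uniform(0.001, 100),
        "--max_iter": rng.choice(MAX_ITER_CHOICES),
    }


rng = random.Random(0)  # seeded only so the sketch is repeatable
configs = [sample_config(rng) for _ in range(5)]
for config in configs:
    print(config)
```

Each draw is independent of the others, which is why the child runs can execute concurrently.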

### Model training and testing

Model training and testing are performed using scikit-learn's Logistic Regression model. In `train.py`, metrics are generated and logged. Accuracy is the primary metric used to benchmark the model.

### Applying early stopping policy

A run is terminated early if the conditions specified by the policy are met.

The model uses [BanditPolicy](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.banditpolicy?view=azure-ml-py).

Bandit policy is based on a slack factor or slack amount and an evaluation interval. It ends runs whose primary metric is not within the specified slack factor or slack amount of the most successful run.
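
The slack-factor rule can be illustrated with a little arithmetic. The sketch below is a simplified model for a metric that is being maximized: it assumes a run is cancelled when its metric falls below `best / (1 + slack_factor)`, and it ignores `evaluation_interval` and any delayed evaluation. The function name is hypothetical.

```python
def bandit_should_terminate(metric, best_metric, slack_factor):
    """Simplified Bandit check for a maximized metric (illustrative only).

    A run is cancelled when its metric is below best_metric / (1 + slack_factor).
    """
    threshold = best_metric / (1.0 + slack_factor)
    return metric < threshold


# With slack_factor=0.1 and a best accuracy of about 0.789,
# the cut-off is roughly 0.717.
print(bandit_should_terminate(0.70, 0.7888888888888889, 0.1))  # below the cut-off
print(bandit_should_terminate(0.75, 0.7888888888888889, 0.1))  # within the slack
```

A smaller slack factor makes the policy more aggressive, since fewer runs stay within the allowed distance of the best run.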

See [HyperDriveConfig Class](https://docs.microsoft.com/en-us/python/api/azureml-train-core/azureml.train.hyperdrive.hyperdriveconfig?view=azure-ml-py) for a complete list of configuration parameters.

In [5]:
# Early termination policy (Bayesian sampling does not support early termination)
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

# Params that you will be using during training
param_sampling = RandomParameterSampling({
    "--C": uniform(0.001, 100),
    "--max_iter": choice(10, 50, 100, 150, 200)
    })

# Training directory and script
train_dir = "./training"
train_script = "train.py"

# SKLearn estimator for use with train.py
estimator = SKLearn(
    source_directory=train_dir,
    entry_script=train_script,
    compute_target=compute_cluster_name
    )

# HyperDriveConfig using the estimator, hyperparameter sampler, and policy
hyperdrive_run_config = HyperDriveConfig(
    estimator=estimator,
    hyperparameter_sampling=param_sampling,
    primary_metric_name='Accuracy',
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=25,
    max_concurrent_runs=5,
    policy=early_termination_policy,
    )

'SKLearn' estimator is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or the AzureML-Tutorial curated environment.
'enabled' is deprecated. Please use the azureml.core.runconfig.DockerConfiguration object with the 'use_docker' param instead.


In [6]:
# Submit your experiment
hyperdrive_run = experiment.submit(hyperdrive_run_config)



## Run Details

Use the `RunDetails` widget to monitor the progress of the HyperDrive run and its child runs.

In [7]:
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=True)
hyperdrive_run

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

RunId: HD_7e911c8a-38bc-425f-a3d8-1271705b71b1
Web View: https://ml.azure.com/runs/HD_7e911c8a-38bc-425f-a3d8-1271705b71b1?wsid=/subscriptions/b329467a-d1f8-4c9b-b3dc-95cdc7bff7fa/resourcegroups/udacity-ml-capstone-rg/workspaces/udacity-ml-capstone-ws&tid=d2e5496e-227d-4db9-89b8-356c2cf9a452

Streaming azureml-logs/hyperdrive.txt

"<START>[2021-05-11T16:45:19.934127][API][INFO]Experiment created<END>\n""<START>[2021-05-11T16:45:20.431532][GENERATOR][INFO]Trying to sample '5' jobs from the hyperparameter space<END>\n""<START>[2021-05-11T16:45:20.657480][GENERATOR][INFO]Successfully sampled '5' jobs, they will soon be submitted to the execution target.<END>\n"

Execution Summary
RunId: HD_7e911c8a-38bc-425f-a3d8-1271705b71b1
Web View: https://ml.azure.com/runs/HD_7e911c8a-38bc-425f-a3d8-1271705b71b1?wsid=/subscriptions/b329467a-d1f8-4c9b-b3dc-95cdc7bff7fa/resourcegroups/udacity-ml-capstone-rg/workspaces/udacity-ml-capstone-ws&tid=d2e5496e-227d-4db9-89b8-356c2cf9a452



Experiment,Id,Type,Status,Details Page,Docs Page
heart-failure-hd-exp,HD_7e911c8a-38bc-425f-a3d8-1271705b71b1,hyperdrive,Completed,Link to Azure Machine Learning studio,Link to Documentation


## Best Model

Get the best run from the HyperDrive experiment and display its arguments, metrics, and file names.

In [8]:
# Get your best run
best_run = hyperdrive_run.get_best_run_by_primary_metric()
print(f"Best run arguments: {best_run.get_details()['runDefinition']['arguments']}")
print(f"Best run metrics: {best_run.get_metrics()}")
print(f"Best run file names: {best_run.get_file_names()}")

Best run arguments: ['--C', '79.30667986749886', '--max_iter', '200']
Best run metrics: {'Regularization strength:': 79.30667986749886, 'Max iterations:': 200, 'Accuracy': 0.7888888888888889}
Best run file names: ['azureml-logs/55_azureml-execution-tvmps_4b70af40883fc62063eb938ea2e8be4f21da36cb96b346c2e6e753bb5cd10312_d.txt', 'azureml-logs/65_job_prep-tvmps_4b70af40883fc62063eb938ea2e8be4f21da36cb96b346c2e6e753bb5cd10312_d.txt', 'azureml-logs/70_driver_log.txt', 'azureml-logs/75_job_post-tvmps_4b70af40883fc62063eb938ea2e8be4f21da36cb96b346c2e6e753bb5cd10312_d.txt', 'azureml-logs/process_info.json', 'azureml-logs/process_status.json', 'logs/azureml/111_azureml.log', 'logs/azureml/dataprep/backgroundProcess.log', 'logs/azureml/dataprep/backgroundProcess_Telemetry.log', 'logs/azureml/job_prep_azureml.log', 'logs/azureml/job_release_azureml.log']


In [9]:
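# Persist the ID of the best run.
# Note: this stores the run ID string, not a fitted model; the file list above
# contains no model file under outputs/.
joblib.dump(value=best_run.id, filename="./outputs/hyperdrive_model.joblib")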

['./outputs/hyperdrive_model.joblib']

## Model Deployment

As AutoML produced the better model, the HyperDrive model will not be deployed.