# Hyperparameter Tuning using HyperDrive

Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [None]:
from azureml.core import Workspace, Experiment
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice
from azureml.train.hyperdrive.parameter_expressions import uniform
from azureml.core import ScriptRunConfig 
import os

## Dataset

Get data. In the cell below, write code to access the data you will be using in this project. For this project, the dataset chosen is the [*Heart Disease UCI*](https://github.com/yashasvisingh14/MachineLearningEngineerWithMicrosoftAzure03/blob/main/heart.csv) from Kaggle. This database contains 14 columns. The "target" field refers to the presence of heart disease in the patient (0 or 1).

Attribute information:


1.   age
2.   sex
3.   chest pain type (4 values)
4.   resting blood pressure
5.   serum cholesterol in mg/dl
6.   fasting blood sugar > 120 mg/dl
7.   resting electrocardiographic results (values 0,1,2)
8.   maximum heart rate achieved
9.   exercise induced angina
10.  oldpeak = ST depression induced by exercise relative to rest
11.  the slope of the peak exercise ST segment
12.  number of major vessels (0-3) colored by fluoroscopy
13.  thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
14.  target

In [None]:
ws = Workspace.from_config()
experiment_name = 'Heart_Hyperdrive'

experiment=Experiment(ws, experiment_name)

print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')

run = experiment.start_logging()

Workspace name: quick-starts-ws-139669
Azure region: southcentralus
Subscription id: cdbe0b43-92a0-4715-838a-f2648cc7ad21
Resource group: aml-quickstarts-139669


In [None]:
# Creating compute for running HyperDrive

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

cpu_cluster_name = "cpu-cluster"
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
cpu_cluster.wait_for_completion(show_output=True)

Creating
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


## Hyperdrive Configuration

Explain the model you are using and the reason for choosing the different hyperparameters, termination policy and config settings.

HyperDrive tunes a model by running the same training script repeatedly with different hyperparameter values and keeping the configuration that gives the best primary metric. The learning algorithm itself is fixed by the training script (`train.py`); only the hyperparameters `--C` and `--max_iter` are varied.

1.   Created a compute cluster with a `vm_size` of "STANDARD_D2_V2" and `max_nodes` of 4 in the provisioning configuration.
2.   Specified a parameter sampler, `RandomParameterSampling`, which randomly selects both discrete and continuous hyperparameter values. Random sampling also supports early termination of low-performance runs.
3.   Specified an early stopping policy, `BanditPolicy`, which automatically terminates any run whose primary metric is not within the specified slack factor of the best-performing run. This improves computational efficiency.
4.  Created an `SKLearn` estimator for use with `train.py`:
    `estimator = SKLearn(source_directory='./', entry_script='train.py', compute_target=cpu_cluster)`
5.  Created a `HyperDriveConfig` from the estimator, the hyperparameter sampler, and the policy, with `max_total_runs=20` and `max_concurrent_runs=4`. After the run completed, used its `get_best_run_by_primary_metric()` method to select the best hyperparameters.
    `hyperdrive_run_config = HyperDriveConfig(estimator=estimator, hyperparameter_sampling=ps, policy=early_termination_policy, primary_metric_name='Accuracy', primary_metric_goal=PrimaryMetricGoal.MAXIMIZE, max_total_runs=20, max_concurrent_runs=4)`
6.  Accuracy achieved = 0.85714
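
To make the slack factor concrete, here is a minimal sketch of the cutoff a Bandit policy applies when the goal is to maximize the metric. It assumes the rule given in the Azure ML documentation, where a run is cancelled if its best metric falls below `best / (1 + slack_factor)`. The helper names and the sample accuracies are illustrative only; they are not part of the SDK and are not taken from this experiment's logs.

```python
def bandit_cutoff(best_metric: float, slack_factor: float) -> float:
    """Smallest metric value that still survives, for a MAXIMIZE goal."""
    return best_metric / (1.0 + slack_factor)


def runs_to_cancel(metrics: dict, slack_factor: float) -> list:
    """Return the ids of runs whose metric falls below the Bandit cutoff."""
    best = max(metrics.values())
    cutoff = bandit_cutoff(best, slack_factor)
    return sorted(run_id for run_id, value in metrics.items() if value < cutoff)


# Hypothetical accuracies reported at one evaluation interval.
sample_metrics = {"run_0": 0.85, "run_1": 0.80, "run_2": 0.70}

cutoff = bandit_cutoff(max(sample_metrics.values()), slack_factor=0.1)
cancelled = runs_to_cancel(sample_metrics, slack_factor=0.1)

print(round(cutoff, 4), cancelled)  # 0.7727 ['run_2']
```

With `slack_factor=0.1`, a run stays alive as long as it reaches roughly 91% of the best accuracy seen so far.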





In [None]:
# Early termination policy. 
early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

# Create the different params to be used during training
ps = RandomParameterSampling(
    {
        '--C': uniform(0.0, 1.0), 
        '--max_iter': choice(50, 100, 150, 200, 250)
    }
)

# Create a local ./training folder if it does not exist yet
os.makedirs("./training", exist_ok=True)

# Estimator and hyperdrive config
estimator = SKLearn(source_directory='./', entry_script='train.py', compute_target=cpu_cluster)

hyperdrive_run_config = HyperDriveConfig(
    estimator=estimator, 
    hyperparameter_sampling=ps, 
    policy=early_termination_policy, 
    primary_metric_name='Accuracy', 
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=20,
    max_concurrent_runs=4)

'SKLearn' estimator is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or the AzureML-Tutorial curated environment.
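
For intuition, the search space defined in the cell above can be mimicked locally with Python's standard `random` module. This is only an approximation of what `RandomParameterSampling` does; the service's own sampler and seeding are not shown in this notebook, and `sample_config` is a hypothetical helper.

```python
import random

# Same discrete options as choice(50, 100, 150, 200, 250) in the sampler above.
MAX_ITER_CHOICES = [50, 100, 150, 200, 250]


def sample_config(rng: random.Random) -> dict:
    """Draw one hyperparameter configuration from the search space."""
    return {
        "--C": rng.uniform(0.0, 1.0),                # continuous, like uniform(0.0, 1.0)
        "--max_iter": rng.choice(MAX_ITER_CHOICES),  # discrete, like choice(...)
    }


rng = random.Random(0)  # fixed seed so the sketch is repeatable
configs = [sample_config(rng) for _ in range(20)]  # 20 mirrors max_total_runs

for cfg in configs[:3]:
    print(cfg)
```

Each dictionary corresponds to the arguments one child run would receive.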


In [None]:
# Submit the HyperDrive experiment
hyperdrive_run = experiment.submit(hyperdrive_run_config)



## Run Details
In the cell below, use the `RunDetails` widget to show the different experiments.

In [None]:
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=True)

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

RunId: HD_ae4b7302-9c23-4b24-aff8-c9b0db3b7fb2
Web View: https://ml.azure.com/experiments/Heart_Hyperdrive/runs/HD_ae4b7302-9c23-4b24-aff8-c9b0db3b7fb2?wsid=/subscriptions/cdbe0b43-92a0-4715-838a-f2648cc7ad21/resourcegroups/aml-quickstarts-139669/workspaces/quick-starts-ws-139669

Streaming azureml-logs/hyperdrive.txt

<START>[2021-03-01T14:03:30.469434][API][INFO]Experiment created<END>
<START>[2021-03-01T14:03:31.249206][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space<END>
<START>[2021-03-01T14:03:31.384295][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.<END>
<START>[2021-03-01T14:03:31.8025120Z][SCHEDULER][INFO]The execution environment is being prepared. Please be patient as it can take a few minutes.<END>

Execution Summary
RunId: HD_ae4b7302-9c23-4b24-aff8-c9b0db3b7fb2
Web View: https://ml.azure.com/experiments/Heart_Hyperdrive/runs/HD_ae4b7302-9c23-4b24-aff8-c9b0db3b7fb2?wsid=/subscriptions/cdb

{'runId': 'HD_ae4b7302-9c23-4b24-aff8-c9b0db3b7fb2',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-03-01T14:03:30.283177Z',
 'endTimeUtc': '2021-03-01T14:18:02.915568Z',
 'properties': {'primary_metric_config': '{"name": "Accuracy", "goal": "maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': 'e157c5ed-2b35-42e6-a76a-b955abe7d7f9',
  'score': '0.8571428571428571',
  'best_child_run_id': 'HD_ae4b7302-9c23-4b24-aff8-c9b0db3b7fb2_8',
  'best_metric_status': 'Succeeded'},
 'inputDatasets': [],
 'outputDatasets': [],
 'logFiles': {'azureml-logs/hyperdrive.txt': 'https://mlstrg139669.blob.core.windows.net/azureml/ExperimentRun/dcid.HD_ae4b7302-9c23-4b24-aff8-c9b0db3b7fb2/azureml-logs/hyperdrive.txt?sv=2019-02-02&sr=b&sig=jCtUELsm9fqSg%2BOeWzgGIt6RlNLapJEDOX9VUN10ggk%3D&st=2021-03-01T14%3A08%3A23Z&se=2021-03-01T22%3A18%3A23Z&sp=r'},
 'submittedBy': 'ODL_User 139669'}

## Best Model

In the cell below, get the best model from the HyperDrive experiment and display its properties (hyperparameters, accuracy and run id).

In [None]:
import joblib
from azureml.core.model import Model

best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()

print(best_run.get_details()['runDefinition']['arguments'])
print(best_run_metrics['Accuracy'])
print(best_run.id)

['--C', '0.056669921024075975', '--max_iter', '150']
0.8571428571428571
HD_ae4b7302-9c23-4b24-aff8-c9b0db3b7fb2_8
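
The arguments printed above show that HyperDrive passes each sampled value to the entry script as a command-line flag. `train.py` is not reproduced in this notebook, so the following is only a sketch of how a script could read those flags with `argparse`; the function name, types and defaults are assumptions.

```python
import argparse


def parse_hyperparameters(argv: list) -> argparse.Namespace:
    """Parse the two tuned flags from a list of command-line tokens."""
    parser = argparse.ArgumentParser()
    # Defaults are placeholders for runs started without HyperDrive.
    parser.add_argument("--C", type=float, default=1.0,
                        help="value sampled for the --C hyperparameter")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="value sampled for the --max_iter hyperparameter")
    return parser.parse_args(argv)


# Argument list exactly as reported for the best child run.
best_args = ["--C", "0.056669921024075975", "--max_iter", "150"]
params = parse_hyperparameters(best_args)

print(params.C, params.max_iter)
```

Parsing the recorded list this way gives typed values (`float` and `int`) that can be reused to retrain the model outside HyperDrive.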


In [None]:
# Register the best model from the best child run
model = best_run.register_model(
    model_name='hearthyperdrive', 
    model_path='./outputs/model.joblib', 
    model_framework=Model.Framework.SCIKITLEARN, 
    model_framework_version='0.19.1')

## Cluster Clean Up




In [None]:
cpu_cluster.delete()