# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [13]:
from azureml.core import Workspace, Experiment

ws = Workspace.from_config()
exp = Experiment(ws, name="hyperdrive_run")

print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\n')
run = exp.start_logging()

hei-aep-vn001-d-we-ml-mlsa-01
hei-aep-vn001-d-we-rg-mlsa-01
westeurope
99149dbb-f505-4eb9-901a-c82ba247986c


## Dataset

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [14]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException

# NOTE: update the cluster name to match the existing cluster
# Choose a name for your CPU cluster
amlcompute_cluster_name = "cluster-bank-marketing-1"

# Reuse the cluster if it already exists; otherwise create it
try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',# for GPU, use "STANDARD_NC6"
                                                           #vm_priority = 'lowpriority', # optional
                                                           max_nodes=4)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

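# Wait for provisioning. With min_node_count=1 the call also waits for a node to be
# allocated, so on an idle autoscaling cluster it can hit the timeout (see the output below).
compute_target.wait_for_completion(show_output=True, min_node_count=1, timeout_in_minutes=10)
# For a more detailed view of current AmlCompute status, use get_status().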

Found existing cluster, use it.
Succeeded..............................................
AmlCompute wait for completion finished

Wait timeout has been reached
Current provisioning state of AmlCompute is "Succeeded" and current node count is "0"


## Hyperdrive Configuration

TODO: Explain the model you are using and the reasons for choosing the hyperparameters, termination policy, and config settings.

In [15]:
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.parameter_expressions import uniform
from azureml.train.hyperdrive.parameter_expressions import choice
import os

# Specify the parameter sampler: C is continuous, max_iter is discrete
ps = RandomParameterSampling({
    "--C": uniform(0.1, 1.0),  # inverse of regularization strength (smaller = stronger regularization)
    "--max_iter": choice(10, 25, 50, 75, 100)  # candidate limits for solver iterations
})

# Specify a Policy
early_termination_policy = BanditPolicy(slack_factor=0.1,evaluation_interval=1,delay_evaluation=5)

# Create the training directory if it does not exist yet
os.makedirs("./training", exist_ok=True)

# Create a SKLearn estimator for use with train.py
est = SKLearn(source_directory='.',
              compute_target=compute_target,
              vm_size="STANDARD_D2_V2",
              entry_script="train.py")
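
To make the search space concrete, here is a minimal, standard-library-only sketch of what one sampled configuration might look like and how an entry script could receive it. It does not call the Azure ML SDK. The helpers `sample_config`, `to_cli_args`, and `parse_args` are hypothetical names, and because `train.py` is not shown in this notebook, the argument parsing is an assumption about how it reads `--C` and `--max_iter`.

```python
import argparse
import random

# Mirrors the search space above: C ~ uniform(0.1, 1.0), max_iter from a fixed set.
MAX_ITER_CHOICES = (10, 25, 50, 75, 100)


def sample_config(rng):
    """Draw one hyperparameter configuration (illustrative, not the SDK sampler)."""
    return {
        "--C": rng.uniform(0.1, 1.0),
        "--max_iter": rng.choice(MAX_ITER_CHOICES),
    }


def to_cli_args(config):
    """Flatten a configuration into a flag/value argument list for the entry script."""
    args = []
    for name, value in config.items():
        args.extend([name, str(value)])
    return args


def parse_args(argv):
    """Assumed argument parsing inside train.py (the real script is not shown)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--C", type=float, default=1.0,
                        help="Inverse of regularization strength")
    parser.add_argument("--max_iter", type=int, default=100,
                        help="Maximum number of solver iterations")
    return parser.parse_args(argv)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
config = sample_config(rng)
parsed = parse_args(to_cli_args(config))
print(parsed.C, parsed.max_iter)
```

Each child run would then differ only in the argument values it receives, which is why the parameter names in the sampler need to match the flags the entry script accepts.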



In [16]:
# Create a HyperDriveConfig using the estimator, hyperparameter sampler, and policy.
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
hyperdrive_config = HyperDriveConfig(
    estimator=est,
    hyperparameter_sampling=ps,
    policy=early_termination_policy,
    primary_metric_name='Accuracy',
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    max_total_runs=50,
    max_concurrent_runs=5
)
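
The early termination policy passed to the config can be illustrated with a small pure-Python sketch. For a maximize goal, a Bandit policy with a slack factor stops runs whose metric falls below `best_metric / (1 + slack_factor)`, and it only starts checking once `delay_evaluation` intervals have passed. This is a simplified model of that rule, not the service's implementation; `bandit_cutoff` and `should_terminate` are hypothetical helper names, and the metric values are made up for illustration.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric value allowed to continue when the goal is to maximize."""
    return best_metric / (1.0 + slack_factor)


def should_terminate(run_metric, best_metric, interval, slack_factor=0.1,
                     evaluation_interval=1, delay_evaluation=5):
    """Return True if a run would be stopped at this interval (simplified model)."""
    # No evaluation happens before the delay has elapsed.
    if interval < delay_evaluation:
        return False
    # The policy is only applied at multiples of the evaluation interval.
    if interval % evaluation_interval != 0:
        return False
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# Illustrative values only: best run so far at 0.80, slack factor 0.1.
print(round(bandit_cutoff(0.80, 0.1), 4))
print(should_terminate(0.70, 0.80, interval=6))  # below the cutoff, after the delay
print(should_terminate(0.70, 0.80, interval=3))  # still inside the delay window
```

With `slack_factor=0.1`, a run has to stay within roughly 9% of the best run's metric to keep going, so a smaller slack factor makes termination more aggressive.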

In [18]:
# Submit your hyperdrive run to the experiment and show run details with the widget.
from azureml.widgets import RunDetails
hyperdrive_run=exp.submit(hyperdrive_config)
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=True)

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

RunId: HD_018762ee-e35c-449c-be9a-aa71f29e92c0
Web View: https://ml.azure.com/runs/HD_018762ee-e35c-449c-be9a-aa71f29e92c0?wsid=/subscriptions/99149dbb-f505-4eb9-901a-c82ba247986c/resourcegroups/hei-aep-vn001-d-we-rg-mlsa-01/workspaces/hei-aep-vn001-d-we-ml-mlsa-01&tid=66e853de-ece3-44dd-9d66-ee6bdf4159d4

Streaming azureml-logs/hyperdrive.txt

[2023-05-29T08:04:12.359277][GENERATOR][INFO]Trying to sample '5' jobs from the hyperparameter space
[2023-05-29T08:04:12.8045017Z][SCHEDULER][INFO]Scheduling job, id='HD_018762ee-e35c-449c-be9a-aa71f29e92c0_0' 
[2023-05-29T08:04:12.9658724Z][SCHEDULER][INFO]Scheduling job, id='HD_018762ee-e35c-449c-be9a-aa71f29e92c0_1' 
[2023-05-29T08:04:13.1716358Z][SCHEDULER][INFO]Scheduling job, id='HD_018762ee-e35c-449c-be9a-aa71f29e92c0_3' 
[2023-05-29T08:04:13.214079][GENERATOR][INFO]Successfully sampled '5' jobs, they will soon be submitted to the execution target.
[2023-05-29T08:04:13.2926269Z][SCHEDULER][INFO]Scheduling job, id='HD_018762ee-e35c-449c-b

{'runId': 'HD_018762ee-e35c-449c-be9a-aa71f29e92c0',
 'target': 'cluster-bank-marketing-1',
 'status': 'Completed',
 'startTimeUtc': '2023-05-29T08:04:11.622332Z',
 'endTimeUtc': '2023-05-29T08:20:50.272079Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name":"Accuracy","goal":"maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '3dc4d611-af92-4921-8418-c97bd20d34a9',
  'user_agent': 'python/3.8.5 (Linux-5.15.0-1017-azure-x86_64-with-glibc2.10) msrest/0.7.1 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.44.0',
  'space_size': 'infinite_space_size',
  'score': '0.7888888888888889',
  'best_child_run_id': 'HD_018762ee-e35c-449c-be9a-aa71f29e92c0_11',
  'best_metric_status': 'Succeeded',
  'best_data_container_id': 'dcid.HD_018762ee-e35c-449c-be9a-aa71f29e92c0_11'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'configuration': None,
  'attribution': None,
 

## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [19]:
from azureml.widgets import RunDetails
RunDetails(hyperdrive_run).show()

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

## Best Model

TODO: In the cell below, get the best model from the hyperdrive experiments and display all the properties of the model.

In [20]:
import joblib
# Get your best run and save the model from that run.
best_run=hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics=best_run.get_metrics()
parameter_values=best_run.get_details()['runDefinition']['arguments']

print('Best Run ID:',best_run.id)
print('\n Accuracy:',best_run_metrics['Accuracy'])

Best Run ID: HD_018762ee-e35c-449c-be9a-aa71f29e92c0_11

 Accuracy: 0.7888888888888889
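
The cell above stores the best run's arguments in `parameter_values` but does not display them. A minimal sketch for turning that list into a readable mapping follows. It assumes the arguments arrive as a flat list of alternating flags and values. `arguments_to_dict` is a hypothetical helper, and the example values are placeholders, not the hyperparameters of the run above.

```python
def arguments_to_dict(arguments):
    """Pair up a flat ['--name', 'value', ...] list into {'name': 'value'}."""
    if len(arguments) % 2 != 0:
        raise ValueError("expected alternating flag/value pairs")
    names = arguments[0::2]
    values = arguments[1::2]
    return {name.lstrip("-"): value for name, value in zip(names, values)}


# Placeholder values for illustration only; not taken from the run above.
example_arguments = ["--C", "0.5", "--max_iter", "50"]
print(arguments_to_dict(example_arguments))
```

In the notebook, the same helper could be called as `arguments_to_dict(parameter_values)` to print the winning hyperparameters next to the accuracy.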


In [21]:
print(best_run)
best_run

Run(Experiment: hyperdrive_run,
Id: HD_018762ee-e35c-449c-be9a-aa71f29e92c0_11,
Type: azureml.scriptrun,
Status: Completed)


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| hyperdrive_run | HD_018762ee-e35c-449c-be9a-aa71f29e92c0_11 | azureml.scriptrun | Completed | Link to Azure Machine Learning studio | Link to Documentation |


In [22]:
#TODO: Save the best model
model=best_run.register_model(model_name='hyperdrive_model',model_path='./outputs/model.joblib')