# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
import numpy as np
import os
import matplotlib.pyplot as plt

import azureml
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import choice, uniform
from azureml.core import Environment, ScriptRunConfig

## Dataset

In [2]:
ws = Workspace.from_config()
experiment_name = 'cancer-hyperdrive'

experiment=Experiment(ws, experiment_name)

To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code C6HLJ6YFM to authenticate.


Performing interactive authentication. Please follow the instructions on the terminal.
Interactive authentication successfully completed.


In [3]:

amlcompute_cluster_name = "cluster-project"

try:
    compute_target = ComputeTarget(workspace=ws, name=amlcompute_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',# for GPU, use "STANDARD_NC6"
                                                           #vm_priority = 'lowpriority', # optional
                                                           max_nodes=4)
    compute_target = ComputeTarget.create(ws, amlcompute_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True, min_node_count = 1, timeout_in_minutes = 10)

InProgress.
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded............................................................................................................
AmlCompute wait for completion finished

Wait timeout has been reached
Current provisioning state of AmlCompute is "Succeeded" and current node count is "0"


In [4]:
dataset_name = 'cancer-data'
try: 
    ds = ws.datasets[dataset_name]
except KeyError:
    print("Dataset not found, create and rerun this cell!")
    raise

In [5]:
df = ds.to_pandas_dataframe()
df.head()

| Unnamed: 0 | id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | concave points_mean | ... | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | concave points_worst | symmetry_worst | fractal_dimension_worst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 842302 | M | 17.99 | 10.38 | 122.8 | 1001.0 | 0.1184 | 0.2776 | 0.3001 | 0.1471 | ... | 25.38 | 17.33 | 184.6 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.1189 |
| 1 | 842517 | M | 20.57 | 17.77 | 132.9 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | ... | 24.99 | 23.41 | 158.8 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.186 | 0.275 | 0.08902 |
| 2 | 84300903 | M | 19.69 | 21.25 | 130.0 | 1203.0 | 0.1096 | 0.1599 | 0.1974 | 0.1279 | ... | 23.57 | 25.53 | 152.5 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.243 | 0.3613 | 0.08758 |
| 3 | 84348301 | M | 11.42 | 20.38 | 77.58 | 386.1 | 0.1425 | 0.2839 | 0.2414 | 0.1052 | ... | 14.91 | 26.5 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.173 |
| 4 | 84358402 | M | 20.29 | 14.34 | 135.1 | 1297.0 | 0.1003 | 0.1328 | 0.198 | 0.1043 | ... | 22.54 | 16.67 | 152.2 | 1575.0 | 0.1374 | 0.205 | 0.4 | 0.1625 | 0.2364 | 0.07678 |


## Hyperdrive Configuration

For hyperparameter tuning, I chose a simple k-nearest neighbors (KNN) model because the data is not complex and I wanted to see how a simple model would perform on it.
The model is also quick to train.

The hyperparameters to be tuned:
- **n**: number of neighbors in the algorithm
- **weights**: function to be used when calculating the weight of neighboring data points (can be *uniform* or *distance*)
- **p**: power parameter for the Minkowski metric (1 is the Manhattan distance, 2 is the Euclidean distance)
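
To make the roles of the three hyperparameters concrete, here is a minimal pure-Python sketch of a KNN vote. It is an illustration only, not the code in `train.py`: the function names `minkowski` and `knn_predict` and the toy points are hypothetical, and the handling of zero distances is a simplification.

```python
from collections import defaultdict


def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length points."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train, query, n, weights="uniform", p=2):
    """Classify `query` by a vote among its n nearest training points.

    `train` is a list of (features, label) pairs (toy data, hypothetical).
    """
    # Rank the training points by distance to the query and keep the n closest.
    ranked = sorted(((minkowski(x, query, p), y) for x, y in train),
                    key=lambda pair: pair[0])
    nearest = ranked[:n]

    votes = defaultdict(float)
    for dist, label in nearest:
        if weights == "uniform":
            # Every neighbor counts equally.
            votes[label] += 1.0
        else:
            # "distance": closer neighbors count more (inverse distance).
            # Simplification: an exact match gets an infinite weight.
            votes[label] += 1.0 / dist if dist > 0 else float("inf")
    return max(votes, key=votes.get)


# Toy points, not taken from the cancer dataset.
toy_train = [((0.0, 0.0), "B"), ((0.0, 1.0), "B"), ((3.0, 3.0), "M")]
toy_query = (2.5, 2.5)

print(knn_predict(toy_train, toy_query, n=3, weights="uniform"))   # B
print(knn_predict(toy_train, toy_query, n=3, weights="distance"))  # M
```

With uniform weights the two distant "B" points outvote the single close "M" point; with distance weighting the close point wins, which is why **weights** is worth tuning together with **n** and **p**.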

#### Sampling: RandomParameterSampling
Random parameter sampling draws each hyperparameter value at random, either from a predefined set of discrete choices or from a bounded continuous range. Because values are drawn at random, not every parameter combination has to be evaluated, which makes the search much cheaper than an exhaustive grid search.
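
The difference between an exhaustive grid and random sampling can be sketched with the standard library alone. This is a rough illustration of the idea, not HyperDrive's implementation: `sample_configs` is a hypothetical helper, the seed is arbitrary, and the sketch does not remove duplicate draws. The value lists mirror the search space used in this notebook.

```python
import itertools
import random

# Discrete search space, mirroring the choices used in this notebook.
search_space = {
    "n": [2, 3, 4, 5, 6, 7, 8],
    "weights": ["uniform", "distance"],
    "p": [1, 2, 3, 4, 5],
}

# An exhaustive grid search would have to train one model per combination.
grid = list(itertools.product(*search_space.values()))


def sample_configs(space, max_total_runs, seed=0):
    """Draw `max_total_runs` configurations, one independent choice per parameter."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in space.items()}
        for _ in range(max_total_runs)
    ]


configs = sample_configs(search_space, max_total_runs=12)
print(f"grid size: {len(grid)}, sampled runs: {len(configs)}")
```

With 7 x 2 x 5 = 70 possible combinations and a budget of 12 runs, random sampling evaluates only about a sixth of the grid.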


#### Stopping policy: Bandit
The Bandit policy terminates runs whose primary metric is not within the specified slack factor (0.1) of the best-performing run so far. With this policy, runs that fall roughly 10% behind the current best are cancelled early instead of being trained to completion, which saves time and compute.
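
The slack-factor rule can be written down in a few lines. The sketch below assumes the rule as described in the Azure ML documentation for a metric that is maximized: a run is cancelled when its metric falls below `best / (1 + slack_factor)`. The function names and the example accuracies are hypothetical, and the actual policy is only applied at the configured evaluation intervals.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric value that is still tolerated, for a metric that is maximized."""
    return best_metric / (1.0 + slack_factor)


def should_terminate(run_metric, best_metric, slack_factor=0.1):
    """Return True if a run is too far behind the best run seen so far."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)


# Hypothetical accuracies, for illustration only.
best_so_far = 0.88
for accuracy in (0.75, 0.85):
    verdict = "terminate" if should_terminate(accuracy, best_so_far) else "keep"
    print(f"accuracy={accuracy}: {verdict}")
```

With a slack factor of 0.1 the cutoff is about 91% of the best metric, so the "10% worse" reading above is a close approximation rather than an exact figure.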

In [58]:
param_sampling = RandomParameterSampling({
    "n":  choice(2, 3, 4, 5, 6, 7, 8), 
    "weights": choice('uniform', 'distance'),
    "p": choice(1, 2, 3, 4, 5)
    }
)

early_termination_policy = BanditPolicy(evaluation_interval=2, slack_factor=0.1)

if "training" not in os.listdir():
    os.mkdir("./training")

sklearn_env = Environment.from_conda_specification(name='sklearn-env', file_path='conda_dependencies.yml')

src = ScriptRunConfig('.', 'train.py', compute_target=compute_target, environment=sklearn_env)

hyperdrive_config = HyperDriveConfig(run_config=src,
                                     hyperparameter_sampling=param_sampling,
                                     policy=early_termination_policy,
                                     primary_metric_name='Accuracy',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=12,
                                     max_concurrent_runs=4)

In [59]:
hyperdrive_run = experiment.submit(config=hyperdrive_config)

## Run Details


In [60]:
RunDetails(hyperdrive_run).show()
hyperdrive_run.wait_for_completion(show_output=True)

_HyperDriveWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO'…

RunId: HD_8a46e247-8b0f-47e0-9000-6c0a32f01941
Web View: https://ml.azure.com/runs/HD_8a46e247-8b0f-47e0-9000-6c0a32f01941?wsid=/subscriptions/3d1a56d2-7c81-4118-9790-f85d1acf0c77/resourcegroups/aml-quickstarts-233260/workspaces/quick-starts-ws-233260&tid=660b3398-b80e-49d2-bc5b-ac1dc93b5254

Streaming azureml-logs/hyperdrive.txt

[2023-05-09T13:12:11.507327][GENERATOR][INFO]Trying to sample '4' jobs from the hyperparameter space
[2023-05-09T13:12:12.0091964Z][SCHEDULER][INFO]Scheduling job, id='HD_8a46e247-8b0f-47e0-9000-6c0a32f01941_0' 
[2023-05-09T13:12:12.1656567Z][SCHEDULER][INFO]Scheduling job, id='HD_8a46e247-8b0f-47e0-9000-6c0a32f01941_1' 
[2023-05-09T13:12:12.281653][GENERATOR][INFO]Successfully sampled '4' jobs, they will soon be submitted to the execution target.
[2023-05-09T13:12:12.2826550Z][SCHEDULER][INFO]Scheduling job, id='HD_8a46e247-8b0f-47e0-9000-6c0a32f01941_2' 
[2023-05-09T13:12:12.3987099Z][SCHEDULER][INFO]Scheduling job, id='HD_8a46e247-8b0f-47e0-9000-6c0a32f019

{'runId': 'HD_8a46e247-8b0f-47e0-9000-6c0a32f01941',
 'target': 'cluster-project',
 'status': 'Completed',
 'startTimeUtc': '2023-05-09T13:12:10.675524Z',
 'endTimeUtc': '2023-05-09T13:16:43.79761Z',
 'services': {},
 'properties': {'primary_metric_config': '{"name":"Accuracy","goal":"maximize"}',
  'resume_from': 'null',
  'runTemplate': 'HyperDrive',
  'azureml.runsource': 'hyperdrive',
  'platform': 'AML',
  'ContentSnapshotId': '5fb60bd2-00f4-4ec3-be4f-54a73f3a5fed',
  'user_agent': 'python/3.8.5 (Linux-5.15.0-1035-azure-x86_64-with-glibc2.10) msrest/0.7.1 Hyperdrive.Service/1.0.0 Hyperdrive.SDK/core.1.49.0',
  'space_size': '70',
  'score': '0.8421052631578947',
  'best_child_run_id': 'HD_8a46e247-8b0f-47e0-9000-6c0a32f01941_3',
  'best_metric_status': 'Succeeded',
  'best_data_container_id': 'dcid.HD_8a46e247-8b0f-47e0-9000-6c0a32f01941_3'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'configuration': None,
  'attribution': None,
  'telemetryValues': {'amlClie

KeyError: 'log_files'

## Best Model

In [65]:
best_run = hyperdrive_run.get_best_run_by_primary_metric()
print(best_run)
print(best_run.get_details()['runDefinition']['arguments'])
print(best_run.get_metrics())


Run(Experiment: cancer-hyperdrive,
Id: HD_8a46e247-8b0f-47e0-9000-6c0a32f01941_3,
Type: azureml.scriptrun,
Status: Completed)
['--n', '6', '--p', '1', '--weights', 'distance']
{'Num neighbors:': 6, 'Weight function:': 'distance', 'Metric power:': 1, 'Accuracy': 0.8421052631578947}
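
The argument list printed above shows how HyperDrive hands the sampled values to the training script as command-line flags. The contents of `train.py` are not shown in this notebook, so the following is only a guess at how such a script might parse them: `parse_args`, the defaults, and the help texts are assumptions, while the flag names and metric names are taken from the output above.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters passed on the command line (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="KNN training entry point (sketch)")
    parser.add_argument("--n", type=int, default=5,
                        help="number of neighbors")
    parser.add_argument("--weights", type=str, default="uniform",
                        choices=["uniform", "distance"],
                        help="weight function for neighboring points")
    parser.add_argument("--p", type=int, default=2,
                        help="power parameter of the Minkowski metric")
    return parser.parse_args(argv)


# The arguments reported for the best run.
args = parse_args(["--n", "6", "--p", "1", "--weights", "distance"])

# In a real training script these values would presumably be logged to the run
# (for example with Run.get_context().log(name, value)) next to the accuracy.
metrics = {
    "Num neighbors:": args.n,
    "Weight function:": args.weights,
    "Metric power:": args.p,
}
print(metrics)
```

Parsing the flags with explicit types keeps `n` and `p` as integers, which matches how they appear in the logged metrics.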


In [62]:
for f in best_run.get_file_names():
    if f.startswith('outputs/model'):
        output_file_path = os.path.join('./model', f.split('/')[-1])
        print('Downloading from {} to {} ...'.format(f, output_file_path))
        best_run.download_file(name=f, output_file_path=output_file_path)
best_run.get_file_names()

Downloading from outputs/model/model.h5 to ./model/model.h5 ...


['outputs/model/model.h5',
 'system_logs/cs_capability/cs-capability.log',
 'system_logs/hosttools_capability/hosttools-capability.log',
 'system_logs/lifecycler/execution-wrapper.log',
 'system_logs/lifecycler/lifecycler.log',
 'system_logs/metrics_capability/metrics-capability.log',
 'system_logs/snapshot_capability/snapshot-capability.log',
 'user_logs/std_log.txt']

## Model Deployment

Remember that you only have to deploy one of the two models you trained, but you still need to register both. Perform the steps in the rest of this notebook only if you wish to deploy this model.

In [63]:
best_run.get_details()

{'runId': 'HD_8a46e247-8b0f-47e0-9000-6c0a32f01941_3',
 'target': 'cluster-project',
 'status': 'Completed',
 'startTimeUtc': '2023-05-09T13:12:28.608995Z',
 'endTimeUtc': '2023-05-09T13:12:45.852603Z',
 'services': {},
 'properties': {'_azureml.ComputeTargetType': 'amlctrain',
  'ContentSnapshotId': '5fb60bd2-00f4-4ec3-be4f-54a73f3a5fed',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json'},
 'inputDatasets': [],
 'outputDatasets': [],
 'runDefinition': {'script': 'train.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--n', '6', '--p', '1', '--weights', 'distance'],
  'sourceDirectoryDataStore': None,
  'framework': 'Python',
  'communicator': 'None',
  'target': 'cluster-project',
  'dataReferences': {},
  'data': {},
  'outputData': {},
  'datacaches': [],
  'jobName': None,
  'maxRunDurationSeconds': 2592000,
  'nodeCount': 1,
  'instanceTypes': [],
  'priority': None,
  'credentialPassthrough': False,


In [64]:
model = best_run.register_model(model_name='cancer-hyperdrive-model', model_path='outputs/model')

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service

**Submission Checklist**
- I have registered the model.
- I have deployed the model with the best accuracy as a webservice.
- I have tested the webservice by sending a request to the model endpoint.
- I have deleted the webservice and shut down all the computes that I have used.
- I have taken a screenshot showing the model endpoint as active.
- The project includes a file containing the environment details.

