# Hyperparameter Tuning using HyperDrive

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
import logging
import os
import csv

import numpy as np
import pandas as pd
import pkg_resources
from matplotlib import pyplot as plt
from sklearn import datasets

from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
from azureml.core.dataset import Dataset
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, choice, normal


## Dataset

TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [3]:
ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')


Workspace name: quick-starts-ws-143048
Azure region: southcentralus
Subscription id: 81cefad3-d2c9-4f77-a466-99a7f541c7bb
Resource group: aml-quickstarts-143048


In [4]:
experiment_name = 'ChurnPrediction'

experiment = Experiment(ws, experiment_name)
run = experiment.start_logging()

In [5]:
# TODO: Create compute cluster
# max_nodes should be no greater than 4.

# choose a name for your cluster
cluster_name = "notebook143048"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2', 
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

# Optionally poll for a minimum number of nodes, with a specific timeout.
# If no min_node_count is provided, the cluster's scale settings are used.
# compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=30)

# Use get_status() to get a detailed status for the current cluster.
# print(compute_target.get_status().serialize())

Found existing compute target


In [6]:
key = "Churn Prediction Dataset"
description_text = "Churn Prediction for Capstone Project"

if key in ws.datasets.keys():
    # Reuse the dataset if it is already registered in the workspace
    ds = ws.datasets[key]
else:
    # Create the dataset from the external CSV and register it in the workspace
    dataset_link = 'https://raw.githubusercontent.com/tejasbangera/Udacity-Captstone-Project/main/WA_Fn-UseC_-Telco-Customer-Churn.csv'
    ds = TabularDatasetFactory.from_delimited_files(path=dataset_link)
    ds = ds.register(workspace=ws, name=key, description=description_text)

## Hyperdrive Configuration

TODO: Explain the model you are using and the reason for choosing the different hyperparameters, termination policy and config settings.

In [7]:
from azureml.widgets import RunDetails
from azureml.train.sklearn import SKLearn
from azureml.train.hyperdrive.run import PrimaryMetricGoal
from azureml.train.hyperdrive.policy import BanditPolicy
from azureml.train.hyperdrive.sampling import RandomParameterSampling
from azureml.train.hyperdrive.runconfig import HyperDriveConfig
from azureml.train.hyperdrive.parameter_expressions import uniform, choice, normal
import os

# Specify the parameter sampler: each run draws C and max_iter at random
parameter_sampler = RandomParameterSampling({
    "--C": uniform(0.05, 0.1),
    "--max_iter": choice(16, 32, 64, 128),
})

# Specify an early termination policy.
# Bandit terminates runs where the primary metric is not within the specified
# slack factor/slack amount compared to the best performing run.
policy = BanditPolicy(slack_factor=0.1, evaluation_interval=2, delay_evaluation=5)

# Make sure the training directory exists
os.makedirs("./training", exist_ok=True)

# Create a SKLearn estimator for use with train.py
est = SKLearn(source_directory="./",
              compute_target=compute_target,
              entry_script="train.py")

# Create a HyperDriveConfig using the estimator, hyperparameter sampler, and policy.
hyperdrive_config = HyperDriveConfig(estimator=est,
                                     hyperparameter_sampling=parameter_sampler,
                                     policy=policy,
                                     primary_metric_name="Accuracy",
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
                                     max_total_runs=50,
                                     max_concurrent_runs=5)

'SKLearn' estimator is deprecated. Please use 'ScriptRunConfig' from 'azureml.core.script_run_config' with your own defined environment or the AzureML-Tutorial curated environment.
'enabled' is deprecated. Please use the azureml.core.runconfig.DockerConfiguration object with the 'use_docker' param instead.
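
To make the search space above concrete, here is a minimal standard-library sketch of what random sampling over it amounts to. It is not the HyperDrive implementation: `sample_config` is a hypothetical helper, and the fixed seed is only there to make the sketch repeatable. It assumes, as in the configuration above, that `--C` is drawn uniformly from [0.05, 0.1] and `--max_iter` is picked from four discrete values, each parameter independently of the other.

```python
import random

# Hypothetical stand-in for the search space configured above:
#   "--C"        ~ uniform(0.05, 0.1)
#   "--max_iter" ~ choice(16, 32, 64, 128)
def sample_config(rng):
    """Draw one hyperparameter combination, each parameter independently."""
    return {
        "--C": rng.uniform(0.05, 0.1),
        "--max_iter": rng.choice([16, 32, 64, 128]),
    }

rng = random.Random(0)  # fixed seed (an assumption) so the sketch is repeatable

# One configuration per child run, mirroring max_total_runs=50
configs = [sample_config(rng) for _ in range(50)]

for cfg in configs[:3]:
    print(cfg)
```

Because the draws are independent, 50 runs cover the continuous range of `--C` far more densely than a grid with the same budget would, while each of the four `--max_iter` values is still likely to be tried several times.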


## Run Details

In [9]:
# Submit your hyperdrive run to the experiment and show run details with the widget.

hyperdrive_run = experiment.submit(hyperdrive_config)
RunDetails(hyperdrive_run).show() 
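
While the child runs execute, the Bandit policy configured earlier decides which of them are cancelled early. The sketch below illustrates only the slack-factor rule for a maximized metric, where the cutoff is the best metric divided by (1 + slack factor). The function names and the accuracy values are hypothetical, and the timing settings (`evaluation_interval`, `delay_evaluation`) are deliberately not modeled.

```python
def bandit_cutoff(best_metric, slack_factor):
    """Lowest metric a run may report and still continue (maximize goal)."""
    return best_metric / (1 + slack_factor)

def should_terminate(run_metric, best_metric, slack_factor=0.1):
    """Return True if the run falls outside the allowed slack of the best run."""
    return run_metric < bandit_cutoff(best_metric, slack_factor)

best = 0.80  # hypothetical best accuracy reported so far
cutoff = bandit_cutoff(best, 0.1)

print(round(cutoff, 4))
print(should_terminate(0.70, best))  # below the cutoff
print(should_terminate(0.75, best))  # within the slack
```

With these illustrative numbers the cutoff is about 0.727, so a run reporting 0.70 would be cancelled and one reporting 0.75 would be allowed to continue.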




## Best Run

In [11]:
import joblib
# Get your best run and save the model from that run.
best_run = hyperdrive_run.get_best_run_by_primary_metric()
best_run_metrics = best_run.get_metrics()

print('Best Run Id: ', best_run.id)
print('Accuracy: ', best_run_metrics['Accuracy'])

best_run.get_file_names()  # list the run's files to locate the actual model file

Best Run Id:  HD_76f42d43-ca2b-4536-9ddc-0085e5829c77_9
Accuracy:  0.7960199004975125


['azureml-logs/55_azureml-execution-tvmps_5d3e12c72d52fca97ab1343c91f2869c67751cb6f69fd97ae068a74367385df6_d.txt',
 'azureml-logs/65_job_prep-tvmps_5d3e12c72d52fca97ab1343c91f2869c67751cb6f69fd97ae068a74367385df6_d.txt',
 'azureml-logs/70_driver_log.txt',
 'azureml-logs/75_job_post-tvmps_5d3e12c72d52fca97ab1343c91f2869c67751cb6f69fd97ae068a74367385df6_d.txt',
 'logs/azureml/102_azureml.log',
 'logs/azureml/dataprep/backgroundProcess.log',
 'logs/azureml/dataprep/backgroundProcess_Telemetry.log',
 'logs/azureml/job_prep_azureml.log',
 'logs/azureml/job_release_azureml.log',
 'outputs/model.joblib']

In [12]:
best_run.download_file(name="outputs/model.joblib", output_file_path="./outputs/")

In [13]:

best_run

Experiment,Id,Type,Status,Details Page,Docs Page
ChurnPrediction,HD_76f42d43-ca2b-4536-9ddc-0085e5829c77_9,azureml.scriptrun,Completed,Link to Azure Machine Learning studio,Link to Documentation




## Model Deployment
I decided not to deploy this model as a web service because the AutoML model achieved better accuracy.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service