# Automated ML

## General setup

In [1]:
# Imports
from azureml.core import Workspace, Experiment, Model
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.dataset import Dataset
from azureml.core.compute_target import ComputeTargetException
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core.webservice import AciWebservice
import os
import joblib
import json
import requests
import pandas

In [2]:
# Create the compute cluster that will carry out the automated ML (or reuse it if it already exists)
ws = Workspace.from_config()
compute_name = "udacity-cluster"
try:
    compute = ComputeTarget(workspace=ws, name=compute_name)
    print('Compute cluster {} already exists!'.format(compute_name))
except ComputeTargetException:
    config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    compute = ComputeTarget.create(ws, compute_name, config)
    
compute.wait_for_completion()

## Dataset

### Overview
The dataset was generated by the notebook https://github.com/zgoey/azure_ml_capstone/blob/master/generate_data.ipynb from the file:

https://unpkg.com/color-name-list/dist/colornames.csv

The notebook labels each color with one of the basic shades from the set {White, Black, Grey, Yellow, Red, Blue, Green, Brown, Pink, Orange, Purple} (see https://thelandofcolor.com/11-basic-color-names/). It does so by looking at the color name and taking the shade that occurs last in the name. For instance, the color 'Azure Green Blue' is assigned the label 'Blue'. The result is a list of labeled RGB triples, each assigned to one of the basic shade classes.
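The labeling rule can be sketched as follows. This is a minimal illustration, not the code from the generating notebook: the function name `label_shade` is hypothetical, and matching whole words case-insensitively is an assumption about how "occurs last" is interpreted.

```python
# Basic shades, as listed in the text above.
BASIC_SHADES = ["White", "Black", "Grey", "Yellow", "Red", "Blue",
                "Green", "Brown", "Pink", "Orange", "Purple"]

# Lookup from lowercase word to the canonical shade label.
_SHADE_BY_WORD = {shade.lower(): shade for shade in BASIC_SHADES}


def label_shade(color_name):
    """Return the basic shade whose word occurs last in color_name.

    Assumption: shades are matched as whole words, ignoring case.
    Returns None when the name contains no basic shade.
    """
    for word in reversed(color_name.split()):
        shade = _SHADE_BY_WORD.get(word.lower())
        if shade is not None:
            return shade
    return None


print(label_shade("Azure Green Blue"))  # the last shade word is "Blue"
print(label_shade("Midnight"))          # no basic shade in the name
```

Names without any basic shade word yield `None` in this sketch; how the generating notebook treats such names is not covered here.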

We have uploaded the output of the notebook to:

https://github.com/zgoey/azure_ml_capstone/blob/master/color_shades.csv

and we will download it in raw form from this repo into our Azure workspace. It will then be used to train a classifier that assigns basic color shades to RGB triples. Color-blind people can use such a classifier to determine what color they are looking at. The end application that we have in mind is something like http://www.hikarun.com/e/, which, however, runs only under Windows. A web service that does the classification has the advantage that it can be accessed from a much wider range of devices.

In [8]:
# Get the data: reuse the registered dataset if it exists,
# otherwise create it from the external GitHub URL and register it.
dataset_name = 'color_shades'
if dataset_name in ws.datasets.keys():
    dataset = ws.datasets[dataset_name]
else:
    url = "https://raw.githubusercontent.com/zgoey/azure_ml_capstone/master/color_shades.csv"
    dataset = Dataset.File.from_files(url)
    dataset.register(workspace=ws, name=dataset_name,
                     description='RGB values labeled with color shade names',
                     create_new_version=True)

# Download the CSV locally, upload it to the default datastore,
# and read it from there as a tabular dataset for AutoML.
datastore = ws.get_default_datastore()
os.makedirs('data', exist_ok=True)
dataset.download(target_path='data', overwrite=True)
datastore.upload(src_dir='data', target_path='data')
tabular_dataset = Dataset.Tabular.from_delimited_files(path=[(datastore, 'data/color_shades.csv')])

Uploading an estimated of 1 files
Uploading data/color_shades.csv
Uploaded data/color_shades.csv, 1 files out of an estimated total of 1
Uploaded 1 files


In [9]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'udacity-capstone'

experiment = Experiment(ws, experiment_name)

## AutoML Configuration

We will now set up an AutoML experiment. The task is set to classification, since we want to distinguish between a limited set of 11 labels. We set the compute target to the compute cluster that we created earlier in this notebook, and we set the training data to the dataset that we just downloaded from our GitHub repo. Our target column is set to "Shade", since that is what we wish to predict.

In the AutoML settings, we choose accuracy as our primary metric, since it is a common measure for classification tasks. We apply 5-fold cross-validation to get a more stable accuracy estimate than a single train/validation split would give. To make sure that the experiment does not run forever (thereby incurring unreasonable costs), we limit its run time to 1 hour. Finally, we set the maximum number of concurrent iterations to four, to make full use of the four nodes of our compute cluster.
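To illustrate what 5-fold cross-validation means, here is a minimal sketch in plain Python. It is not how AutoML implements its splits (which may, for example, shuffle or stratify the data); the helper name `k_fold_indices` is hypothetical, and the folds here are simple contiguous blocks.

```python
def k_fold_indices(n_samples, n_folds=5):
    """Yield (train_indices, validation_indices) for each of n_folds folds.

    Every sample lands in exactly one validation fold, so each sample
    is used for validation once and for training n_folds - 1 times.
    """
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(12, n_folds=5))
for train, validation in folds:
    print("train size:", len(train), "validation indices:", validation)
```

The reported accuracy is then an aggregate over the five validation folds, which makes it less sensitive to one lucky or unlucky split.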

In [10]:
# AutoML settings: 5-fold cross-validation, accuracy as primary metric, 1 hour time limit
automl_settings = {
    "n_cross_validations": 5,
    "primary_metric": 'accuracy',
    "experiment_timeout_hours": 1.0,
    "max_concurrent_iterations": 4,
}

# AutoML configuration: classification of the "Shade" column on the compute cluster
automl_config = AutoMLConfig(task = 'classification',
                             compute_target = compute,
                             training_data = tabular_dataset,
                             label_column_name = 'Shade',
                             **automl_settings)

In [11]:
# Submit the experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

In the cell below, we use the `RunDetails` widget to show the different experiments.

In [12]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [14]:
remote_run.wait_for_completion()

{'runId': 'AutoML_bb8b274a-cb42-40f0-bb76-669dde0f50b7',
 'target': 'udacity-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-02-16T10:56:17.925888Z',
 'endTimeUtc': '2021-02-16T12:09:43.044336Z',
 'properties': {'num_iterations': '1000',
  'training_type': 'TrainFull',
  'acquisition_function': 'EI',
  'primary_metric': 'accuracy',
  'train_split': '0',
  'acquisition_parameter': '0',
  'num_cross_validation': '5',
  'target': 'udacity-cluster',
  'DataPrepJsonString': '{\\"training_data\\": \\"{\\\\\\"blocks\\\\\\": [{\\\\\\"id\\\\\\": \\\\\\"0e6f2f6f-6e34-42b1-a287-8d5c0ce467d0\\\\\\", \\\\\\"type\\\\\\": \\\\\\"Microsoft.DPrep.GetDatastoreFilesBlock\\\\\\", \\\\\\"arguments\\\\\\": {\\\\\\"datastores\\\\\\": [{\\\\\\"datastoreName\\\\\\": \\\\\\"workspaceblobstore\\\\\\", \\\\\\"path\\\\\\": \\\\\\"data/color_shades.csv\\\\\\", \\\\\\"resourceGroup\\\\\\": \\\\\\"experimental\\\\\\", \\\\\\"subscription\\\\\\": \\\\\\"c0e92620-6229-4209-b236-c48f10a3d133\\\\\\", \\\\\\"work

## Best Model

In the cell below, we get the best model from the AutoML experiments and display its properties.



In [16]:
# Retrieve and save your best automl model.
best_automl_run, best_automl_model = remote_run.get_output()
print('Best model metrics:\n', best_automl_run.get_metrics(), '\n')
print('Best model steps:\n', best_automl_model.steps, '\n')

Package:azureml-automl-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-core, training version:1.21.0.post1, current version:1.20.0
Package:azureml-dataprep, training version:2.8.2, current version:2.7.3
Package:azureml-dataprep-native, training version:28.0.0, current version:27.0.0
Package:azureml-dataprep-rslex, training version:1.6.0, current version:1.5.0
Package:azureml-dataset-runtime, training version:1.21.0, current version:1.20.0
Package:azureml-defaults, training version:1.21.0, current version:1.20.0
Package:azureml-interpret, training version:1.21.0, current version:1.20.0
Package:azureml-pipeline-core, training version:1.21.0, current version:1.20.0
Package:azureml-telemetry, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-client, training version:1.21.0, current version:1.20.0
Package:azureml-train-automl-runtime, training version:1.21.0, current version:1.20.0


Best model metrics:
 {'precision_score_micro': 0.8090692124105011, 'weighted_accuracy': 0.8586735968957104, 'matthews_correlation': 0.7796908455312179, 'precision_score_macro': 0.7652059996807654, 'average_precision_score_macro': 0.8166392584146343, 'log_loss': 0.6002653992548741, 'recall_score_micro': 0.8090692124105011, 'precision_score_weighted': 0.8095617384458118, 'recall_score_weighted': 0.8090692124105011, 'average_precision_score_micro': 0.8787022730202901, 'accuracy': 0.8090692124105011, 'AUC_macro': 0.9740380868004174, 'average_precision_score_weighted': 0.8671722108199603, 'balanced_accuracy': 0.7525700921546362, 'f1_score_macro': 0.7563399537810058, 'AUC_weighted': 0.9785445870804639, 'AUC_micro': 0.9803395211161172, 'norm_macro_recall': 0.7278271013701, 'recall_score_macro': 0.7525700921546362, 'f1_score_micro': 0.8090692124105011, 'f1_score_weighted': 0.8078866010577042, 'confusion_matrix': 'aml://artifactId/ExperimentRun/dcid.AutoML_bb8b274a-cb42-40f0-bb76-669dde0f50b7_1

In [17]:
# Zoom in on best model to get full view on estimators
print(best_automl_model.steps[-1][1].get_params(deep=True)) #(set deep=False for less verbose output)

{'base_learners': None, 'meta_learner': None, 'training_cv_folds': None, '135': Pipeline(memory=None,
         steps=[('standardscalerwrapper',
                 <azureml.automl.runtime.shared.model_wrappers.StandardScalerWrapper object at 0x7f7bb557b278>),
                ('xgboostclassifier',
                 XGBoostClassifier(base_score=0.5, booster='gbtree',
                                   colsample_bylevel=1, colsample_bynode=1,
                                   colsample_bytree=1, eta=0.001, gamma=0.01,
                                   learning_rate=0.1, max_delta_step=0,
                                   max_depth=6, max_leaves=0,
                                   min_child_weight=1, missing=nan,
                                   n_estimators=200, n_jobs=1, nthread=None,
                                   objective='multi:softprob', random_state=0,
                                   reg_alpha=2.5, reg_lambda=2.3958333333333335,
                                   scale_po

In [18]:
# Save the best model
os.makedirs('models', exist_ok=True)
joblib.dump(value=best_automl_model, filename="models/automl_color_shades.pkl")


['models/automl_color_shades.pkl']

In [19]:
best_automl_run.download_file('outputs/scoring_file_v_1_0_0.py', 'automl_score.py')

In [20]:
best_automl_run.download_file('outputs/conda_env_v_1_0_0.yml', 'automl_env.yml')

## Model Deployment

Remember that you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

In the cell below, we register the model, create an inference config and deploy the model as a web service.

In [21]:
model_name = best_automl_run.properties['model_name']
description = 'Best model for color shade classification found by AutoML'
tags = None
model = remote_run.register_model(model_name = model_name, description = description, tags = tags)
automl_env = Environment.from_conda_specification(name="automl_env", file_path="automl_env.yml")
inference_config = InferenceConfig(entry_script='automl_score.py',environment= automl_env)

aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1, 
                                               memory_gb = 1, 
                                               description = 'AutoML for color shade classification')

aci_service_name = 'automl-color-shade'
print(aci_service_name)
aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)
aci_service.wait_for_deployment(True)
print(aci_service.state)

automl-color-shade
Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running................................................................................................................................................................................................
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


In the cell below, we send a request to the deployed web service to test it.

In [22]:
data = {
    "data":
    [
        {
            'Red': "255",
            'Green': "10",
            'Blue': "2",
        },
    ],
}

# Convert to JSON string
input_data = json.dumps(data)
with open("data.json", "w") as _f:
    _f.write(input_data)

# Set the content type
headers = {"Content-Type": "application/json"}

# Make the request and display the response
resp = requests.post(aci_service.scoring_uri, input_data, headers=headers)
print(resp.json())


{"result": ["Red"]}
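For other colors, the request body can be built with a small helper. This is a sketch under assumptions: the function name `build_payload` is hypothetical, the 0 to 255 range check is an assumption about valid RGB input, and the values are sent as strings only because the request above does so.

```python
import json


def build_payload(red, green, blue):
    """Build the JSON request body for one RGB triple.

    The column names "Red", "Green" and "Blue" mirror the request above.
    """
    channels = (("Red", red), ("Green", green), ("Blue", blue))
    for name, value in channels:
        # Assumption: each channel is an integer in the usual 0..255 range.
        if not 0 <= int(value) <= 255:
            raise ValueError("{} must be between 0 and 255, got {}".format(name, value))
    record = {name: str(value) for name, value in channels}
    return json.dumps({"data": [record]})


payload = build_payload(255, 10, 2)
print(payload)
```

The resulting string can be passed to `requests.post` in place of `input_data`, with the same `Content-Type` header.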


In the cell below, we print the logs of the web service and delete the service.

In [23]:
print(aci_service.get_logs())
aci_service.delete()

2021-02-16T12:59:52,955622100+00:00 - rsyslog/run 
2021-02-16T12:59:52,966383400+00:00 - gunicorn/run 
2021-02-16T12:59:52,968686000+00:00 - iot-server/run 
rsyslogd: /azureml-envs/azureml_7785023fceb74e4facc1b1a577b1faf9/lib/libuuid.so.1: no version information available (required by rsyslogd)
2021-02-16T12:59:53,041684800+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_7785023fceb74e4facc1b1a577b1faf9/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7785023fceb74e4facc1b1a577b1faf9/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7785023fceb74e4facc1b1a577b1faf9/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_7785023fceb74e4facc1b1a577b1faf9/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml

## Cleanup

In [24]:
# Clean up compute cluster
try:
    compute.delete()
    # Only wait for the deletion if the delete request itself succeeded
    compute.wait_for_completion(show_output=True, is_delete_operation=True)
except ComputeTargetException as e:
    print(e.message)
    print("Failed to clean up compute cluster!")

Deleting.....Current provisioning state of AmlCompute is "Deleting"

.............Current provisioning state of AmlCompute is "Deleting"

..............Current provisioning state of AmlCompute is "Deleting"

.............Current provisioning state of AmlCompute is "Deleting"

..........
SucceededProvisioning operation finished, operation "Succeeded"
