# Automated ML

## General setup

In [1]:
# Imports
from azureml.core import Workspace, Experiment, Model
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.dataset import Dataset
from azureml.core.compute_target import ComputeTargetException
from azureml.train.automl import AutoMLConfig
from azureml.widgets import RunDetails
import os
import joblib

In [2]:
# Create the compute cluster that will carry out the automated ML run (or reuse it if it already exists)
ws = Workspace.from_config()
compute_name = "udacity-cluster"
try:
    compute = ComputeTarget(workspace=ws, name=compute_name)
    print('Compute cluster {} already exists!'.format(compute_name))
except ComputeTargetException:
    config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2', max_nodes=4)
    compute = ComputeTarget.create(ws, compute_name, config)
    
compute.wait_for_completion()

Compute cluster udacity-cluster already exists!


## Dataset

### Overview
The dataset was generated with the notebook https://github.com/zgoey/azure_ml_capstone/blob/master/generate_data.ipynb from the file:

https://unpkg.com/color-name-list/dist/colornames.csv

The notebook labels each color with one of the basic shades from the set {White, Black, Grey, Yellow, Red, Blue, Green, Brown, Pink, Orange, Purple} (see https://thelandofcolor.com/11-basic-color-names/). It does so by looking at the color name and taking the shade that occurs last in this string. So, for instance, the color 'Azure Green Blue' is assigned the label 'Blue'. The result is a list of labeled RGB values, where each RGB triplet is assigned to one of the basic shade classes.
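The labeling rule described above can be sketched as follows. This is a minimal illustration, not the code from the generating notebook: the function name `label_shade`, the case-insensitive whole-word matching, and the choice to return `None` when no shade occurs in the name are all assumptions.

```python
import re

# The 11 basic shades listed above.
BASIC_SHADES = {"White", "Black", "Grey", "Yellow", "Red", "Blue",
                "Green", "Brown", "Pink", "Orange", "Purple"}


def label_shade(color_name):
    """Return the basic shade that occurs last in color_name, or None.

    Assumption: shades are matched as whole words, ignoring case.
    """
    words = re.findall(r"[A-Za-z]+", color_name)
    # Walk the words from right to left so the last shade wins.
    for word in reversed(words):
        shade = word.capitalize()
        if shade in BASIC_SHADES:
            return shade
    return None


print(label_shade("Azure Green Blue"))
```

Names that contain none of the basic shades get no label under this sketch; how the original notebook treats such names is not stated here.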

We have uploaded the output of the notebook to:

https://github.com/zgoey/azure_ml_capstone/blob/master/color_shades.csv

and we will download it in raw form from this repo into our Azure workspace. It will then be used to train a classifier that can assign basic color shades to RGB triplets. Such a classifier can be used by color-blind people to determine what color they are looking at. The end application that we have in mind is something like http://www.hikarun.com/e/. That, however, is a program that only runs under Windows. The advantage of having a web service do the classification is that it can be accessed from a much wider range of devices.

In [3]:
# Get the data: reuse the registered dataset if it exists, otherwise load it from the external URL and register it
dataset_name = 'color_shades'
if dataset_name in ws.datasets.keys():
    dataset = ws.datasets[dataset_name]
else:
    url = "https://raw.githubusercontent.com/zgoey/azure_ml_capstone/master/color_shades.csv"
    dataset = Dataset.Tabular.from_delimited_files(url)
    # register() returns the registered dataset, so keep a reference to it
    dataset = dataset.register(workspace=ws, name=dataset_name,
                               description='RGB values labeled with color shade names',
                               create_new_version=True)

In [4]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'udacity-capstone'

experiment = Experiment(ws, experiment_name)

## AutoML Configuration

We will now set up an AutoML experiment. The task is set to classification, since we wish to distinguish between a fixed set of 11 labels. We set the compute target to the compute cluster that we created earlier in this notebook, and the training data to the dataset that we just downloaded from our GitHub repo. The target column is set to "Shade", since that is what we wish to predict.

In the AutoML settings, we choose accuracy as our primary metric, which is the most common measure to use for classification tasks. We apply 5-fold cross-validation to get a more stable accuracy estimate than a single train/validation split would give. To be sure that our experiment does not run forever (thereby incurring unreasonable costs), we limit its run time to 1 hour. Finally, we set the maximum number of concurrent iterations to four, to make maximal use of the concurrency capabilities of our compute cluster, which has at most four nodes.
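To make the cross-validation idea concrete, here is a small pure-Python sketch of how a k-fold accuracy estimate is computed. It is only an illustration under stated assumptions: the helper names `k_fold_indices` and `cross_validated_accuracy` are hypothetical, the folds are contiguous and unshuffled, and the "model" is a majority-class baseline. It does not reproduce how AutoML builds its folds or trains its models.

```python
from collections import Counter


def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds contiguous validation folds."""
    folds, start = [], 0
    for i in range(n_folds):
        # Spread the remainder over the first folds so sizes differ by at most 1.
        size = n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_accuracy(labels, n_folds=5):
    """Mean validation accuracy of a majority-class baseline over n_folds folds."""
    accuracies = []
    for val_idx in k_fold_indices(len(labels), n_folds):
        held_out = set(val_idx)
        # "Train" on everything outside the fold: pick the most frequent label.
        train = [y for i, y in enumerate(labels) if i not in held_out]
        majority = Counter(train).most_common(1)[0][0]
        # Validate on the held-out fold.
        correct = sum(labels[i] == majority for i in val_idx)
        accuracies.append(correct / len(val_idx))
    return sum(accuracies) / n_folds


# Toy labels, made up for illustration only.
labels = ["Blue", "Blue", "Red", "Blue", "Green",
          "Blue", "Red", "Blue", "Blue", "Green"]
print(cross_validated_accuracy(labels, n_folds=5))
```

Every sample is used for validation exactly once, so the averaged score depends less on one lucky or unlucky split than a single hold-out set would.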

In [5]:
# TODO: Put your automl settings here
automl_settings = {
    "n_cross_validations": 5,
    "primary_metric": 'accuracy',
    "experiment_timeout_hours": 1.0,
    "max_concurrent_iterations": 4,
}

# TODO: Put your automl config here
automl_config = AutoMLConfig(task = 'classification',
                             compute_target = compute,
                             training_data = dataset,
                             label_column_name = 'Shade',
                             **automl_settings)

In [6]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config)

Running on remote.


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [7]:
RunDetails(remote_run).show()

_AutoMLWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 's…

In [8]:
remote_run.wait_for_completion(show_output=False)

## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [None]:
# Retrieve and save your best automl model.
best_automl_run, best_automl_model = remote_run.get_output()
print('Best model metrics:\n', best_automl_run.get_metrics(), '\n')
print('Best model steps:\n', best_automl_model.steps, '\n')

In [None]:
# Zoom in on best model to get full view on estimators
print(best_automl_model.steps[-1][1].get_params(deep=True))

In [None]:
#TODO: Save the best model
os.makedirs('./models', exist_ok=True)
joblib.dump(value=best_automl_model, filename="./models/automl_color_shades.pkl")
Model.register(model_path = "./models/automl_color_shades.pkl",
                       model_name = "automl_color_shades",
                       description = "RGB color shade classifier trained by AutoML",
                       workspace = ws)

## Cleanup

In [None]:
# Clean up compute cluster
try:
    compute.delete()
    # Only wait for the deletion if the delete request itself succeeded
    compute.wait_for_completion(show_output=True, is_delete_operation=True)
except ComputeTargetException as e:
    print(e)
    print("Failed to clean up compute cluster!")

## Model Deployment

Remember that you only have to deploy one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

TODO: In the cell below, send a request to the web service you deployed to test it.

TODO: In the cell below, print the logs of the web service and delete the service