# Azure Machine Learning Series Notebook
- Prepared by Vivek Raja P S
- In this notebook, we will learn how to use Azure Machine Learning Python SDK to perform Machine Learning Tasks


## About the Author

Vivek Raja P S is working as Data Scientist at NexStem and Organiser at Azure Developer Community Groups in Tamil Nadu. He is also an AWS Community Builder for Machine Learning. He is Microsoft Certified Azure Data Scientist, AI Engineer and Data Engineer. Besides, he loves to mentor hackathon teams, blogging and speaking at various developer groups in the field of AI & Cloud. He is also an active speaker, blogger in various Developer Communities such as at AWS User Group India, TensorFlow User Group, Google Developer Group, Tamil FOSS Community.

### Social Handles:
- Email: vivekraja98@gmail.com
- Linkedin: https://linkedin.com/in/Vivek0712
- Twitter: https://twitter.com/VivekRaja007

### Repos:
GitHub: https://github.com/Vivek0712



# Before we get started...


## Pre-requisites
 - Basic Python programming Language
 - Understanding of Machine Learning Workflows
 
## Setup

 - Azure Account with Subscription
 - Create a Machine Learning Resource.
 - Provide a name for the workspace, Container Register
 - Launch the Machine Learning Studio
 - Create Dataset
 - Create Compute Resource
 - Launch a Notebook instance

## Preparing the Environment

 - Retreive all the necessary info 
 - Make sure all imports are done
 - Create Workspace (using SDK or Portal)

In [7]:
 from azureml.core import Workspace

# ws = Workspace.create(name='myworkspace',
#                subscription_id='<azure-subscription-id>',
#                resource_group='myresourcegroup',
#                create_resource_group=True,
#                location='eastus2'
#                )

In [8]:
import json

with open('keys.txt') as f:
        keys = json.load(f)
subscription_id = keys["SUBSCRIPTION_ID"]
resource_group = keys["RESOURCE_GROUP"]
workspace_name = keys["WORKSPACE_NAME"]
workspace_region = keys["WORKSPACE_REGION"]

## check if imports are done
import azureml.core


# Workspace

- An Azure ML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inference, and the monitoring of deployed models.

## Create / Access the Workspace

- Using Constructor
- Using config.json file

In [12]:
from azureml.core import Workspace

try:
    ws = Workspace(subscription_id = subscription_id, resource_group = resource_group, workspace_name = workspace_name)

    ws.write_config()
    
    ws = Workspace.from_config()
    
    print("Workspace configuration succeeded. Skip the workspace creation steps below")
except:
    print("Workspace not accessible. Create the workspace")
    ws = Workspace.create(name= workspace_name,
               subscription_id= subscription_id,
               resource_group= resource_group,
               create_resource_group=True,
               location= workspace_region
               )
    
# Fetch and Display the workspace
ws = Workspace.from_config()

#Display the details
#ws.get_details()

Workspace configuration succeeded. Skip the workspace creation steps below


# Compute 

- All ML Experiments requires Compute to execute. 

## Create / Access the Compute Resource

- Using ComputeTarget Class

In [39]:
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Verify that cluster does not exist already
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cpu-cluster")
except ComputeTargetException:
    print("Creating new cpu-cluster")
    
    # Specify the configuration for the new cluster
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                           min_nodes=0,
                                                           max_nodes=4)

    # Create the cluster with the specified name and configuration
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    
    # Wait for the cluster to complete, show the output log
    cpu_cluster.wait_for_completion(show_output=True)

Creating new cpu-cluster
Creating......
SucceededProvisioning operation finished, operation "Succeeded"
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


# Experiment

- In Azure Machine Learning, an experiment is a named process, usually the running of a script or a pipeline, that can generate metrics and outputs and be tracked in the Azure Machine Learning workspace.

## Create Experiment

- An experiment can be run multiple times, with different data, code, or settings; and Azure Machine Learning tracks each run, enabling you to view run history and compare results for each run.

## The Experiment Run Context

- When you submit an experiment, you use its run context to initialize and end the experiment run that is tracked in Azure Machine Learning

- You can log, monitor every run in the experiment


In [14]:
from azureml.core import Experiment
import pandas as pd

# Create an Azure ML experiment in your workspace
experiment = Experiment(workspace = ws, name = 'my-experiment')

# Start logging data from the experiment
run = experiment.start_logging()

# All your experiment code goes here!!!


### BLAH! BLAH! BLAH! ML STUFFF

print("Hello ML World!!")


# Complete the experiment
run.complete()

Hello ML World!!


# Data

- Any Machine Learning problems involves working with Data.
- It involves importing the data from the data source
- Registering, Maintaining the dataset in Data Store
- Versioning the dataset


In [19]:
#Check and List the datasets attached to our Workspace
from azureml.core import Dataset

print("\nData Stores:")
# Get the default datastore
default_ds = ws.get_default_datastore()

# Enumerate all datastores, indicating which is the default
for ds_name in ws.datastores:
    print(ds_name, "- Default =", ds_name == default_ds.name)
    
    
print("\nDatasets:")
for dataset_name in list(ws.datasets.keys()):
    dataset = Dataset.get_by_name(ws, dataset_name)
    print("\t", dataset.name, 'version', dataset.version)
    


Data Stores:
azureml_globaldatasets - Default = False
workspacefilestore - Default = False
workspaceblobstore - Default = True

Datasets:
	 Sample: Diabetes version 1


In [25]:
# Using the data

tab_data_set = Dataset.get_by_name(ws, dataset_name)

#Taking first 20 rows and converting it to a Pandas Dataframe
tab_data_set.take(20).to_pandas_dataframe()


Unnamed: 0,AGE,SEX,BMI,BP,S1,S2,S3,S4,S5,S6,Y
0,59,2,32.1,101.0,157,93.2,38.0,4.0,4.8598,87,151
1,48,1,21.6,87.0,183,103.2,70.0,3.0,3.8918,69,75
2,72,2,30.5,93.0,156,93.6,41.0,4.0,4.6728,85,141
3,24,1,25.3,84.0,198,131.4,40.0,5.0,4.8903,89,206
4,50,1,23.0,101.0,192,125.4,52.0,4.0,4.2905,80,135
5,23,1,22.6,89.0,139,64.8,61.0,2.0,4.1897,68,97
6,36,2,22.0,90.0,160,99.6,50.0,3.0,3.9512,82,138
7,66,2,26.2,114.0,255,185.0,56.0,4.55,4.2485,92,63
8,60,2,32.1,83.0,179,119.4,42.0,4.0,4.4773,94,110
9,29,1,30.0,85.0,180,93.4,43.0,4.0,5.3845,88,310


In [24]:
#Upload your own data

# default_ds.upload_files(files=['./data/diabetes.csv'], # Upload the diabetes csv files in /data
#                        target_path='diabetes-data/', # Put it in a folder path in the datastore
#                        overwrite=True, # Replace existing files of the same name
#                        show_progress=True)

In [26]:
# Registering the Dataset with the workspace

try:
    tab_data_set = tab_data_set.register(workspace=ws, 
                                        name='diabetes dataset',
                                        description='diabetes data',
                                        tags = {'format':'CSV'},
                                        create_new_version=True)
except Exception as ex:
    print(ex)


print('Datasets registered')

Datasets registered


In [27]:
print("Datasets:")
for dataset_name in list(ws.datasets.keys()):
    dataset = Dataset.get_by_name(ws, dataset_name)
    print("\t", dataset.name, 'version', dataset.version)

Datasets:
	 diabetes dataset version 1
	 Sample: Diabetes version 1


# Training Your Model

### 1. Create Directories for source files

In [29]:
import os

# Create a folder for the experiment files
experiment_folder = 'ml-series-diabetes-exp'
os.makedirs(experiment_folder, exist_ok=True)
print(experiment_folder, 'folder created')

ml-series-diabetes-exp folder created


### 2. Creating your ML Script

- Parsess the Argument passed to the script
- Starts the run for the experiment
- Does all the ML Training
- Stores the output model

In [90]:
%%writefile $experiment_folder/diabetes_training.py
# Import libraries
import os
import argparse
from azureml.core import Run, Dataset
import pandas as pd
import numpy as np
import joblib
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression, LinearRegression
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve
from azureml.core import Model

# Get the script arguments (regularization rate and training dataset ID)
parser = argparse.ArgumentParser()
parser.add_argument('--regularization', type=float, dest='reg_rate', default=0.01, help='regularization rate')
parser.add_argument("--input-data", type=str, dest='training_dataset_id', help='training dataset')
args = parser.parse_args()

# Set regularization hyperparameter (passed as an argument to the script)
reg = args.reg_rate
inputdata = args.training_dataset_id

# Get the experiment run context
run = Run.get_context()

ws = run.experiment.workspace

# get the input dataset by ID
dataset = Dataset.get_by_id(ws, id=inputdata)

# Get the training dataset
print("Loading Data...")
diabetes = dataset.to_pandas_dataframe()

# Separate features and labels
X, y = diabetes[['AGE','SEX','BMI','BP','S1','S2','S3','S4','S5','S6']].values, diabetes['Y'].values

# Split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=0)

# Train a logistic regression model
print('Training a logistic regression model with regularization rate of', reg)
run.log('Regularization Rate',  np.float(reg))
model = LinearRegression().fit(X_train,y_train)
# model = LogisticRegression(C=1/reg, solver="liblinear").fit(X_train, y_train)

# calculate accuracy
y_hat = model.predict(X_test)
acc = np.average(y_hat == y_test)
print('Accuracy:', acc)
run.log('Accuracy', np.float(acc))


os.makedirs('outputs', exist_ok=True)
# note file saved in the outputs folder is automatically uploaded into experiment record
joblib.dump(value=model, filename='outputs/diabetes_model.pkl')




run.complete()

Overwriting ml-series-diabetes-exp/diabetes_training.py


### 3. Create Environment from Dependencies file (yml)

In [54]:
%%writefile environment.yml

name: simple_environment
dependencies:
  # The python interpreter version.
  # Currently Azure ML only supports 3.5.2 and later.
- python=3.6.2
- scikit-learn
- pandas
- pip
- pip:
  - azureml-defaults
  - azureml-mlflow

Overwriting environment.yml


### 4. Training the model
- Create the environment from the dependency file
- Creating Configuration for Python Script
- Submitting the Script to the Experiment

In [91]:
from azureml.core import Experiment, ScriptRunConfig, Environment
from azureml.widgets import RunDetails

# Create a Python environment for the experiment (from a .yml file)
env = Environment.from_conda_specification("experiment_env", "environment.yml")


# Create a Python environment for the experiment (pip installing requirements.txt)
# env = Environment.from_pip_requirements("experiment_env", "requirements.txt", pip_version=None)


# Create a script config
script_config = ScriptRunConfig(source_directory=experiment_folder,
                                script='diabetes_training.py',
                                 arguments=['--input-data', tab_data_set.as_named_input('diabetes_data')],
                                  compute_target=cpu_cluster,
                                environment=env) 

# submit the experiment run
experiment_name = 'mlseries-train-diabetes'
experiment = Experiment(workspace=ws, name=experiment_name)
run = experiment.submit(config=script_config)

# Show the running experiment run in the notebook widget
RunDetails(run).show()

# Block until the experiment run has completed
run.wait_for_completion()

_UserRunWidget(widget_settings={'childWidgetDisplay': 'popup', 'send_telemetry': False, 'log_level': 'INFO', 'â€¦

{'runId': 'mlseries-train-diabetes_1625814351_a2091f7e',
 'target': 'cpu-cluster',
 'status': 'Completed',
 'startTimeUtc': '2021-07-09T07:09:52.372703Z',
 'endTimeUtc': '2021-07-09T07:11:49.835327Z',
 'properties': {'_azureml.ComputeTargetType': 'amlcompute',
  'ContentSnapshotId': 'bbc9b11a-563e-4e07-9c6d-c6f4837a1fac',
  'ProcessInfoFile': 'azureml-logs/process_info.json',
  'ProcessStatusFile': 'azureml-logs/process_status.json',
  'azureml.RuntimeType': ''},
 'inputDatasets': [{'dataset': {'id': 'ffb088b5-f279-40e2-8290-4b640d2bd094'}, 'consumptionDetails': {'type': 'RunInput', 'inputName': 'diabetes_data', 'mechanism': 'Direct'}}, {'dataset': {'id': 'ffb088b5-f279-40e2-8290-4b640d2bd094'}, 'consumptionDetails': {'type': 'Reference'}}],
 'outputDatasets': [],
 'runDefinition': {'script': 'diabetes_training.py',
  'command': '',
  'useAbsolutePath': False,
  'arguments': ['--input-data', 'DatasetConsumptionConfig:diabetes_data'],
  'sourceDirectoryDataStore': None,
  'framework': '

### 5. Retrieving metrics &  Models

In [116]:

# Get logged metrics and files
metrics = run.get_metrics()
for key in metrics.keys():
        print(key, metrics.get(key))
print('\n')
for file in run.get_file_names():
    print(file)
    
#Download model

run. download_file("outputs/diabetes_model.pkl","outputs/diabetes_model.pkl")



    
# Register the model
run.register_model(model_path='outputs/diabetes_model.pkl', model_name='diabetes_model',
                   tags={'Training context':'Script'},
                   properties={'Accuracy': run.get_metrics()['Accuracy']})

# List registered models
for model in Model.list(ws):
    print(model.name, 'version:', model.version)
    for tag_name in model.tags:
        tag = model.tags[tag_name]
        print ('\t',tag_name, ':', tag)
    for prop_name in model.properties:
        prop = model.properties[prop_name]
        print ('\t',prop_name, ':', prop)
    print('\n')  
    


Regularization Rate 0.01
Accuracy 0.0


azureml-logs/55_azureml-execution-tvmps_c13ee216f5aceb62467033b31dd5a117e6510f3f903a8dd15d777d32cdbefd5f_d.txt
azureml-logs/65_job_prep-tvmps_c13ee216f5aceb62467033b31dd5a117e6510f3f903a8dd15d777d32cdbefd5f_d.txt
azureml-logs/70_driver_log.txt
azureml-logs/75_job_post-tvmps_c13ee216f5aceb62467033b31dd5a117e6510f3f903a8dd15d777d32cdbefd5f_d.txt
azureml-logs/process_info.json
azureml-logs/process_status.json
logs/azureml/93_azureml.log
logs/azureml/dataprep/backgroundProcess.log
logs/azureml/dataprep/backgroundProcess_Telemetry.log
logs/azureml/job_prep_azureml.log
logs/azureml/job_release_azureml.log
outputs/diabetes_model.pkl
diabetes_model version: 4
	 Training context : Script
	 Accuracy : 0.0


diabetes_model version: 3
	 Training context : Script
	 Accuracy : 0.0


diabetes_model version: 2
	 Training context : Script
	 Accuracy : 0.007518796992481203


diabetes_model version: 1
	 Training context : Script
	 Accuracy : 0.007518796992481203




# Deploying the Model 

- Create a scoring Script
- Create Inference and Deployment Configuration
- The trained model can be deployed in three ways
    - In local compute
    - In AmlCompute
    - In Azure Kubernetes Service
 

In [93]:
script_file = os.path.join(experiment_folder,"score_diabetes.py")


In [94]:
%%writefile $script_file
import json
import joblib
import numpy as np
from azureml.core.model import Model

# Called when the service is loaded
def init():
    global model
    # Get the path to the deployed model file and load it
    model_path = Model.get_model_path('diabetes_model')
    model = joblib.load(model_path)

# Called when a request is received
def run(raw_data):
    # Get the input data as a numpy array
    data = np.array(json.loads(raw_data)['data'])
    # Get a prediction from the model
    predictions = model.predict(data)
    
    return json.dumps(predictions.tolist())

Overwriting ml-series-diabetes-exp/score_diabetes.py


## - Using local compute

In [110]:
x_new = [[59,2,32.1,101.0,157,93.2,38.0,4.00,4.8598,87],[48,1,21.6,87.0,183,103.2,70.0,3.00,3.8918,69]]
# print ('Patient: {}'.format(x_new[0]))

# Convert the array to a serializable list in a JSON document
input_json = json.dumps({"data": x_new})

# Call the web service, passing the input data (the web service will also accept the data in binary format)
predictions = service.run(input_data = input_json)

# Get the predicted class - it'll be the first (and only) one.
predicted_classes = json.loads(predictions)
for i in range(len(x_new)):
    print ("Patient {}".format(x_new[i]), predicted_classes[i] )

Patient [59, 2, 32.1, 101.0, 157, 93.2, 38.0, 4.0, 4.8598, 87] 151
Patient [48, 1, 21.6, 87.0, 183, 103.2, 70.0, 3.0, 3.8918, 69] 65


## - Using Azure AML Compute

In [96]:
from azureml.core.webservice import AciWebservice
from azureml.core.model import InferenceConfig

# Configure the scoring environment
inference_config = InferenceConfig(entry_script=script_file,
                                   environment=env)

deployment_config = AciWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)

service_name = "diabetes-service3"

service = Model.deploy(ws, service_name, [model], inference_config, deployment_config)

service.wait_for_deployment(True)
print(service.state)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-07-09 07:14:40+00:00 Creating Container Registry if not exists.
2021-07-09 07:14:40+00:00 Registering the environment.
2021-07-09 07:14:41+00:00 Use the existing image.
2021-07-09 07:14:42+00:00 Generating deployment configuration.
2021-07-09 07:14:43+00:00 Submitting deployment to compute..
2021-07-09 07:14:46+00:00 Checking the status of deployment diabetes-service3..
2021-07-09 07:17:40+00:00 Checking the status of inference endpoint diabetes-service3.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy


In [114]:
endpoint = service.scoring_uri
print(endpoint)

http://30708b77-602d-4712-8915-31b34943504a.centralus.azurecontainer.io/score


In [115]:
import requests
import json

x_new = [[59,2,32.1,101.0,157,93.2,38.0,4.00,4.8598,87],[48,1,21.6,87.0,183,103.2,70.0,3.00,3.8918,69]]
# Convert the array to a serializable list in a JSON document
input_json = json.dumps({"data": x_new})

# Set the content type
headers = { 'Content-Type':'application/json' }

predictions = requests.post(endpoint, input_json, headers = headers)
print(predictions.json())
predicted_classes = json.loads(predictions.json())

for i in range(len(x_new)):
    print ("Patient {}".format(x_new[i]), predicted_classes[i] )

[151, 65]
Patient [59, 2, 32.1, 101.0, 157, 93.2, 38.0, 4.0, 4.8598, 87] 151
Patient [48, 1, 21.6, 87.0, 183, 103.2, 70.0, 3.0, 3.8918, 69] 65


In [119]:
## - Using AKS Cluster

In [118]:
# from azureml.core.webservice import AksWebservice
# from azureml.core.model import InferenceConfig

# # Configure the scoring environment
# inference_config = InferenceConfig(entry_script=script_file,
#                                    environment=env)
# aks_name = 'deplyakscluster1' 

#     # Verify that cluster does not exist already
# try:
#         aks_target = ComputeTarget(workspace=ws, name=aks_name)
#         print('Found existing cluster, use it.')
# except:
#         # Use the default configuration (can also provide parameters to customize)
#         prov_config = AksCompute.provisioning_configuration(vm_size = "Standard_D5_v2")

#         # Create the cluster
#         aks_target = ComputeTarget.create(workspace = ws, 
#                                         name = aks_name, 
#                                         provisioning_configuration = prov_config)

# if aks_target.get_status() != "Succeeded":
#             aks_target.wait_for_completion(show_output=True)
# deployment_config = AksWebservice.deploy_configuration(cpu_cores = 1, memory_gb = 1)

# service_name = "diabetes-aks"

# service = Model.deploy(ws, service_name, [model], inference_config, deployment_config, aks_target)

# service.wait_for_deployment(True)
# print(service.state)

# endpoint = service.scoring_uri
# print(endpoint)

# x_new = [[59,2,32.1,101.0,157,93.2,38.0,4.00,4.8598,87],[48,1,21.6,87.0,183,103.2,70.0,3.00,3.8918,69]]
# # Convert the array to a serializable list in a JSON document
# input_json = json.dumps({"data": x_new})

# # Set the content type
# headers = { 'Content-Type':'application/json' }

# predictions = requests.post(endpoint, input_json, headers = headers)
# print(predictions.json())
# predicted_classes = json.loads(predictions.json())

# for i in range(len(x_new)):
#     print ("Patient {}".format(x_new[i]), predicted_classes[i] )

# Clean up resources


In [None]:
# Delete the Resource Group to delete all ml related resources in it
# Delete the Model and Webservices

# Improving the ML Workflows

## [ML Notebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml)

- ### [Setting up Pipelines](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/machine-learning-pipelines)
- ### Performing Hyperparameter Tuning
- ### Track and Monitor Models
- ### [Model Explanability](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/explain-model)
- ### [Model Fairness](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/responsible-ai)
- ### [AutoML](https://www.youtube.com/watch?v=rjcXbexIrp0)


# Summary of the notebook

Learn how to use Azure Machine Learning services for experimentation and model management.

As a pre-requisite, run the [configuration Notebook](../configuration.ipynb) notebook first to set up your Azure ML Workspace. Then, run the notebooks in following recommended order.

* [train-within-notebook](./training/train-within-notebook): Train a model while tracking run history, and learn how to deploy the model as web service to Azure Container Instance.
* [train-on-local](./training/train-on-local): Learn how to submit a run to local computer and use Azure ML managed run configuration.
* [train-on-amlcompute](./training/train-on-amlcompute): Use a 1-n node Azure ML managed compute cluster for remote runs on Azure CPU or GPU infrastructure.
* [train-on-remote-vm](./training/train-on-remote-vm): Use Data Science Virtual Machine as a target for remote runs.
* [logging-api](./track-and-monitor-experiments/logging-api): Learn about the details of logging metrics to run history.
* [production-deploy-to-aks](./deployment/production-deploy-to-aks) Deploy a model to production at scale on Azure Kubernetes Service.
* [enable-app-insights-in-production-service](./deployment/enable-app-insights-in-production-service) Learn how to use App Insights with production web service.
 
Find quickstarts, end-to-end tutorials, and how-tos on the [official documentation site for Azure Machine Learning service](https://docs.microsoft.com/en-us/azure/machine-learning/service/).


# Get Certified as Microsoft Azure Data Scientist Associate!

## DP-100: Designing and Implementing a Data Science Solution on Azure

- Candidates for the Azure Data Scientist Associate certification should have subject matter expertise applying data science and machine learning to implement and run machine learning workloads on Azure.
Responsibilities for this role include planning and creating a suitable working environment for data science workloads on Azure. You run data experiments and train predictive models. In addition, you manage, optimize, and deploy machine learning models into production.
A candidate for this certification should have knowledge and experience in data science and using Azure Machine Learning and Azure Databricks.

- More info :  https://docs.microsoft.com/en-us/learn/certifications/exams/dp-100
