# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:

import logging
import os
import csv

from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
import pkg_resources

import azureml.core
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.train.automl import AutoMLConfig
from azureml.core.dataset import Dataset

from azureml.pipeline.steps import AutoMLStep

# Check core SDK version number
print("SDK version:", azureml.core.VERSION)

SDK version: 1.31.0


## Dataset

### Overview
TODO: In this markdown cell, give an overview of the dataset you are using. Also mention the task you will be performing.


TODO: Get data. In the cell below, write code to access the data you will be using in this project. Remember that the dataset needs to be external.

In [2]:
ws = Workspace.from_config()

# choose a name for experiment
experiment_name = 'automl-telco-churn'

experiment=Experiment(ws, experiment_name)

## AutoML Configuration

TODO: Explain why you chose the AutoML settings and configuration you used below.

In [3]:
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
from azureml.core.compute_target import ComputeTargetException


cpu_cluster_name = "cpucluster"

# Reuse the cluster if it already exists; otherwise provision a new one
try:
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_D2_V2',  # for GPU, use "STANDARD_NC6"
                                                           # vm_priority='lowpriority',  # optional
                                                           max_nodes=4)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)


Found existing cluster, use it.
Succeeded
AmlCompute wait for completion finished

Minimum number of nodes requested have been provisioned


In [4]:
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.data.dataset_factory import DataType

# CSV file hosted on GitHub
url = "https://raw.githubusercontent.com/thomascjw30/ML-Engineer-Capstone-Project/main/WA_Fn-UseC_-Telco-Customer-Churn.csv"

# Create a TabularDataset from the URL. Unlike a pandas DataFrame, it is loaded lazily;
# call to_pandas_dataframe() when the data is needed in memory.
dataset = TabularDatasetFactory.from_delimited_files(url, header=True)

# Drop customerID, which is an identifier rather than a predictive feature
dataset = dataset.drop_columns(['customerID'])



In [5]:
ds = dataset.to_pandas_dataframe()
ds.head(5)

| | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Female | 0 | True | False | 1 | False | No phone service | DSL | No | Yes | No | No | No | No | Month-to-month | True | Electronic check | 29.85 | 29.85 | False |
| 1 | Male | 0 | False | False | 34 | True | No | DSL | Yes | No | Yes | No | No | No | One year | False | Mailed check | 56.95 | 1889.5 | False |
| 2 | Male | 0 | False | False | 2 | True | No | DSL | Yes | Yes | No | No | No | No | Month-to-month | True | Mailed check | 53.85 | 108.15 | True |
| 3 | Male | 0 | False | False | 45 | False | No phone service | DSL | Yes | No | Yes | Yes | No | No | One year | False | Bank transfer (automatic) | 42.3 | 1840.75 | False |
| 4 | Female | 0 | False | False | 2 | True | No | Fiber optic | No | No | No | No | No | No | Month-to-month | True | Electronic check | 70.7 | 151.65 | True |
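The AutoML configuration in this notebook uses `primary_metric='accuracy'`. If most customers in the data did not churn, a model can score a high accuracy simply by predicting `False` most of the time, so the label distribution is worth checking first. A minimal sketch, assuming the labels are available as an iterable such as `ds["Churn"]`; the helper name `class_balance` is hypothetical:

```python
from collections import Counter


def class_balance(labels):
    """Return the fraction of rows for each label value (hypothetical helper)."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no labels supplied")
    # Sort by label so the output order is deterministic
    return {label: count / total for label, count in sorted(counts.items())}


# The five Churn values shown above; on the full data, pass ds["Churn"] instead
print(class_balance([False, False, True, False, True]))  # {False: 0.6, True: 0.4}
```

A strongly skewed result would be a reason to consider a metric such as weighted AUC instead of accuracy.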


In [6]:

# TODO: Put your automl config here
automl_settings = {
    "experiment_timeout_minutes": 30,
    # AmlCompute runs one iteration per node, so with max_nodes=4 at most
    # 4 iterations actually run in parallel
    "max_concurrent_iterations": 10,
    "primary_metric": "accuracy",
}
automl_config = AutoMLConfig(compute_target=compute_target,
                             task="classification",
                             training_data=dataset,
                             label_column_name="Churn",
                             path="./telco_project",
                             enable_early_stopping=True,
                             featurization="auto",
                             debug_log="automl_errors.log",
                             **automl_settings)
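AmlCompute clusters run one AutoML iteration per node, so the parallelism actually available is bounded by the cluster's `max_nodes`, whatever `max_concurrent_iterations` requests. A small sketch that makes this explicit for the values used in this notebook; `effective_concurrency` is a hypothetical helper, and the default of 1 mirrors the documented default for `max_concurrent_iterations`:

```python
def effective_concurrency(automl_settings, max_nodes):
    """Iterations that can run at once on an AmlCompute cluster (hypothetical helper).

    Assumes one iteration per node, as documented for AmlCompute.
    """
    requested = automl_settings.get("max_concurrent_iterations", 1)
    return min(requested, max_nodes)


settings = {"experiment_timeout_minutes": 30,
            "max_concurrent_iterations": 10,
            "primary_metric": "accuracy"}
print(effective_concurrency(settings, max_nodes=4))  # 4
```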


In [7]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config)

Submitting remote run.


| Experiment | Id | Type | Status | Details Page | Docs Page |
|---|---|---|---|---|---|
| automl-telco-churn | AutoML_af8c4cb8-b77e-4b53-a4c9-ad8e598a7ff3 | automl | NotStarted | Link to Azure Machine Learning studio | Link to Documentation |


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [8]:
from azureml.widgets import RunDetails
RunDetails(remote_run).show()


## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [9]:
# Wait for the AutoML run to finish, then retrieve the best child run and its fitted model
remote_run.wait_for_completion()

best_automl, best_model = remote_run.get_output()

Package:azureml-automl-runtime, training version:1.32.0, current version:1.31.0
Package:azureml-core, training version:1.32.0, current version:1.31.0
Package:azureml-dataset-runtime, training version:1.32.0, current version:1.31.0
Package:azureml-defaults, training version:1.32.0, current version:1.31.0
Package:azureml-interpret, training version:1.32.0, current version:1.31.0
Package:azureml-mlflow, training version:1.32.0, current version:1.31.0
Package:azureml-pipeline-core, training version:1.32.0, current version:1.31.0
Package:azureml-telemetry, training version:1.32.0, current version:1.31.0
Package:azureml-train-automl-client, training version:1.32.0, current version:1.31.0
Package:azureml-train-automl-runtime, training version:1.32.0, current version:1.31.0


In [10]:
import joblib

# Serialize the fitted pipeline so it can be registered as a model
joblib.dump(best_model, 'best_run_automl.pkl')

['best_run_automl.pkl']

In [11]:
print(best_model)

Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=False, enable_feature_sweeping=True, feature_sweeping_config={}, feature_sweeping_timeout=86400, featurization_config=None, force_text_dnn=False, is_cross_validation=True, is_onnx_compatible=False, observer=None, task='classification', working_dir='/mnt/batch/tasks/shared/LS_root/mount...
    gpu_training_param_dict={'processing_unit_type': 'cpu'}
), random_state=None, reg_alpha=0.15789473684210525, reg_lambda=0.7894736842105263, subsample=1))], verbose=False))], flatten_transform=None, weights=[0.21428571428571427, 0.14285714285714285, 0.07142857142857142, 0.14285714285714285, 0.2857142857142857, 0.07142857142857142, 0.07142857142857142]))],
         verbose=False)
Y_transformer(['LabelEncoder', LabelEncoder()])


In [12]:
def print_model(fitted_model, prefix=""):
    """Print each pipeline step and its parameters, recursing into ensemble members."""
    for name, step in fitted_model.steps:
        print(prefix + name)
        if hasattr(step, 'estimators') and hasattr(step, 'weights'):
            # Voting ensemble: list member names and weights, then describe each member pipeline
            print({'estimators': [e[0] for e in step.estimators], 'weights': step.weights})
            print()
            for estimator_name, estimator in step.estimators:
                print_model(estimator, estimator_name + ' - ')
        else:
            print(step.get_params())
            print()

print_model(best_model)

datatransformer
{'task': 'classification', 'is_onnx_compatible': False, 'enable_feature_sweeping': True, 'enable_dnn': False, 'force_text_dnn': False, 'feature_sweeping_timeout': 86400, 'featurization_config': None, 'is_cross_validation': True, 'feature_sweeping_config': {}, 'observer': None, 'working_dir': '/mnt/batch/tasks/shared/LS_root/mounts/clusters/tcjw301/code/Users/tcjw30'}

prefittedsoftvotingclassifier
{'estimators': ['35', '1', '15', '46', '21', '40', '23'], 'weights': [0.21428571428571427, 0.14285714285714285, 0.07142857142857142, 0.14285714285714285, 0.2857142857142857, 0.07142857142857142, 0.07142857142857142]}

35 - sparsenormalizer
{'norm': 'l1', 'copy': True}

35 - xgboostclassifier
{'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bynode': 1, 'colsample_bytree': 0.7, 'gamma': 0, 'learning_rate': 0.1, 'max_delta_step': 0, 'max_depth': 2, 'min_child_weight': 1, 'missing': nan, 'n_estimators': 100, 'n_jobs': 1, 'nthread': None, 'objective': 're

### Best Model Parameters
#### Algorithm: soft-voting ensemble (PreFittedSoftVotingClassifier)
#### Estimators: ['35', '1', '15', '46', '21', '40', '23']
#### Weights: [0.21428571428571427, 0.14285714285714285, 0.07142857142857142, 0.14285714285714285, 0.2857142857142857, 0.07142857142857142, 0.07142857142857142]
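In a soft-voting ensemble, each member estimator outputs class probabilities, and the ensemble takes their weighted average before choosing the most probable class. A minimal pure-Python sketch of that weighted average, using the seven weights printed by `print_model` (they are multiples of 1/14 and sum to 1). This illustrates the technique only; it is not AutoML's `PreFittedSoftVotingClassifier` implementation, and the per-member probabilities are made up:

```python
def soft_vote(probabilities, weights):
    """Weighted soft voting.

    probabilities: one list per estimator with that estimator's probability
                   for each class (same class order for every estimator).
    weights:       one non-negative weight per estimator.
    Returns (index of the winning class, averaged class probabilities).
    """
    if len(probabilities) != len(weights):
        raise ValueError("need one weight per estimator")
    total_weight = sum(weights)
    n_classes = len(probabilities[0])
    # Weighted mean of each class probability across estimators
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total_weight
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: averaged[c])
    return winner, averaged


# Ensemble weights reported for members '35', '1', '15', '46', '21', '40', '23'
weights = [0.21428571428571427, 0.14285714285714285, 0.07142857142857142,
           0.14285714285714285, 0.2857142857142857, 0.07142857142857142,
           0.07142857142857142]
# Illustrative [P(False), P(True)] for each member; not real model output
probs = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45],
         [0.4, 0.6], [0.9, 0.1], [0.5, 0.5]]
print(soft_vote(probs, weights))
```

Member `21` carries the largest weight, so its vote moves the average the most; a single confident member with a small weight (here `15`) cannot flip the result on its own.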

## Model Deployment

Remember that you only have to deploy one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [23]:
from azureml.core.model import Model
from azureml.core.model import InferenceConfig
from azureml.core.webservice import AciWebservice

# Register the serialized best model in the workspace
model = Model.register(workspace=ws, model_name='best_run_automl', model_path='best_run_automl.pkl')
print(model.name, model.id, model.version, sep='\t')

# Reuse the environment that the best AutoML run was trained in
env = best_automl.get_environment()


Registering model best_run_automl
best_run_automl	best_run_automl:9	9


In [24]:
inference_config = InferenceConfig(entry_script='score.py', environment=env)
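`InferenceConfig` points at an entry script, `score.py`, which is not shown in this notebook. The sketch below is one possible version, not the author's file. It assumes the model was registered as the single file `best_run_automl.pkl` (as above) and that requests use the `{"data": [...]}` shape sent later in this notebook. Azure ML calls `init()` once when the container starts and `run()` for each request, and it sets `AZUREML_MODEL_DIR` to the folder holding the deployed model. The `joblib` and `pandas` imports sit inside the functions only so the file can be imported for quick local checks without those packages:

```python
# score.py: a sketch of an Azure ML entry script (assumed, not the author's file)
import json
import os

model = None


def init():
    """Load the registered model once, when the scoring container starts."""
    global model
    import joblib  # assumed present in the AutoML environment reused for deployment

    # For a single registered file, AZUREML_MODEL_DIR is the folder that contains it
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR", "."), "best_run_automl.pkl")
    model = joblib.load(model_path)


def run(raw_data):
    """Score a JSON request of the form {"data": [{column: value, ...}, ...]}."""
    try:
        records = json.loads(raw_data)["data"]
        import pandas as pd

        # The AutoML pipeline expects a DataFrame with the training column names
        predictions = model.predict(pd.DataFrame(records))
        return predictions.tolist()
    except Exception as exc:
        # Return the error text to the caller instead of raising
        return {"error": str(exc)}
```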

In [26]:

deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=4, enable_app_insights=True)
service = Model.deploy(ws, "telco-churn-automl", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output = True)

print(service.state)
print(service.scoring_uri)
print(service.swagger_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-07-31 16:40:55+00:00 Creating Container Registry if not exists.
2021-07-31 16:40:55+00:00 Registering the environment.
2021-07-31 16:40:56+00:00 Use the existing image.
2021-07-31 16:40:56+00:00 Generating deployment configuration.
2021-07-31 16:40:57+00:00 Submitting deployment to compute.
2021-07-31 16:41:00+00:00 Checking the status of deployment telco-churn-automl..
2021-07-31 16:42:49+00:00 Checking the status of inference endpoint telco-churn-automl.
Succeeded
ACI service creation operation finished, operation "Succeeded"
Healthy
http://26b748f7-230d-4276-8193-b1763c671b84.eastus2.azurecontainer.io/score
http://26b748f7-230d-4276-8193-b1763c671b84.eastus2.azurecontainer.io/swagger.json


TODO: In the cell below, send a request to the web service you deployed to test it.

In [27]:
print("Scoring URI: " + service.scoring_uri)

Scoring URI: http://26b748f7-230d-4276-8193-b1763c671b84.eastus2.azurecontainer.io/score


In [28]:
import json
data = {"data": [
    {
        "gender": "Female",
        "SeniorCitizen": 0,
        "Partner": 0,
        "Dependents": 0,
        "tenure": 1,
        "PhoneService": 0,
        "MultipleLines": "No phone service",
        "InternetService": "DSL",
        "OnlineSecurity": 0,
        "OnlineBackup": 0,
        "DeviceProtection": 0,
        "TechSupport": 0,
        "StreamingTV": 0,
        "StreamingMovies": 0,
        "Contract": "Month-to-month",
        "PaperlessBilling": 1,
        "PaymentMethod": "Electronic check",
        "MonthlyCharges": 1500.85,
        "TotalCharges": 1500.85,
    },
    {
        "gender": "Male",
        "SeniorCitizen": 1,
        "Partner": 1,
        "Dependents": 0,
        "tenure": 0,
        "PhoneService": 0,
        "MultipleLines": "No phone service",
        "InternetService": "DSL",
        "OnlineSecurity": 0,
        "OnlineBackup": 0,
        "DeviceProtection": 0,
        "TechSupport": 0,
        "StreamingTV": 0,
        "StreamingMovies": 0,
        "Contract": "Month-to-month",
        "PaperlessBilling": 0,
        "PaymentMethod": "Electronic check",
        "MonthlyCharges": 20.50,
        "TotalCharges": 20.50,
    },
]}
test_sample = json.dumps(data)
output= service.run(test_sample)
print(output)

[False, False]
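The request body has to carry the same feature columns the model was trained on: every column in `ds.head(5)` except the label `Churn` (`customerID` was already dropped). A sketch of a hypothetical `build_payload` helper that checks column names before serializing. It checks names only, not value types; for example, the training data stores `OnlineSecurity` as strings such as `"No"`, while the sample request above sends `0`.

```python
import json

# Feature columns from ds.head(5), minus the label "Churn"
FEATURE_COLUMNS = [
    "gender", "SeniorCitizen", "Partner", "Dependents", "tenure",
    "PhoneService", "MultipleLines", "InternetService", "OnlineSecurity",
    "OnlineBackup", "DeviceProtection", "TechSupport", "StreamingTV",
    "StreamingMovies", "Contract", "PaperlessBilling", "PaymentMethod",
    "MonthlyCharges", "TotalCharges",
]


def build_payload(records, columns=FEATURE_COLUMNS):
    """Serialize records as {"data": [...]} after checking their column names."""
    expected = set(columns)
    for i, record in enumerate(records):
        keys = set(record)
        missing = expected - keys
        extra = keys - expected
        if missing or extra:
            raise ValueError(
                f"record {i}: missing {sorted(missing)}, unexpected {sorted(extra)}"
            )
    return json.dumps({"data": list(records)})
```

With the `data` dictionary defined above, `test_sample = build_payload(data["data"])` produces the same request body as `json.dumps(data)`, but fails early on a misspelled or missing column.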


In [29]:
%run endpoint.py

[False, False]
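`endpoint.py` itself is not shown. Assuming it does over plain HTTP what `service.run` does above, a minimal standard-library sketch could look like the following; the function names are hypothetical, `scoring_uri` is the URI printed earlier, and the optional `key` only matters if key-based authentication is enabled (it is off by default for ACI deployments like the one above):

```python
import json
import urllib.request


def build_request(scoring_uri, payload, key=None):
    """Prepare a POST request with a JSON body (not sent yet)."""
    headers = {"Content-Type": "application/json"}
    if key:
        # Only needed when key-based auth is enabled on the web service
        headers["Authorization"] = f"Bearer {key}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


def score(scoring_uri, payload, key=None, timeout=60):
    """Send the request and decode the JSON response."""
    request = build_request(scoring_uri, payload, key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Usage (performs a network call; the service must still be running):
# print(score(service.scoring_uri, data))
```

Keeping request construction separate from sending makes the headers and body easy to inspect without touching the network.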


TODO: In the cell below, print the logs of the web service and delete the service

In [30]:
print(service.get_logs())

2021-07-31T16:42:39,477720300+00:00 - rsyslog/run 
2021-07-31T16:42:39,479467900+00:00 - iot-server/run 
2021-07-31T16:42:39,483973900+00:00 - gunicorn/run 
Dynamic Python package installation is disabled.
Starting HTTP server
2021-07-31T16:42:39,504782400+00:00 - nginx/run 
rsyslogd: /azureml-envs/azureml_fc92eee9a5613508afa12283dd0b27d8/lib/libuuid.so.1: no version information available (required by rsyslogd)
EdgeHubConnectionString and IOTEDGE_IOTHUBHOSTNAME are not set. Exiting...
2021-07-31T16:42:39,917575100+00:00 - iot-server/finish 1 0
2021-07-31T16:42:39,920046700+00:00 - Exit code 1 is normal. Not restarting iot-server.
Starting gunicorn 20.1.0
Listening at: http://127.0.0.1:31311 (63)
Using worker: sync
worker timeout is set to 300
Booting worker with pid: 90
SPARK_HOME not set. Skipping PySpark Initialization.
Generating new fontManager, this may take some time...
Initializing logger
2021-07-31 16:42:42,916 | root | INFO | Starting up app insights client
logging socket was 

In [31]:
service.delete()