# Automated ML

TODO: Import Dependencies. In the cell below, import all the dependencies that you will need to complete the project.

In [1]:
import azureml.core
from azureml.core import Dataset
from azureml.core.experiment import Experiment
from azureml.core.workspace import Workspace
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException
# Needed by the deployment cells (InferenceConfig, Model.deploy, AciWebservice)
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice
from azureml.data.dataset_factory import TabularDatasetFactory
from azureml.widgets import RunDetails
from azureml.data.datapath import DataPath
from azureml.train.automl import AutoMLConfig
from azureml.interpret import ExplanationClient
from azureml.automl.core.featurization import FeaturizationConfig
import pandas as pd
import logging
from matplotlib import pyplot as plt
import train
import joblib
import os
# Needed to build the test payload and call the scoring endpoint
import json
import requests

## Dataset

### Overview
What is churn?
Churn is when customers stop, or plan to stop, using a company's services or contracts. Churn prediction identifies customers who are likely to cancel soon, so that the company can offer discounts or other benefits that encourage them to stay.

We can use historical data about customers who churned to build a model that flags current customers who are about to leave. This is a binary classification problem: the target variable is categorical and has only two possible outcomes, churn or no churn.
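
Because the target has only two outcomes, a useful sanity check before modelling is the accuracy obtained by always predicting the more common class. The sketch below is illustrative only: `majority_baseline` is a hypothetical helper and `labels` is a made-up list, not the project's data.

```python
from collections import Counter


def majority_baseline(labels):
    """Return (majority_label, accuracy of always predicting that label)."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)


# Made-up labels for illustration: True = churned, False = stayed.
labels = [False, False, True, False, True, False, False, False]
label, accuracy = majority_baseline(labels)
print(label, accuracy)  # → False 0.75
```

A trained model is only informative if its accuracy clearly exceeds this baseline.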


### Task

Some of our customers are churning: they stop using our services and move to a different provider. We would like to prevent that, so we develop a system that identifies these customers and offers them an incentive to stay. We also need to be able to interpret the model's predictions.

First we do some exploratory data analysis (EDA) to identify which features in the data are important. Then we split the data into train and test sets so we can evaluate our models, and finally we deploy the best model.

According to the description, this dataset has the following information:

- Services of the customers: phone; multiple lines; internet; tech support and extra services such as online security, backup, device protection, and TV streaming
- Account information: how long they have been clients, type of contract, type of payment method
- Charges: how much the client was charged in the past month and in total
- Demographic information: gender, age, and whether they have dependents or a partner
- Churn: yes/no, whether the customer left the company within the past month

The label "Churn" tells us whether a customer left the company, and it is the target column for predictions.




In [2]:
ws = Workspace.from_config()
print('Workspace name: ' + ws.name, 
      'Azure region: ' + ws.location, 
      'Subscription id: ' + ws.subscription_id, 
      'Resource group: ' + ws.resource_group, sep = '\n')


Workspace name: quick-starts-ws-143004
Azure region: southcentralus
Subscription id: 1b944a9b-fdae-4f97-aeb1-b7eea0beac53
Resource group: aml-quickstarts-143004


In [3]:
# Choose a name for the experiment
experiment_name = 'ChurnPrediction'

experiment = Experiment(ws, experiment_name)
run = experiment.start_logging()

In [4]:
# Choose a name for your CPU cluster
cpu_cluster_name = "notebook143004"

# Reuse the cluster if it already exists; otherwise create it
try:
    compute_target = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print('Found existing cluster, use it.')
except ComputeTargetException:
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
                                                           max_nodes=6)
    compute_target = ComputeTarget.create(ws, cpu_cluster_name, compute_config)

compute_target.wait_for_completion(show_output=True)

Found existing cluster, use it.

Running


In [5]:

# Load the dataset from the workspace if it is already registered;
# otherwise create it from the CSV URL and register it.
# NOTE: update the key to match the dataset name
key = "Churn Prediction Dataset"
description_text = "Churn Prediction for Capstone Project"

if key in ws.datasets.keys():
    ds = ws.datasets[key]
else:
    # Create the dataset and register it in the workspace
    example_data = 'https://raw.githubusercontent.com/tejasbangera/Udacity-Captstone-Project/main/WA_Fn-UseC_-Telco-Customer-Churn.csv?token=AO3MXNWWA5PXCKIPPCWWT33AO7TLY'
    ds = TabularDatasetFactory.from_delimited_files(path=example_data)
    ds = ds.register(workspace=ws, name=key, description=description_text)

In [6]:
df=ds.to_pandas_dataframe()
df.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | True | False | 1 | False | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | True | Electronic check | 29.85 | 29.85 | False |
| 1 | 5575-GNVDE | Male | 0 | False | False | 34 | True | No | DSL | Yes | ... | Yes | No | No | No | One year | False | Mailed check | 56.95 | 1889.5 | False |
| 2 | 3668-QPYBK | Male | 0 | False | False | 2 | True | No | DSL | Yes | ... | No | No | No | No | Month-to-month | True | Mailed check | 53.85 | 108.15 | True |
| 3 | 7795-CFOCW | Male | 0 | False | False | 45 | False | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | False | Bank transfer (automatic) | 42.3 | 1840.75 | False |
| 4 | 9237-HQITU | Female | 0 | False | False | 2 | True | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | True | Electronic check | 70.7 | 151.65 | True |


In [7]:
from train import clean_data

# Use the clean_data function to clean your data.
x, y = clean_data(ds)

In [8]:
from sklearn.model_selection import train_test_split
# Create the train/test split (80% train, 20% test)
train_x, test_x, train_y, test_y = train_test_split(x, y, test_size=0.2, random_state=200)
# Join train_x and train_y to create the training dataset
train_df = pd.concat([train_x, train_y], axis=1)

## AutoML Configuration

TODO: Explain why you chose the automl settings and configuration you used below.

In [9]:

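# TODO: Put your automl settings here
automl_settings = {
    "experiment_timeout_hours": 0.3,    # stop the whole experiment after 0.3 h (18 minutes)
    "enable_early_stopping": True,      # end early if the score stops improving
    "iteration_timeout_minutes": 5,     # cap each training iteration at 5 minutes
    "max_concurrent_iterations": 4,     # run up to 4 iterations in parallel on the cluster
    "primary_metric": 'accuracy',       # metric AutoML optimizes when ranking models
    "featurization": 'auto',            # let AutoML handle preprocessing and featurization
    "verbosity": logging.INFO,          # log level for the run
}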

# TODO: Put your automl config here
automl_config = AutoMLConfig(compute_target=compute_target,
                             task = "classification",
                             training_data= ds,
                             label_column_name="Churn",   
                             path = "./project",
                             debug_log = "automl_errors.log",
                             **automl_settings
)
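
The time-related settings above interact: the experiment budget, the per-iteration cap, and the concurrency together bound how many models can be tried. The arithmetic below is a rough estimate only. It assumes every iteration runs to its full timeout and that all concurrent slots stay busy, and it ignores setup, featurization, and ensembling time; `estimate_iterations` is a hypothetical helper, not part of the Azure ML SDK.

```python
import math


def estimate_iterations(experiment_timeout_hours, iteration_timeout_minutes,
                        max_concurrent_iterations):
    """Estimate iterations that fit in the budget if each one runs to its timeout."""
    budget_minutes = experiment_timeout_hours * 60
    # Number of back-to-back "rounds" of full-length iterations that fit in the budget.
    rounds = math.floor(budget_minutes / iteration_timeout_minutes)
    return rounds * max_concurrent_iterations


# Values taken from automl_settings: 0.3 h budget, 5 min per iteration, 4 in parallel.
print(estimate_iterations(0.3, 5, 4))  # → 12
```

In practice many iterations finish well before their timeout, so the real count can be higher.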

In [10]:
# TODO: Submit your experiment
remote_run = experiment.submit(automl_config)

Submitting remote run.


Experiment,Id,Type,Status,Details Page,Docs Page
ChurnPrediction,AutoML_d34bf70f-df84-49cd-9910-4ab4cdb52df4,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation


## Run Details

OPTIONAL: Write about the different models trained and their performance. Why do you think some models did better than others?

TODO: In the cell below, use the `RunDetails` widget to show the different experiments.

In [11]:
# Submit the experiment again with show_output=True to stream progress to the notebook
remote_run = experiment.submit(automl_config, show_output=True)

Submitting remote run.
Running on remote compute: notebook143004


Experiment,Id,Type,Status,Details Page,Docs Page
ChurnPrediction,AutoML_ebf3bd15-bb5a-4275-86d9-00e01bd94c8e,automl,NotStarted,Link to Azure Machine Learning studio,Link to Documentation



Current status: FeaturesGeneration. Generating features for the dataset.
Current status: DatasetCrossValidationSplit. Generating individually featurized CV splits.
Current status: ModelSelection. Beginning model selection.

****************************************************************************************************
DATA GUARDRAILS: 

TYPE:         Cross validation
STATUS:       DONE
DESCRIPTION:  Each iteration of the trained model was validated through cross-validation.
              
DETAILS:      
+---------------------------------+
|Number of folds                  |
|3                                |
+---------------------------------+

****************************************************************************************************

TYPE:         Class balancing detection
STATUS:       PASSED
DESCRIPTION:  Your inputs were analyzed, and all classes are balanced in your training data.
              Learn more about imbalanced data: https://aka.ms/AutomatedMLImbalanced
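
The guardrail output above reports 3-fold cross-validation: the training data is cut into three parts, and each part serves once as the validation set while the other two are used for training. The sketch below illustrates that idea with plain index lists. It is not AutoML's implementation (which may shuffle or stratify); `k_fold_indices` is a hypothetical helper, and the sample count of 7 is made up.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each fold, without shuffling."""
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    start = 0
    for size in fold_sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, validation
        start = stop


for train, validation in k_fold_indices(7, 3):
    print(train, validation)
```

Each sample lands in exactly one validation fold, so every row contributes to the reported score once.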

In [12]:
RunDetails(remote_run).show()


## Best Model

TODO: In the cell below, get the best model from the automl experiments and display all the properties of the model.



In [13]:
# Retrieve the best child run and the fitted model it produced
best_run, fitted_model = remote_run.get_output()
print(best_run)
print(fitted_model)
print(best_run.properties['model_name'])

Run(Experiment: ChurnPrediction,
Id: AutoML_ebf3bd15-bb5a-4275-86d9-00e01bd94c8e_1,
Type: azureml.scriptrun,
Status: Completed)
Pipeline(memory=None,
         steps=[('datatransformer',
                 DataTransformer(enable_dnn=None, enable_feature_sweeping=None,
                                 feature_sweeping_config=None,
                                 feature_sweeping_timeout=None,
                                 featurization_config=None, force_text_dnn=None,
                                 is_cross_validation=None,
                                 is_onnx_compatible=None, logger=None,
                                 observer=None, task=None, working_dir=None)),
                ('MaxAbsScaler', MaxAbsScaler(copy...
                                   colsample_bylevel=1, colsample_bynode=1,
                                   colsample_bytree=1, gamma=0,
                                   learning_rate=0.1, max_delta_step=0,
                                   max_depth=3, min

In [14]:
#TODO: Save the best model
import joblib
os.makedirs('outputs', exist_ok = True)
joblib.dump(fitted_model, 'outputs/fitted_model.joblib')

## Model Deployment

Remember you have to deploy only one of the two models you trained. Perform the steps in the rest of this notebook only if you wish to deploy this model.

TODO: In the cell below, register the model, create an inference config and deploy the model as a web service.

In [15]:
model_name = best_run.properties['model_name']

# Download the scoring script that AutoML generated for the best run
script_file_name = './score.py'
best_run.download_file('outputs/scoring_file_v_1_0_0.py', script_file_name)

In [16]:
description = 'AutoML Model trained on Churn dataset to predict if a customer has churned or not.'
tags = None
model = remote_run.register_model(model_name = model_name, description = description, tags = tags)

print(remote_run.model_id)

AutoMLebf3bd15b1


TODO: In the cell below, send a request to the web service you deployed to test it.

In [19]:

# Download scoring file 
best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'score.py')

# Download environment file
best_run.download_file('outputs/conda_env_v_1_0_0.yml', 'env.yml')

In [20]:
aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               description='Churn prediction with AutoML')
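
`AciWebservice.deploy_configuration` also accepts `auth_enabled=True`. With key authentication on, each scoring request must carry the key in an `Authorization: Bearer` header; with it off, as in the configuration above, only the content type is needed. The sketch below shows that difference. `build_headers` is a hypothetical helper name and the key value is a placeholder.

```python
def build_headers(key=None):
    """Return request headers, adding a bearer token only when a key is supplied."""
    headers = {"Content-Type": "application/json"}
    if key:
        headers["Authorization"] = f"Bearer {key}"
    return headers


print(build_headers())                   # service deployed without authentication
print(build_headers("placeholder-key"))  # service deployed with auth_enabled=True
```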

In [24]:
inference_config = InferenceConfig(entry_script="score.py", environment=best_run.get_environment())

service = Model.deploy(workspace=ws, 
                       name='automl-webservice', 
                       models=[model], 
                       inference_config=inference_config, 
                       deployment_config=aciconfig)


In [26]:
service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-04-18 10:44:50+00:00 Creating Container Registry if not exists.
2021-04-18 10:44:51+00:00 Registering the environment.
2021-04-18 10:44:51+00:00 Use the existing image.
2021-04-18 10:44:51+00:00 Generating deployment configuration.
2021-04-18 10:44:52+00:00 Submitting deployment to compute.
2021-04-18 10:44:55+00:00 Checking the status of deployment automl-webservice..
2021-04-18 10:49:21+00:00 Checking the status of inference endpoint automl-webservice.
Succeeded
ACI service creation operation finished, operation "Succeeded"


In [27]:
service.wait_for_deployment(show_output=True)
service.update(enable_app_insights=True)
print("State : " + service.state)
# Keys exist only when the service is deployed with auth_enabled=True. This
# ACI configuration leaves authentication off, and calling get_keys() then
# fails with a 400 "AuthDisabled" error, so guard the call.
if service.auth_enabled:
    print("Key " + service.get_keys()[0])
print("Swagger URI : " + service.swagger_uri)
print("Scoring URI : " + service.scoring_uri)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Succeeded
ACI service creation operation finished, operation "Succeeded"
State : Healthy

In [30]:
test_df = df.sample(4) # sample data from original dataset
label_df = test_df.pop('Churn')

test_sample = json.dumps({'data': test_df.to_dict(orient='records')})

print(test_sample)

{"data": [{"customerID": "3496-LFSZU", "gender": "Male", "SeniorCitizen": 0, "Partner": true, "Dependents": false, "tenure": 4, "PhoneService": true, "MultipleLines": "No", "InternetService": "Fiber optic", "OnlineSecurity": "No", "OnlineBackup": "No", "DeviceProtection": "No", "TechSupport": "No", "StreamingTV": "No", "StreamingMovies": "No", "Contract": "Month-to-month", "PaperlessBilling": false, "PaymentMethod": "Electronic check", "MonthlyCharges": 70.5, "TotalCharges": 294.2}, {"customerID": "1905-OEILC", "gender": "Female", "SeniorCitizen": 0, "Partner": false, "Dependents": false, "tenure": 1, "PhoneService": true, "MultipleLines": "No", "InternetService": "No", "OnlineSecurity": "No internet service", "OnlineBackup": "No internet service", "DeviceProtection": "No internet service", "TechSupport": "No internet service", "StreamingTV": "No internet service", "StreamingMovies": "No internet service", "Contract": "Month-to-month", "PaperlessBilling": false, "PaymentMethod": "Maile
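
The payload printed above wraps the sampled rows in a top-level `data` key. A minimal sketch of both halves of the exchange follows, under stated assumptions: `build_payload` and `parse_predictions` are hypothetical helper names, the single-field record is made up, and the reply string is copied from the response recorded later in this notebook. That reply is a JSON string that itself contains JSON, hence the second decoding pass.

```python
import json


def build_payload(records):
    """Serialize a list of row dictionaries in the {'data': [...]} shape used above."""
    return json.dumps({"data": records})


def parse_predictions(response_text):
    """Decode the service reply and return the list stored under 'result'."""
    payload = json.loads(response_text)
    # The reply recorded in this notebook is a JSON string wrapping JSON,
    # so decode once more when the first pass yields a string.
    if isinstance(payload, str):
        payload = json.loads(payload)
    return payload["result"]


print(build_payload([{"tenure": 4}]))
reply = '"{\\"result\\": [true, false, false, false]}"'
print(parse_predictions(reply))  # → [True, False, False, False]
```

The decoded list can then be compared element by element with the held-back `label_df` values.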

In [31]:
scoring_uri = service.scoring_uri
input_data = test_sample

# Set the content type
headers = {'Content-Type': 'application/json'}

# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.text)

"{\"result\": [true, false, false, false]}"


In [32]:
print(service.get_logs())

2021-04-18T10:53:54,609537400+00:00 - iot-server/run 
2021-04-18T10:53:54,613204600+00:00 - nginx/run 
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libcrypto.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
/usr/sbin/nginx: /azureml-envs/azureml_8e5a5a51349877e7d47c6a2872e0ebfd/lib/libssl.so.1.0.0: no version information available (required by /usr/sbin/nginx)
2021-04-18T10:53:54,631299800+00:00 - gunicorn/run 
2021-04-18T10:53:54,633077200+00:00 - rsyslog/run 
rsyslogd

TODO: In the cell below, print the logs of the web service and delete the service
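
A minimal sketch of that final step, assuming the deployed `service` object from the cells above. `Webservice.get_logs()` and `Webservice.delete()` are the SDK calls involved; `print_logs_and_delete` is a hypothetical helper, and `StubService` is a stand-in used only so the sketch runs without an Azure workspace.

```python
def print_logs_and_delete(service):
    """Print a deployed service's logs, then delete it to release its resources."""
    print(service.get_logs())
    service.delete()


class StubService:
    """Hypothetical stand-in that mimics the two methods used above."""

    def __init__(self):
        self.deleted = False

    def get_logs(self):
        return "stub logs"

    def delete(self):
        self.deleted = True


stub = StubService()
print_logs_and_delete(stub)
print(stub.deleted)  # → True
```

In the notebook itself, the call would be `print_logs_and_delete(service)`.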