## Train a Churn Prediction model on YData and deploy it on Azure Container Instance (ACI)

In this tutorial, you train a machine learning model on YData and then deploy it on Azure.

This tutorial trains a simple logistic regression model using the [TelcoChurn](https://www.kaggle.com/bandiatindra/telecom-churn-prediction/data) dataset and [scikit-learn](http://scikit-learn.org) with Azure Machine Learning.

In this tutorial you will learn to:

> * Set up your development environment
> * Access and examine the data
> * Train and pickle a simple logistic regression model using YData
> * Review training results
> * Set up your testing environment on ACI
> * Retrieve the model from your workspace
> * Test the model locally
> * Deploy the model to ACI
> * Test the deployed model

## Set up your development environment

All the setup for your development work can be accomplished in a Python notebook.  Setup includes:

* Importing Python packages
* Connecting to a workspace to enable communication between your local computer and remote resources
* Creating a remote compute target to use for training

### Import packages

Install and import the Python packages you need in this session, and display the Azure Machine Learning SDK version.

In [1]:
!pip install azureml-sdk scikit-learn==0.22.1 pandas numpy matplotlib

Collecting azureml-sdk
  Using cached azureml_sdk-1.36.0-py3-none-any.whl (4.5 kB)
Collecting scikit-learn==0.22.1
  Using cached scikit_learn-0.22.1-cp37-cp37m-manylinux1_x86_64.whl (7.0 MB)
Collecting azureml-dataset-runtime[fuse]~=1.36.0
  Using cached azureml_dataset_runtime-1.36.0-py3-none-any.whl (3.5 kB)
Collecting azureml-core~=1.36.0
  Using cached azureml_core-1.36.0.post2-py3-none-any.whl (2.4 MB)
Collecting azureml-train-automl-client~=1.36.0
  Using cached azureml_train_automl_client-1.36.0-py3-none-any.whl (135 kB)
Collecting azureml-pipeline~=1.36.0
  Using cached azureml_pipeline-1.36.0-py3-none-any.whl (3.7 kB)
Collecting azureml-train-core~=1.36.0
  Using cached azureml_train_core-1.36.0-py3-none-any.whl (8.6 MB)
Collecting azure-mgmt-keyvault<10.0.0,>=0.40.0
  Using cached azure_mgmt_keyvault-9.3.0-py2.py3-none-any.whl (412 kB)
Collecting azure-mgmt-containerregistry>=2.0.0
  Using cached azure_mgmt_containerregistry-8.2.0-py2.py3-none-any.whl (928 kB)
Collecting doc

### Connect to workspace

Download your Azure workspace configuration file (`config.json`) and upload it to this working directory. It is used to access your Azure workspace remotely from YData.

Create a workspace object from the existing workspace. `Workspace.from_config()` reads the file **config.json** and loads the details into an object named `ws`.

In [2]:
from azureml.core import Workspace
ws_other_environment = Workspace.from_config(path="./config.json")
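
For reference, the `config.json` downloaded from the Azure portal is a small JSON file with three keys: `subscription_id`, `resource_group` and `workspace_name`. The sketch below checks that a file contains these keys before it is handed to `Workspace.from_config()`. The helper name `check_workspace_config` is hypothetical and not part of the Azure ML SDK.

```python
import json
from pathlib import Path

# Keys that a workspace config.json is expected to contain
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")

def check_workspace_config(path="./config.json"):
    """Return the parsed config, or raise KeyError if a required key is missing.

    Hypothetical helper for illustration; it only inspects the file and
    does not contact Azure.
    """
    config = json.loads(Path(path).read_text())
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError("config.json is missing: " + ", ".join(missing))
    return config

# Usage (assumes config.json was uploaded to the working directory):
# check_workspace_config("./config.json")
```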

In [3]:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt

import azureml.core
from azureml.core import Workspace

# check core SDK version number
print("Azure ML SDK Version: ", azureml.core.VERSION)

Azure ML SDK Version:  1.36.0


### Load workspace configuration from the config.json file in the current folder

In [4]:
ws = ws_other_environment
print(ws.name, ws.location, ws.resource_group, sep='\t')

azureml-demo	eastus	arunn-demos


### Load data

Load the data from the **Data Sources** using YData's Connectors.

In [7]:
import argparse
import os

import joblib
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

from azureml.core import Run

from ydata.connectors.filetype import FileType
from ydata.connectors.storages.local_connector import LocalConnector

connector = LocalConnector()
files_dirs = connector.list(path='/home/ydata/data')



+---------+----------------+----------------+----------------+
| Package | client         | scheduler      | workers        |
+---------+----------------+----------------+----------------+
| python  | 3.7.11.final.0 | 3.7.10.final.0 | 3.7.10.final.0 |
+---------+----------------+----------------+----------------+


In [9]:
data = connector.read_file('/home/ydata/data/telco_churn.csv', file_type=FileType.CSV).to_pandas()

In [10]:
#inspect the data
data.head()

Unnamed: 0,customerID,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn
0,7590-VHVEG,Female,0,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No
2,3668-QPYBK,Male,0,No,No,2,Yes,No,DSL,Yes,...,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes
3,7795-CFOCW,Male,0,No,No,45,No,No phone service,DSL,Yes,...,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,1840.75,No
4,9237-HQITU,Female,0,No,No,2,Yes,No,Fiber optic,No,...,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes
6,1452-KIOVK,Male,0,No,Yes,22,Yes,Yes,Fiber optic,No,...,No,No,Yes,No,Month-to-month,Yes,Credit card (automatic),89.1,1949.4,No


In [11]:
# Checking the data types of all the columns
data.dtypes

customerID           object
gender               object
SeniorCitizen         int64
Partner              object
Dependents           object
tenure                int64
PhoneService         object
MultipleLines        object
InternetService      object
OnlineSecurity       object
OnlineBackup         object
DeviceProtection     object
TechSupport          object
StreamingTV          object
StreamingMovies      object
Contract             object
PaperlessBilling     object
PaymentMethod        object
MonthlyCharges      float64
TotalCharges         object
Churn                object
dtype: object

In [12]:
# Converting Total Charges to a numerical data type.
data.TotalCharges = pd.to_numeric(data.TotalCharges, errors='coerce')
data.isnull().sum()

customerID           0
gender               0
SeniorCitizen        0
Partner              0
Dependents           0
tenure               0
PhoneService         0
MultipleLines        0
InternetService      0
OnlineSecurity       0
OnlineBackup         0
DeviceProtection     0
TechSupport          0
StreamingTV          0
StreamingMovies      0
Contract             0
PaperlessBilling     0
PaymentMethod        0
MonthlyCharges       0
TotalCharges        11
Churn                0
dtype: int64
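
The missing values in `TotalCharges` appear because `errors='coerce'` turns every string that cannot be parsed as a number into `NaN` instead of raising an error. A minimal sketch with made-up values follows; the blank string is an assumption about what the unparseable entries in the raw column look like.

```python
import pandas as pd

# Made-up sample: two parseable strings and one blank entry
raw = pd.Series(["29.85", " ", "108.15"])

# Unparseable strings become NaN rather than raising a ValueError
parsed = pd.to_numeric(raw, errors="coerce")

print(parsed.dtype)
print(parsed.isnull().sum())
```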

In [13]:
# Remove rows with missing values
data.dropna(inplace=True)

# Remove customer IDs from the data set (copy to avoid modifying a view of `data`)
df = data.iloc[:, 1:].copy()

# Convert the target variable into a binary numeric variable
df['Churn'] = df['Churn'].map({'Yes': 1, 'No': 0})

# Let's convert all the categorical variables into dummy variables
df_dummies = pd.get_dummies(df)
df_dummies.head()

Unnamed: 0,SeniorCitizen,tenure,MonthlyCharges,TotalCharges,Churn,gender_Female,gender_Male,Partner_No,Partner_Yes,Dependents_No,...,StreamingMovies_Yes,Contract_Month-to-month,Contract_One year,Contract_Two year,PaperlessBilling_No,PaperlessBilling_Yes,PaymentMethod_Bank transfer (automatic),PaymentMethod_Credit card (automatic),PaymentMethod_Electronic check,PaymentMethod_Mailed check
0,0,1,29.85,29.85,0,1,0,0,1,1,...,0,1,0,0,0,1,0,0,1,0
2,0,2,53.85,108.15,1,0,1,1,0,1,...,0,1,0,0,0,1,0,0,0,1
3,0,45,42.3,1840.75,0,0,1,1,0,1,...,0,0,1,0,1,0,1,0,0,0
4,0,2,70.7,151.65,1,1,0,1,0,1,...,0,1,0,0,0,1,0,0,1,0
6,0,22,89.1,1949.4,0,0,1,1,0,0,...,0,1,0,0,0,1,0,1,0,0


In [14]:
# We will use the data frame where we had created dummy variables
y = df_dummies['Churn'].values
X = df_dummies.drop(columns = ['Churn'])

# Scaling all the variables to a range of 0 to 1
from sklearn.preprocessing import MinMaxScaler

features = X.columns.values
scaler = MinMaxScaler(feature_range = (0,1))
scaler.fit(X)
X = pd.DataFrame(scaler.transform(X))
X.columns = features
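
`MinMaxScaler` rescales each column independently as `(x - min) / (max - min)`, so every feature ends up in the range 0 to 1. The sketch below checks this by hand on a toy frame; the values are made up and only the column names echo the dataset.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Toy data for illustration only
toy = pd.DataFrame({"tenure": [1, 2, 45],
                    "MonthlyCharges": [29.85, 53.85, 42.30]})

# Scale with scikit-learn
scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(toy)

# Scale by hand, column by column
manual = ((toy - toy.min()) / (toy.max() - toy.min())).to_numpy()

print(np.allclose(scaled, manual))
```

Since `score.py` only loads the model, inputs sent to the deployed service need to be scaled the same way as the training data.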

In [84]:
# Create Train & Test Data
from sklearn.model_selection import train_test_split
from sklearn import metrics

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=101)

In [86]:
# Save the test data for later use in testing the deployed models

test_data = X_test.copy()
test_data['Churn'] = y_test
test_data.to_csv('/home/ydata/data/test_data.csv', index=False)

## Train the Logistic Regression Model

In [39]:
print('Train a logistic regression model')
clf = LogisticRegression(C=1.0/0.5, solver="liblinear", multi_class="auto", random_state=42)
clf.fit(X_train, y_train)

print('Predict the test set')
y_hat = clf.predict(X_test.to_numpy())

# calculate accuracy on the prediction
acc = np.average(y_hat == y_test)
print('Accuracy is', acc)

Train a logistic regression model
Predict the test set
Accuracy is 0.7974413646055437


### Package (pickle) and save the model

In [None]:
joblib.dump(value=clf, filename='./sklearn_churn_model.pkl')

### Register the model on Azure

In [None]:
from azureml.core import Workspace
ws = ws_other_environment

from azureml.core.model import Model

model_name = "sklearn_churn"
model = Model.register(model_path="sklearn_churn_model.pkl",
                        model_name=model_name,
                        tags={"data": "telco", "model": "classification"},
                        description="Telco Customer Churn Prediction",
                        workspace=ws)

### Prepare the Deployment Environment on Azure

In [43]:
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies

# to install required packages
env = Environment('tutorial-env')
cd = CondaDependencies.create(pip_packages=['azureml-dataset-runtime[pandas,fuse]', 'azureml-defaults'], conda_packages = ['scikit-learn==0.22.1'])

env.python.conda_dependencies = cd

# Register environment to re-use later
env.register(workspace = ws)

{
    "databricks": {
        "eggLibraries": [],
        "jarLibraries": [],
        "mavenLibraries": [],
        "pypiLibraries": [],
        "rcranLibraries": []
    },
    "docker": {
        "arguments": [],
        "baseDockerfile": null,
        "baseImage": "mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:20211029.v1",
        "baseImageRegistry": {
            "address": null,
            "password": null,
            "registryIdentity": null,
            "username": null
        },
        "enabled": false,
        "platform": {
            "architecture": "amd64",
            "os": "Linux"
        },
        "sharedVolumes": true,
        "shmSize": null
    },
    "environmentVariables": {
        "EXAMPLE_ENV_VAR": "EXAMPLE_VALUE"
    },
    "inferencingStackVersion": null,
    "name": "tutorial-env",
    "python": {
        "baseCondaEnvironment": null,
        "condaDependencies": {
            "channels": [
                "anaconda",
                "conda-forge"
 

## Deploy as web service

Deploy the model as a web service hosted in ACI. 

To build the correct environment for ACI, provide the following:
* A scoring script to show how to use the model
* A configuration file to build the ACI
* The model you trained before

### Create scoring script

Create the scoring script, called score.py, which the web service uses to load the model and answer requests.
Note that this script runs directly on ACI, not here on YData.

You must include two required functions in the scoring script:
* The `init()` function, which typically loads the model into a global object. This function runs only once, when the Docker container is started.

* The `run(input_data)` function, which uses the model to predict a value based on the input data. Inputs and outputs typically use JSON for serialization and de-serialization, but other formats are supported. This function is called for each request to the service.


In [52]:
%%writefile score.py
import json
import numpy as np
import os
import pickle
import joblib

def init():
    global model
    # AZUREML_MODEL_DIR is an environment variable created during deployment.
    # It is the path to the model folder (./azureml-models/$MODEL_NAME/$VERSION)
    # For multiple models, it points to the folder containing all deployed models (./azureml-models)
    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'sklearn_churn_model.pkl')
    model = joblib.load(model_path)

def run(raw_data):
    data = np.array(json.loads(raw_data)['data'])
    # make prediction
    y_hat = model.predict(data)
    # you can return any data type as long as it is JSON-serializable
    return y_hat.tolist()

Overwriting score.py
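
Before deploying, the `init()`/`run()` contract can be exercised locally. The sketch below mirrors the two functions from score.py, but it loads a tiny stand-in model trained on synthetic data and sets `AZUREML_MODEL_DIR` by hand to a temporary folder. Both the toy data and setting the variable manually are assumptions for local testing only; on ACI the variable is created during deployment.

```python
import json
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

def init():
    # Load the pickled model from the folder named by AZUREML_MODEL_DIR
    global model
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR"), "sklearn_churn_model.pkl")
    model = joblib.load(model_path)

def run(raw_data):
    # Decode the JSON body, predict, and return a JSON-serializable list
    data = np.array(json.loads(raw_data)["data"])
    return model.predict(data).tolist()

# Stand-in model on synthetic data (not the churn model from this tutorial)
X_toy = np.array([[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]])
y_toy = np.array([0, 0, 1, 1])
model_dir = tempfile.mkdtemp()
joblib.dump(LogisticRegression(solver="liblinear").fit(X_toy, y_toy),
            os.path.join(model_dir, "sklearn_churn_model.pkl"))

# Simulate what the deployment does, then call the two entry points
os.environ["AZUREML_MODEL_DIR"] = model_dir
init()
result = run(json.dumps({"data": [[0.05, 0.05], [0.95, 0.9]]}))
print(result)
```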


### Create configuration file

Create a deployment configuration and specify the number of CPU cores and gigabytes of RAM needed for your ACI container. While it depends on your model, the default of 1 core and 1 gigabyte of RAM is usually sufficient. If you need more later, you have to recreate the image and redeploy the service.

In [53]:
from azureml.core.webservice import AciWebservice

aciconfig = AciWebservice.deploy_configuration(cpu_cores=1, 
                                               memory_gb=1, 
                                               tags={"data": "telco", "model": "classification"},
                                               description="Telco Customer Churn Prediction")

### Deploy in ACI
Estimated time to complete: **about 2-5 minutes**

Configure the image and deploy. The following code goes through these steps:

1. Retrieve the registered environment (`tutorial-env`), which contains the dependencies needed by the model.
1. Create the inference configuration necessary to deploy the model as a web service using:
   * The scoring file (`score.py`)
   * The environment object retrieved in the previous step
1. Deploy the model to the ACI container.
1. Get the web service HTTP endpoint.

In [54]:
%%time
import uuid
from azureml.core.webservice import Webservice
from azureml.core.model import InferenceConfig
from azureml.core.environment import Environment
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()
model = Model(ws, 'sklearn_churn')


myenv = Environment.get(workspace=ws, name="tutorial-env", version="1")
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)

service_name = 'sklearn-churn-svc-' + str(uuid.uuid4())[:4]
service = Model.deploy(workspace=ws, 
                       name=service_name, 
                       models=[model], 
                       inference_config=inference_config, 
                       deployment_config=aciconfig)

service.wait_for_deployment(show_output=True)

Tips: You can try get_logs(): https://aka.ms/debugimage#dockerlog or local deployment: https://aka.ms/debugimage#debug-locally to debug if deployment takes longer than 10 minutes.
Running
2021-12-13 17:05:10+00:00 Creating Container Registry if not exists.
2021-12-13 17:05:10+00:00 Registering the environment.
2021-12-13 17:05:12+00:00 Use the existing image.
2021-12-13 17:05:12+00:00 Generating deployment configuration.
2021-12-13 17:05:13+00:00 Submitting deployment to compute.
2021-12-13 17:05:18+00:00 Checking the status of deployment sklearn-churn-svc-2369..
2021-12-13 17:07:54+00:00 Checking the status of inference endpoint sklearn-churn-svc-2369.
Succeeded
ACI service creation operation finished, operation "Succeeded"
CPU times: user 927 ms, sys: 72.2 ms, total: 999 ms
Wall time: 2min 53s


Get the scoring web service's HTTP endpoint, which accepts REST client calls. This endpoint can be shared with anyone who wants to test the web service or integrate it into an application.

In [55]:
print(service.scoring_uri)

http://cb0a0c8b-38db-4bf8-a430-38cd1d33ea1f.eastus.azurecontainer.io/score


## Test the model


### Load test data

If `X_test` and `y_test` are still in memory, you can skip this step.
Otherwise, load the test data saved earlier using the connectors.

In [104]:
test_data = connector.read_file('/home/ydata/data/test_data.csv', file_type=FileType.CSV).to_pandas()

In [105]:
y_test = test_data['Churn'].values
X_test = test_data.drop(columns = ['Churn'])

### Predict test data

Feed the test dataset to the model to get predictions.


The following code goes through these steps:
1. Serialize the test features to JSON and send them to the web service hosted in ACI.

1. Use the SDK's `run` API to invoke the service. You can also make raw calls using any HTTP tool such as curl.

In [94]:
import json
test = json.dumps({"data": X_test.to_numpy().tolist()})
test = bytes(test, encoding='utf8')
y_hat = service.run(input_data=test)

###  Examine the confusion matrix

Generate a confusion matrix to see how many samples from the test set are classified correctly. The off-diagonal values count the incorrect predictions.

In [95]:
from sklearn.metrics import confusion_matrix

conf_mx = confusion_matrix(y_test, y_hat)
print(conf_mx)
print('Overall accuracy:', np.average(y_hat == y_test))

[[904 124]
 [161 218]]
Overall accuracy: 0.7974413646055437
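
Accuracy alone hides how the errors are split between the two classes. As a worked example, the counts printed above can be turned into precision and recall for the churn class. In scikit-learn's convention, rows are true labels and columns are predicted labels.

```python
import numpy as np

# Counts copied from the confusion matrix printed above
conf_mx = np.array([[904, 124],
                    [161, 218]])

# For a binary problem, ravel() yields TN, FP, FN, TP in that order
tn, fp, fn, tp = conf_mx.ravel()

accuracy = (tp + tn) / conf_mx.sum()
precision = tp / (tp + fp)  # of customers predicted to churn, the share that did
recall = tp / (tp + fn)     # of customers that churned, the share that was caught

print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```

With these counts the model catches a little over half of the churning customers, which the overall accuracy figure does not show.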


## Make predictions through HTTP requests

Test the deployed model with a randomly chosen customer from the test data by sending a raw HTTP request to the web service. Since the model classifies about 80% of the test set correctly, you might have to run the following code a few times before you see a misclassified sample.

In [106]:
import json
import requests

# send a random row from the test set to score
# (np.random.randint excludes the upper bound, so every row can be drawn)
random_index = np.random.randint(0, len(X_test))
input_data = json.dumps({"data": [X_test.to_numpy()[random_index].tolist()]})

headers = {'Content-Type':'application/json'}

# for an AKS deployment you'd need to add the service key to the header as well
# api_key = service.get_key()
# headers = {'Content-Type':'application/json',  'Authorization':('Bearer '+ api_key)} 

resp = requests.post(service.scoring_uri, input_data, headers=headers)

print("POST to url", service.scoring_uri)
#print("input data:", input_data)

print("Response: ", resp.content)
print("\nCustomer Index:", random_index)
print("label:", y_test[random_index])
print("prediction:", resp.text)

POST to url http://cb0a0c8b-38db-4bf8-a430-38cd1d33ea1f.eastus.azurecontainer.io/score
Response:  b'[0]'

Customer Index: 309
label: 0
prediction: [0]
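
When the endpoint is called from an application, it can help to wrap the request in a small function that builds the payload and decodes the response. The sketch below is one possible shape: `build_payload` and `score_rows` are hypothetical helpers, the 30-second timeout is an arbitrary choice, and it uses the standard library's `urllib` instead of `requests` to avoid an extra dependency.

```python
import json
import urllib.request

def build_payload(rows):
    """Serialize feature rows into the {"data": [...]} body that score.py expects."""
    return json.dumps({"data": [[float(v) for v in row] for row in rows]})

def score_rows(scoring_uri, rows, timeout=30):
    """POST rows to the scoring endpoint and return the decoded predictions.

    urlopen raises urllib.error.HTTPError for non-2xx responses.
    """
    request = urllib.request.Request(
        scoring_uri,
        data=build_payload(rows).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())

# Usage (assumes the service from this tutorial is still running):
# predictions = score_rows(service.scoring_uri, X_test.to_numpy()[:5])
```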


## Clean up resources

To keep the resource group and workspace for other tutorials and exploration, you can delete only the ACI deployment using this API call:

In [108]:
service.delete()

No service with name sklearn-churn-svc-2369 found to delete.
