<table style="border: none" align="left">
   <tr style="border: none">
      <th style="border: none"><font face="verdana" size="5" color="black"><b>Lab: Build, Save and Deploy Model to IBM Watson Machine Learning (WML)</b></font></th>
      <th style="border: none"><img src="https://github.com/pmservice/customer-satisfaction-prediction/blob/master/app/static/images/ml_icon_gray.png?raw=true" alt="Watson Machine Learning icon" height="40" width="40"></th>
   </tr>
</table>


This notebook walks you through these steps:
- Build a model
- Save the model in the WML repository
- Create a Deployment in WML
- Invoke the deployed model with a REST client to test it

### Step 1: Load Files

In [2]:
import wget

# Customer Information
wget.download('https://raw.githubusercontent.com/nwngeek212/DSX-DemoCenter/master/predictCustomerChurn/data_assets/customer.csv')
customer = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load('customer.csv')
  
# Churn information
wget.download('https://raw.githubusercontent.com/nwngeek212/DSX-DemoCenter/master/predictCustomerChurn/data_assets/churn.csv')
customer_churn = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load('churn.csv')

### Step 2: Merge Files

In [19]:
merged=customer.join(customer_churn,customer['ID']==customer_churn['ID']).select(customer['*'],customer_churn['CHURN'])

Unnamed: 0,ID,Gender,Status,Children,Est Income,Car Owner,Age,LongDistance,International,Local,Dropped,Paymethod,LocalBilltype,LongDistanceBilltype,Usage,RatePlan,CHURN
0,1,F,S,1,38000.0,N,24.393333,23.56,0,206.08,0,CC,Budget,Intnl_discount,229.64,3,T
1,6,M,M,2,29616.0,N,49.426667,29.78,0,45.5,0,CH,FreeLocal,Standard,75.29,2,F
2,8,M,M,0,19732.8,N,50.673333,24.81,0,22.44,0,CC,FreeLocal,Standard,47.25,3,F
3,11,M,S,2,96.33,N,56.473333,26.13,0,32.88,1,CC,Budget,Standard,59.01,1,F
4,14,F,M,2,52004.8,N,25.14,5.03,0,23.11,0,CH,Budget,Intnl_discount,28.14,1,F


### Step 3: Rename some columns
This step removes the spaces from column names so that the columns are easier to reference in later steps.

In [20]:
merged = merged.withColumnRenamed("Est Income", "EstIncome").withColumnRenamed("Car Owner","CarOwner")
merged.toPandas().head()

Unnamed: 0,ID,Gender,Status,Children,EstIncome,CarOwner,Age,LongDistance,International,Local,Dropped,Paymethod,LocalBilltype,LongDistanceBilltype,Usage,RatePlan,CHURN
0,1,F,S,1,38000.0,N,24.393333,23.56,0,206.08,0,CC,Budget,Intnl_discount,229.64,3,T
1,6,M,M,2,29616.0,N,49.426667,29.78,0,45.5,0,CH,FreeLocal,Standard,75.29,2,F
2,8,M,M,0,19732.8,N,50.673333,24.81,0,22.44,0,CC,FreeLocal,Standard,47.25,3,F
3,11,M,S,2,96.33,N,56.473333,26.13,0,32.88,1,CC,Budget,Standard,59.01,1,F
4,14,F,M,2,52004.8,N,25.14,5.03,0,23.11,0,CH,Budget,Intnl_discount,28.14,1,F
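
If more columns contained spaces, renaming them one by one would get tedious. The following is a minimal sketch of building the rename mapping in plain Python; the helper name `strip_spaces` and the short column list are illustrative only and are not part of the original notebook.

```python
def strip_spaces(column_names):
    """Map each column name that contains spaces to a space-free version."""
    return {name: name.replace(" ", "") for name in column_names if " " in name}

# Illustrative subset of the merged DataFrame's column names
columns = ["ID", "Est Income", "Car Owner", "Age"]
renames = strip_spaces(columns)
print(renames)

# Applying the mapping to a Spark DataFrame would look like this (not run here):
# for old, new in renames.items():
#     merged = merged.withColumnRenamed(old, new)
```

Columns without spaces are left out of the mapping, so they are never touched.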


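### Step 4: Build the Spark pipeline and the Random Forest model
"Pipeline" is the Spark ML API that chains a sequence of stages (here, feature transformers followed by a classifier) so that they can be fit and applied as a single unit.
Additional information on Spark ML: https://spark.apache.org/docs/2.0.2/ml-guide.html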

In [5]:
from pyspark.ml.feature import StringIndexer, VectorIndexer
from pyspark.ml import Pipeline
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

# Encode string variables as numeric indices so that the random forest algorithm can use them
stringIndexer1 = StringIndexer(inputCol='Gender', outputCol='GenderEncoded')
stringIndexer2 = StringIndexer(inputCol='Status',outputCol='StatusEncoded')
stringIndexer3 = StringIndexer(inputCol='CarOwner',outputCol='CarOwnerEncoded')
stringIndexer4 = StringIndexer(inputCol='Paymethod',outputCol='PaymethodEncoded')
stringIndexer5 = StringIndexer(inputCol='LocalBilltype',outputCol='LocalBilltypeEncoded')
stringIndexer6 = StringIndexer(inputCol='LongDistanceBilltype',outputCol='LongDistanceBilltypeEncoded')
stringIndexer7 = StringIndexer(inputCol='CHURN', outputCol='label')

# The classifier expects all input variables combined into a single feature vector
assembler = VectorAssembler(inputCols=["GenderEncoded", "StatusEncoded", "CarOwnerEncoded", "PaymethodEncoded", "LocalBilltypeEncoded", \
                                       "LongDistanceBilltypeEncoded", "Children", "EstIncome", "Age", "LongDistance", "International", "Local",\
                                      "Dropped","Usage","RatePlan"], outputCol="features")


# instantiate the algorithm, take the default settings
rf=RandomForestClassifier(labelCol="label", featuresCol="features")

# Chain the indexers, the assembler and the classifier into a single pipeline
pipeline = Pipeline(stages=[stringIndexer1, stringIndexer2, stringIndexer3, stringIndexer4,
                            stringIndexer5, stringIndexer6, stringIndexer7, assembler, rf])
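
To make the role of the `StringIndexer` stages more concrete, here is a plain-Python sketch of the idea: each distinct string is mapped to a numeric index, and by default the most frequent value receives index 0.0. The function name `fit_string_index` and the sample values are hypothetical, and the alphabetical tie-break is an assumption of this sketch rather than a guarantee about every Spark version.

```python
from collections import Counter

def fit_string_index(values):
    """Return a {label: index} mapping ordered by descending frequency.

    Ties are broken alphabetically here; that tie-break is an assumption
    of this sketch, not a statement about Spark's behavior.
    """
    counts = Counter(values)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: float(i) for i, label in enumerate(ordered)}

# Hypothetical sample of a categorical column
paymethod = ["CC", "CH", "CC", "CC", "Auto", "CH"]
mapping = fit_string_index(paymethod)
encoded = [mapping[v] for v in paymethod]
print(mapping)
print(encoded)
```

Because the mapping is learned from the data, the same fitted mapping has to be reused when scoring new rows, which is one reason the indexers are kept inside the pipeline.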

In [6]:
# Split data into train and test datasets
train, test = merged.randomSplit([80.0,20.0], seed=6)
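
The weights passed to `randomSplit` do not need to sum to 1; Spark normalizes them. The short sketch below only illustrates that normalization in plain Python (the helper name `normalize_weights` is made up for this example).

```python
def normalize_weights(weights):
    """Scale split weights so that they sum to 1.0."""
    total = float(sum(weights))
    return [w / total for w in weights]

fractions = normalize_weights([80.0, 20.0])
print(fractions)
```

Note that rows are assigned to each split at random, so the resulting train and test sizes are only approximately 80% and 20% of the data.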

In [7]:
# Build models
model = pipeline.fit(train)

### Step 5: Score the test data set

In [8]:
results = model.transform(test)

### Step 6: Model Evaluation 

In [9]:
# Accuracy: the share of test rows whose predicted label matches the actual label
print('Accuracy model1 = {:.2f}.'.format(results.filter(results.label == results.prediction).count() / float(results.count())))

Accuracy model1 = 0.92.
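
The ratio computed above (matching predictions divided by all predictions) is what is usually called accuracy. Precision is a different metric: of the rows predicted as positive, how many really are positive. The sketch below contrasts the two on a small made-up set of labels and predictions; none of these numbers come from the churn data.

```python
def accuracy(labels, predictions):
    """Fraction of rows where the prediction equals the label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / float(len(labels))

def precision(labels, predictions, positive=1.0):
    """Fraction of positive predictions that are truly positive."""
    predicted_pos = [y for y, p in zip(labels, predictions) if p == positive]
    if not predicted_pos:
        return 0.0
    true_pos = sum(1 for y in predicted_pos if y == positive)
    return true_pos / float(len(predicted_pos))

# Made-up labels and predictions for illustration
labels      = [0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
predictions = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]

acc = accuracy(labels, predictions)
prec = precision(labels, predictions)
print('accuracy = {:.2f}, precision = {:.2f}'.format(acc, prec))
```

On imbalanced data such as churn, the two can differ considerably, so it helps to report both.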


In [10]:
from pyspark.ml.evaluation import BinaryClassificationEvaluator

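# Evaluate the model.
# Note: rawPredictionCol is set to the hard 0/1 "prediction" column here, so the ROC curve
# has a single operating point; pass "rawPrediction" instead to evaluate the full curve.
evaluator = BinaryClassificationEvaluator(rawPredictionCol="prediction", labelCol="label", metricName="areaUnderROC")
print('Area under ROC curve = {:.2f}.'.format(evaluator.evaluate(results)))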

Area under ROC curve = 0.92.
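
The evaluator above is given the hard 0/1 `prediction` column rather than a score. The following plain-Python sketch, using made-up scores, shows why that matters: the area under the ROC curve computed from thresholded predictions can differ from the one computed from the underlying scores. The `auc` helper is a simple pairwise implementation written for this example and is not Spark's code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts how often a positive row scores higher than a negative row;
    ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]           # hypothetical probabilities
hard = [1.0 if s >= 0.5 else 0.0 for s in scores]  # thresholded at 0.5

auc_from_scores = auc(labels, scores)
auc_from_hard = auc(labels, hard)
print('AUC from scores = {:.3f}'.format(auc_from_scores))
print('AUC from hard predictions = {:.3f}'.format(auc_from_hard))
```

With scores, every threshold contributes to the curve; with hard predictions only one threshold is represented, so the two numbers generally disagree.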


### Step 7: Save Model in WML repository

In this section you will store your model in the Watson Machine Learning (WML) repository by using Python client libraries.
* <a href="https://console.ng.bluemix.net/docs/services/PredictiveModeling/index.html">WML Documentation</a>
* <a href="http://watson-ml-api.mybluemix.net/">WML API</a> 
<br/>

First, you must import client libraries.

In [11]:
from repository.mlrepositoryclient import MLRepositoryClient
from repository.mlrepositoryartifact import MLRepositoryArtifact

Enter the credentials for your instance of the Watson Machine Learning service in the next cell. You can find them on the **Service Credentials** tab of your service instance in <a href="https://console.ng.bluemix.net/dashboard/apps/" target="_blank">Bluemix</a>.

<span style="color:red">Replace the service_path and credentials with your own information</span>

service_path=[your url]<br/>
username=[your username]<br/>
password=[your password]<br/>

In [13]:
# @hidden_cell
service_path = 'https://ibm-watson-ml.mybluemix.net'
username = 'abbf8acf-16fd-417c-9c92-f79630bde9b5'
password = '47194f45-8f61-4d8b-bb5c-3fac44901f3e'
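
As an alternative to typing credentials into a cell, they can be read from environment variables so that they never appear in a shared notebook. This is a minimal sketch; the variable names `WML_URL`, `WML_USERNAME` and `WML_PASSWORD` and the helper `load_wml_credentials` are hypothetical and not defined by the WML service.

```python
import os

REQUIRED = ("WML_URL", "WML_USERNAME", "WML_PASSWORD")  # hypothetical names

def load_wml_credentials(environ=os.environ):
    """Read the service URL, username and password from a mapping of variables."""
    missing = [key for key in REQUIRED if key not in environ]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return tuple(environ[key] for key in REQUIRED)

# A plain dictionary stands in for the real environment in this example
example_env = {
    "WML_URL": "https://ibm-watson-ml.mybluemix.net",
    "WML_USERNAME": "your-username",
    "WML_PASSWORD": "your-password",
}
example_path, example_user, example_password = load_wml_credentials(example_env)
print(example_path)
```

In a real session you would call `load_wml_credentials()` without arguments and assign the result to `service_path`, `username` and `password`.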

Authorize the repository client:

In [14]:
ml_repository_client = MLRepositoryClient(service_path)
ml_repository_client.authorize(username, password)

Create the model artifact (abstraction layer).

<b>Tip:</b> The MLRepositoryArtifact constructor expects a trained model object, training data, and a model name. This is the name that the Watson Machine Learning service displays.


In [15]:
model_artifact = MLRepositoryArtifact(model, training_data=train, name="Predict Customer Churn")

Save pipeline and model artifacts to your Watson Machine Learning instance:

In [16]:
saved_model = ml_repository_client.models.save(model_artifact)

In [17]:
# Print the saved model properties
print("modelType: " + saved_model.meta.prop("modelType"))
print("creationTime: " + str(saved_model.meta.prop("creationTime")))
print("modelVersionHref: " + saved_model.meta.prop("modelVersionHref"))
print("label: " + saved_model.meta.prop("label"))

modelType: sparkml-model-2.0
creationTime: 2017-09-15 07:55:05.992000+00:00
modelVersionHref: https://ibm-watson-ml.mybluemix.net/v2/artifacts/models/04b39112-c8f1-4bc7-8cb1-94797a49308d/versions/717cd846-9814-4327-8d65-1850e83686a3
label: CHURN


### Step 8:  Generate Authorization Token for Invoking the model

In [18]:
import urllib3, requests, json

# Exchange the service credentials (HTTP basic authentication) for an access token
headers = urllib3.util.make_headers(basic_auth='{}:{}'.format(username, password))
url = '{}/v2/identity/token'.format(service_path)
response = requests.get(url, headers=headers)
mltoken = json.loads(response.text).get('token')
print(mltoken)

eyJhbGciOiJSUzUxMiIsInR5cCI6IkpXVCJ9.eyJ0ZW5hbnRJZCI6ImI1ODg0MzJhLTBhNTMtNDUyMi1hMGM5LWE2ODM2MGMxNmQ3YiIsImluc3RhbmNlSWQiOiJiNTg4NDMyYS0wYTUzLTQ1MjItYTBjOS1hNjgzNjBjMTZkN2IiLCJwbGFuSWQiOiIzZjZhY2Y0My1lZGU4LTQxM2EtYWM2OS1mOGFmM2JiMGNiZmUiLCJyZWdpb24iOiJ1cy1zb3V0aCIsInVzZXJJZCI6ImFiYmY4YWNmLTE2ZmQtNDE3Yy05YzkyLWY3OTYzMGJkZTliNSIsImlzcyI6Imh0dHA6Ly8xMjkuNDEuMjI5LjE4ODo4MDgwL3YyL2lkZW50aXR5IiwiaWF0IjoxNTA1NDYyMTU1LCJleHAiOjE1MDU0OTA5NTV9.r_ferd7gCSZf3cTqJSkUrNlZS-wi5oFhBVuGnwwZ7LmDo8NFoaWesdNmCBfeEqjaJ-eippsxtVh_3OMItVpzXy51eqK5py9gs6nFptVQTNdhyV8aaY0Xoa_aJsUYt0rFp7tkiYfVKccWYi7p0YsLCc6WpPBwn0h7Th_AahRWi5e6SVhFJs9eyK5eJ9bHIaIk9AEAzlWp3nhzqy17FSG35CpyJOB6Sg6DX8TaIO8b3YzVHEoy2ZYCNcjTqkGB1brmN-aCHO5dLgRdBj_gNk9mJOwUgaI3a2mrwQy_CufrXfcp_z6GLyLqk9dwbqJXVpxyyQKkcBbOJLD0vG8Ygf9sWw
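
The value printed above has the shape of a JSON Web Token (JWT): three base64url segments separated by dots (header, payload, signature). The payload of such a token typically carries issue and expiry times (`iat`, `exp`), which is useful to know before pasting the token into another tool. The sketch below decodes the payload of a throwaway token built inside the example; it does not verify the signature, and the helper names are made up for illustration.

```python
import base64
import json

def decode_jwt_payload(token):
    """Return the claims of a JWT as a dict, without verifying the signature."""
    payload_b64 = token.split(".")[1]
    # base64url segments omit padding; restore it before decoding
    payload_b64 += "=" * (-len(payload_b64) % 4)
    raw = base64.urlsafe_b64decode(payload_b64.encode("ascii"))
    return json.loads(raw.decode("utf-8"))

def _b64url(obj):
    """Encode a dict as an unpadded base64url JSON segment (test helper)."""
    raw = json.dumps(obj).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")

# Build a throwaway token with a fake signature, purely to exercise the decoder
sample_token = ".".join([
    _b64url({"alg": "none"}),
    _b64url({"iat": 1000, "exp": 29800}),
    "signature",
])
claims = decode_jwt_payload(sample_token)
lifetime_seconds = claims["exp"] - claims["iat"]
print(lifetime_seconds)
```

Calling `decode_jwt_payload(mltoken)` on your own token would show when it expires, after which a new token has to be generated.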


#### Step 8.1 Copy the generated token into your notepad

### Step 9:  Go to WML in Bluemix to create a Deployment Endpoint and Test the Deployed model

* In your <a href="https://console.ng.bluemix.net/dashboard/apps/" target="_blank">Bluemix</a> dashboard, open your WML service and click the **Launch Dashboard** button under Watson Machine Learning.
![WML Launch Dashboard](https://raw.githubusercontent.com/yfphoon/dsx_demo/master/WML_Launch_Dashboard.png)

<br/>
* You should see your saved model in the **Models** tab.


* Under *Actions*, click the ellipsis (three dots) and then click ***Create Deployment***. Give your deployment configuration a unique name, e.g. "Predict Customer Churn Deploy", accept the defaults, and click **Save**.
<br/>
<br/>
* In the *Deployments* tab, under *Actions*, click **View Details**.
<br/>
<br/>
* Scroll down to **API Details** and copy the value of the **Scoring Endpoint** into your notepad.

### Step 10:  Invoke the model with a REST Client, e.g. https://client.restlet.com/

In the REST client interface enter the following information:

1. Protocol:  **HTTPS**
<br/>
<br/>

2. URI: **your scoring endpoint**  (from Step 9)
<br/>
<br/>
3. Method: **POST**
<br/>
<br/>
4. Authorization:  **your generated token** (from Step 8). Hint: Add "Basic authorization" with a dummy value of 1 in the user ID field, then replace the value with the token.
<br/>
<br/>
5. Content Type: **application/json**
<br/>
<br/>
6. JSON Body:<br/>**{
  "fields": [
    "ID","Gender","Status","Children","EstIncome","CarOwner","Age","LongDistance","International","Local","Dropped","Paymethod","LocalBilltype","LongDistanceBilltype","Usage","RatePlan"
  ],
  "values": [ 
  [999,"F","M",2.0,77551.100000,"Y",33.600000,20.530000,0.000000,41.890000,1.000000,"CC","Budget","Intnl_discount",62.420000,2.000000]
  ]
}**
<br/>
<br/>
7. Click **Send**

Scroll down to the **RESPONSE** section to see the scored results.

**Note:** The values in the JSON body do not include the label (CHURN), because that is what the model predicts.
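
The same request can also be sent from Python instead of a browser-based REST client. The sketch below builds the headers and JSON body described in the steps above and wraps the call in a `score` function that is defined but not invoked here. The function names are made up for this example, and whether the service expects the raw token or a prefix such as `Bearer ` in the Authorization header is an assumption to check against your service; the `token_prefix` parameter is there for that reason.

```python
import json

FIELDS = ["ID", "Gender", "Status", "Children", "EstIncome", "CarOwner", "Age",
          "LongDistance", "International", "Local", "Dropped", "Paymethod",
          "LocalBilltype", "LongDistanceBilltype", "Usage", "RatePlan"]

def build_scoring_request(token, rows, token_prefix=""):
    """Return (headers, body) for a scoring call; each row must match FIELDS."""
    for row in rows:
        if len(row) != len(FIELDS):
            raise ValueError("each row needs {} values".format(len(FIELDS)))
    headers = {"Content-Type": "application/json",
               "Authorization": token_prefix + token}
    body = json.dumps({"fields": FIELDS, "values": rows})
    return headers, body

def score(scoring_endpoint, token, rows, token_prefix=""):
    """POST the rows to the scoring endpoint (needs the requests package)."""
    import requests  # imported lazily so the helper above works without it
    headers, body = build_scoring_request(token, rows, token_prefix)
    response = requests.post(scoring_endpoint, data=body, headers=headers)
    response.raise_for_status()
    return response.json()

# Same sample customer as in the JSON body above; the label is not included
row = [999, "F", "M", 2.0, 77551.1, "Y", 33.6, 20.53, 0.0, 41.89, 1.0,
       "CC", "Budget", "Intnl_discount", 62.42, 2.0]
headers, body = build_scoring_request("your-token", [row])
print(body)
```

To score for real, you would call `score(your_scoring_endpoint, mltoken, [row])` with the endpoint copied in Step 9 and the token from Step 8.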


**Sample REST Client Input**
![Rest Client Input](https://github.com/ibm-cloud-architecture/refarch-data-science/blob/master/static/imgs/RestRequest.PNG?raw=true)

You have come to the end of this notebook.


**Sidney Phoon**
<br/>
yfphoon@us.ibm.com
<br/>
April 25, 2017