# Watson OpenScale Mortgage Default Lab

This notebook is part one of a hands-on lab for IBM Watson OpenScale. Run it in a [Watson Studio](https://dataplatform.ibm.com/) project with Python 3.6 or greater. It requires an instance of [Watson Machine Learning](https://cloud.ibm.com/catalog/services/machine-learning) (WML); the free Lite plan is sufficient.

This notebook trains, saves, and deploys a machine learning model that predicts mortgage defaults.

## Provision services and create credentials

You will need credentials for Watson Machine Learning. If you already have a WML instance, you may use credentials for it. To provision a new Lite instance of WML, use the [Cloud catalog](https://cloud.ibm.com/catalog/services/machine-learning), give your service a name, and click **Create**. Once your instance is created, click the **Service Credentials** link on the left side of the screen. Click the **New credential** button, give your credentials a name, and click **Add**. Your new credentials can be accessed by clicking the **View credentials** button. Copy and paste your WML credentials into the cell below.

In [1]:
WML_CREDENTIALS = {
  "apikey": "ClPV2HAgLhmtNxM7fSpJoVcD-4dEDnHqzdVgAK8uqWgv",
  "iam_apikey_description": "Auto-generated for key d0a06315-c1c1-4e1d-9431-eefdb8b06026",
  "iam_apikey_name": "Service credentials-cpat-f2f-wml",
  "iam_role_crn": "crn:v1:bluemix:public:iam::::serviceRole:Writer",
  "iam_serviceid_crn": "crn:v1:bluemix:public:iam-identity::a/82cf08a3d55d4f8fa8266f348d7f4362::serviceid:ServiceId-092953e7-cef8-405a-a3a5-123d5107186d",
  "instance_id": "e88e90be-acaa-4350-954a-6d27a9e785bd",
  "url": "https://us-south.ml.cloud.ibm.com"
}
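
Hardcoding an API key in a notebook exposes it to anyone the notebook is shared with. A minimal sketch of one alternative, assuming the credentials JSON is stored in an environment variable. The variable name `WML_CREDENTIALS_JSON` and the helper `load_wml_credentials` are hypothetical and not part of the lab:

```python
import json
import os

# Keys from the credential dictionary above that this sketch checks for.
REQUIRED_KEYS = ("apikey", "instance_id", "url")


def load_wml_credentials(env_var="WML_CREDENTIALS_JSON"):
    """Read WML credentials from an environment variable holding a JSON object.

    The variable name is an assumption for this sketch; use whatever secret
    store your project provides.
    """
    raw = os.environ.get(env_var)
    if raw is None:
        raise RuntimeError("Environment variable %s is not set" % env_var)
    credentials = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise ValueError("Credentials are missing keys: %s" % ", ".join(missing))
    return credentials
```

With the variable set, the cell above could become `WML_CREDENTIALS = load_wml_credentials()`.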

## Name your model

You may give your model and deployment custom names below. If you change these values, be sure to use the same names in all subsequent notebooks in this lab.

In [2]:
MODEL_NAME = 'Mortgage Default'
DEPLOYMENT_NAME = 'Mortgage Default - Production'

## Run the notebook

At this point, you can run all cells in this notebook using the menus above.

Import the scikit-learn framework and check the version. This notebook was developed using sklearn version 0.20.3.

In [3]:
import sklearn
sklearn.__version__

'0.20.3'
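
The notebook was developed against scikit-learn 0.20.3, and a different release may behave differently. A minimal sketch of a guard that compares the installed version with that one; the helper names are hypothetical:

```python
import re


def version_tuple(version):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def same_minor_release(installed, developed_with="0.20.3"):
    """True when both versions share the same major and minor numbers."""
    return version_tuple(installed)[:2] == version_tuple(developed_with)[:2]


print(same_minor_release("0.20.3"))  # True
print(same_minor_release("0.22.1"))  # False
```

In the notebook, the check would be called as `same_minor_release(sklearn.__version__)`.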

Use the credentials provided above to create a new Watson Machine Learning client.

In [4]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

client = WatsonMachineLearningAPIClient(WML_CREDENTIALS)

List all models for this instance of Watson Machine Learning.

In [5]:
client.repository.list_models()

------------------------------------  --------------------------------------------------  ------------------------  -----------------
GUID                                  NAME                                                CREATED                   FRAMEWORK
d9942549-ce2e-4b82-a492-bedeab926c2e  Customer Churn From Notebook - ychunhui demo model  2020-03-14T16:05:16.646Z  scikit-learn-0.20
1c72c233-186a-4252-aedd-9f8dbd4dab11  GermanCreditRiskModel                               2019-11-15T19:07:37.772Z  mllib-2.3
f50ac62a-2ab5-4c4b-bbf7-814588e829c8  Cpat-Churn                                          2019-11-14T22:22:11.287Z  spss-modeler-18.1
ac292698-6339-41f2-9d8f-cac5df6fa804  CHURN-SPSS                                          2019-11-14T22:12:43.506Z  spss-modeler-18.1
------------------------------------  --------------------------------------------------  ------------------------  -----------------


Import the pandas library, then download and examine the training data. The data contains an `ID` field for the loan ID; it is not used to train the model, so it is dropped.

In [6]:
import pandas as pd

url = 'https://raw.githubusercontent.com/emartensibm/mortgage-default/master/Mortgage_Full_Records.csv'
df_raw = pd.read_csv(url)
df = df_raw.drop('ID', axis=1)
df.head()

Unnamed: 0,Income,AppliedOnline,Residence,Yrs_at_Current_Address,Yrs_with_Current_Employer,Number_of_Cards,Creditcard_Debt,Loans,Loan_Amount,SalePrice,Location,MortgageDefault
0,45081,YES,Owner Occupier,14,15,2,713,1,8430,140000,L110,NO
1,46645,YES,Owner Occupier,19,4,1,884,0,6045,475000,L110,NO
2,44202,YES,Owner Occupier,1,23,2,2611,0,12915,162000,L101,NO
3,52495,YES,Owner Occupier,18,16,2,2527,1,10375,195000,L100,YES
4,43608,YES,Owner Occupier,2,20,1,452,0,7610,410000,L100,YES


Import the sklearn libraries we need, including encoders, transformers, scalers, and our random forest classifier.

In [7]:
from sklearn.preprocessing import OneHotEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

Identify the categorical features, and create a one-hot encoder pipeline for them.

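Next, identify the numerical features and use the min-max scaler to rescale each one to the range [0, 1]. Random forests split on feature thresholds and are largely insensitive to this kind of rescaling, so the step matters mainly if the classifier is later replaced with a scale-sensitive one.

Finally, combine the categorical encoder and the scaler into a single pipeline with the classifier, so the deployed model accepts raw records and applies the same preprocessing. Note that `Loans` appears in neither feature list, so the `ColumnTransformer`, which drops unlisted columns by default, excludes it from training.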

In [8]:
categorical_features = ['AppliedOnline','Residence','Location']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

scaled_features = ['Income','Yrs_at_Current_Address','Yrs_with_Current_Employer',\
                   'Number_of_Cards','Creditcard_Debt','Loan_Amount','SalePrice']
scale_transformer = Pipeline(steps=[('scale', MinMaxScaler())])

preprocessor = ColumnTransformer(
    transformers=[
        ('cat', categorical_transformer, categorical_features),
        ('scaler', scale_transformer, scaled_features)
    ]
)

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', RandomForestClassifier())])
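
To make the two transformations concrete, here is a minimal plain-Python sketch of what min-max scaling and one-hot encoding with ignored unknown categories do to a column. It illustrates the idea only and is not scikit-learn's implementation; the function names are hypothetical:

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0.0 and the largest to 1.0."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column has no spread; this sketch maps everything to 0.0.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot(value, categories):
    """Encode value as a 0/1 list over categories; unknown values give all zeros."""
    return [1 if value == category else 0 for category in categories]


# Income values from the first three rows shown earlier.
print(min_max_scale([45081, 46645, 44202]))

# Location categories seen in the sample rows; 'L999' is a made-up unseen value.
locations = ["L100", "L101", "L110"]
print(one_hot("L101", locations))  # [0, 1, 0]
print(one_hot("L999", locations))  # [0, 0, 0]
```

The all-zero row for an unseen category mirrors the effect of `handle_unknown='ignore'`: the deployed model can still score a record with a location it never saw during training.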

Perform the train/test split, train the model, and score the model quality.

In [9]:
X = df.drop('MortgageDefault', axis=1)
y = df['MortgageDefault']

X_train, X_test, y_train, y_test = train_test_split( X, y, test_size=0.2, random_state=4)

model = clf.fit(X_train, y_train)
res_predict = model.predict(X_test)
print("model score: %.3f" % clf.score(X_test, y_test))
print(classification_report(y_test, res_predict, target_names=["False", "True"]))

model score: 0.821
              precision    recall  f1-score   support

       False       0.89      0.80      0.85        51
        True       0.74      0.85      0.79        33

   micro avg       0.82      0.82      0.82        84
   macro avg       0.81      0.83      0.82        84
weighted avg       0.83      0.82      0.82        84
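
The figures in the report follow from four counts. The sketch below recomputes them from counts inferred from the report itself: 41 of the 51 "False" rows and 28 of the 33 "True" rows classified correctly. These counts are an assumption derived from the rounded recall and support values, not something the notebook prints:

```python
def precision_recall_f1(true_positive, false_positive, false_negative):
    """Compute precision, recall, and F1 for one class from its counts."""
    precision = true_positive / (true_positive + false_positive)
    recall = true_positive / (true_positive + false_negative)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts inferred from the report above (an assumption, not notebook output).
correct_false, wrong_false = 41, 10   # actual False: predicted False / True
correct_true, wrong_true = 28, 5      # actual True: predicted True / False

accuracy = (correct_false + correct_true) / 84
print("accuracy: %.3f" % accuracy)  # accuracy: 0.821

# For the "True" class, the 10 misclassified False rows are its false positives.
p, r, f = precision_recall_f1(correct_true, wrong_false, wrong_true)
print("True:  %.2f %.2f %.2f" % (p, r, f))  # True:  0.74 0.85 0.79

# For the "False" class, the 5 misclassified True rows are its false positives.
p, r, f = precision_recall_f1(correct_false, wrong_true, wrong_false)
print("False: %.2f %.2f %.2f" % (p, r, f))  # False: 0.89 0.80 0.85
```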



## Save the model to Watson Machine Learning

Check the deployments in the WML instance and delete any existing deployment named `DEPLOYMENT_NAME`, along with the model it serves. This allows the notebook to be re-run from a clean state if necessary. The cell then lists the remaining models.

In [10]:
model_deployment_ids = client.deployments.get_uids()
for deployment_id in model_deployment_ids:
    deployment = client.deployments.get_details(deployment_id)
    model_id = deployment['entity']['deployable_asset']['guid']
    if deployment['entity']['name'] == DEPLOYMENT_NAME:
        print('Deleting deployment id', deployment_id)
        client.deployments.delete(deployment_id)
        print('Deleting model id', model_id)
        client.repository.delete(model_id)
client.repository.list_models()

------------------------------------  --------------------------------------------------  ------------------------  -----------------
GUID                                  NAME                                                CREATED                   FRAMEWORK
d9942549-ce2e-4b82-a492-bedeab926c2e  Customer Churn From Notebook - ychunhui demo model  2020-03-14T16:05:16.646Z  scikit-learn-0.20
1c72c233-186a-4252-aedd-9f8dbd4dab11  GermanCreditRiskModel                               2019-11-15T19:07:37.772Z  mllib-2.3
f50ac62a-2ab5-4c4b-bbf7-814588e829c8  Cpat-Churn                                          2019-11-14T22:22:11.287Z  spss-modeler-18.1
ac292698-6339-41f2-9d8f-cac5df6fa804  CHURN-SPSS                                          2019-11-14T22:12:43.506Z  spss-modeler-18.1
------------------------------------  --------------------------------------------------  ------------------------  -----------------
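
The selection logic in the cleanup cell can be exercised without a WML connection. A minimal sketch over plain dictionaries shaped like the deployment details the cell reads (`entity.name` and `entity.deployable_asset.guid`); the function name and the sample IDs are hypothetical:

```python
def find_stale(deployment_details, deployment_name):
    """Return (deployment_id, model_id) pairs for deployments with the given name.

    deployment_details maps each deployment ID to its details dictionary.
    """
    stale = []
    for deployment_id, details in deployment_details.items():
        entity = details["entity"]
        if entity["name"] == deployment_name:
            stale.append((deployment_id, entity["deployable_asset"]["guid"]))
    return stale


# Made-up IDs for illustration only.
sample = {
    "dep-1": {"entity": {"name": "Mortgage Default - Production",
                         "deployable_asset": {"guid": "model-1"}}},
    "dep-2": {"entity": {"name": "Some other deployment",
                         "deployable_asset": {"guid": "model-2"}}},
}
print(find_stale(sample, "Mortgage Default - Production"))  # [('dep-1', 'model-1')]
```

Collecting the pairs first separates selection from deletion, so the list can be printed and reviewed before anything is removed.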


Create the metadata and save the model.

In [11]:
metadata = {
    client.repository.ModelMetaNames.NAME: MODEL_NAME,
    client.repository.ModelMetaNames.EVALUATION_METHOD: "binary",
    client.repository.ModelMetaNames.EVALUATION_METRICS: [
        {
            "name": "areaUnderROC",
            "value": 0.7,
            "threshold": 0.7
        }
    ]
}

# Name the columns
cols=["Income","AppliedOnline","Residence","Yrs_at_Current_Address","Yrs_with_Current_Employer",\
      "Number_of_Cards","Creditcard_Debt","Loans","Loan_Amount","SalePrice","Location"]
      
saved_model = client.repository.store_model(model=model, meta_props=metadata, 
                                            training_data=X_train, training_target=y_train, 
                                            feature_names=cols, label_column_names=["MortgageDefault"] )
saved_model

{'metadata': {'guid': 'b57f0bbd-c9f0-4145-b5de-b46097b44186',
  'url': 'https://us-south.ml.cloud.ibm.com/v3/wml_instances/e88e90be-acaa-4350-954a-6d27a9e785bd/published_models/b57f0bbd-c9f0-4145-b5de-b46097b44186',
  'created_at': '2020-03-17T01:51:41.611Z',
  'modified_at': '2020-03-17T01:51:41.674Z'},
 'entity': {'runtime_environment': 'python-3.6',
  'learning_configuration_url': 'https://us-south.ml.cloud.ibm.com/v3/wml_instances/e88e90be-acaa-4350-954a-6d27a9e785bd/published_models/b57f0bbd-c9f0-4145-b5de-b46097b44186/learning_configuration',
  'name': 'Mortgage Default',
  'label_col': 'MortgageDefault',
  'learning_iterations_url': 'https://us-south.ml.cloud.ibm.com/v3/wml_instances/e88e90be-acaa-4350-954a-6d27a9e785bd/published_models/b57f0bbd-c9f0-4145-b5de-b46097b44186/learning_iterations',
  'training_data_schema': {'features': {'type': 'DataFrame',
    'fields': [{'name': 'Income', 'type': 'int64'},
     {'name': 'AppliedOnline', 'type': 'object'},
     {'name': 'Residence
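
The `EVALUATION_METRICS` entry in the metadata records a fixed `areaUnderROC` of 0.7 rather than a value measured on the test set. A minimal sketch of how the area under the ROC curve could be computed from predicted probabilities, written in plain Python for illustration; in practice `sklearn.metrics.roc_auc_score` would be the usual choice. The function name and the toy data are hypothetical:

```python
def area_under_roc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("Need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy labels (True = default) and made-up predicted default probabilities.
labels = [True, True, False, False]
scores = [0.9, 0.4, 0.6, 0.1]
print(area_under_roc(labels, scores))  # 0.75
```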

Get the unique ID for the model so we can deploy it.

In [12]:
model_uid = saved_model['metadata']['guid']
model_uid

'b57f0bbd-c9f0-4145-b5de-b46097b44186'

Deploy the model as a web service with Watson Machine Learning.

In [13]:
print("Deploying model...")

deployment = client.deployments.create(artifact_uid=model_uid, name=DEPLOYMENT_NAME, asynchronous=False)

Deploying model...


#######################################################################################

Synchronous deployment creation for uid: 'b57f0bbd-c9f0-4145-b5de-b46097b44186' started

#######################################################################################


INITIALIZING
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='fde50366-9f85-4357-8241-d3f8b0e0034a'
------------------------------------------------------------------------------------------------




Get the unique ID for the deployment and print it alongside the model ID.

In [14]:
deployment_uid = client.deployments.get_uid(deployment)

print("Model id: {}".format(model_uid))
print("Deployment id: {}".format(deployment_uid))

Model id: b57f0bbd-c9f0-4145-b5de-b46097b44186
Deployment id: fde50366-9f85-4357-8241-d3f8b0e0034a
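
With the deployment in place, the web service can be scored. A minimal sketch of building a request payload in the `fields`/`values` layout, assuming that is the format this WML deployment accepts. The helper name is hypothetical, and the row is the first record shown earlier in the notebook. The commented client calls at the end are likewise an assumption about the `watson_machine_learning_client` API and are not run here:

```python
FIELDS = ["Income", "AppliedOnline", "Residence", "Yrs_at_Current_Address",
          "Yrs_with_Current_Employer", "Number_of_Cards", "Creditcard_Debt",
          "Loans", "Loan_Amount", "SalePrice", "Location"]


def build_scoring_payload(rows, fields=FIELDS):
    """Package rows of feature values with their column names."""
    for row in rows:
        if len(row) != len(fields):
            raise ValueError("Expected %d values per row, got %d"
                             % (len(fields), len(row)))
    return {"fields": list(fields), "values": [list(row) for row in rows]}


payload = build_scoring_payload([
    [45081, "YES", "Owner Occupier", 14, 15, 2, 713, 1, 8430, 140000, "L110"],
])
print(len(payload["values"][0]))  # 11

# Assumed client usage (not verified here):
# scoring_url = client.deployments.get_scoring_url(deployment)
# client.deployments.score(scoring_url, payload)
```

Checking the row length before sending catches a missing or extra column locally rather than as an error from the service.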


## Congratulations!

If all cells ran without errors, you have deployed the mortgage default model as a web service in Watson Machine Learning. You can proceed with the rest of the lab.