# Create and Test Scoring Pipeline and Deploy R Shiny Dashboard App

### Introduction

Now that we have built the machine learning models, and stored and deployed them using [ibm-watson-machine-learning](http://ibm-wml-api-pyclient.mybluemix.net), we can use the models to score new data. 
In the first part of the notebook we will:

* Programmatically get the IDs for the deployment space and model deployments that were created in the `2-model-training` notebook.
* Promote assets required for scoring new data into the deployment space.
* Create a deployable function which will take raw data for scoring, prep it into the format required for the models and score it.
* Deploy the function.
* Create the required payload, invoke the deployed function and return predictions.

In the second part we will:
* Store Shiny assets into the same deployment space.
* Deploy Shiny assets as an app and view the dashboard.

In [1]:
import pandas as pd
import datetime
from project_lib import Project
project = Project()

from ibm_watson_machine_learning import APIClient
import os
import random
import string
token = os.environ['USER_ACCESS_TOKEN']

wml_credentials = {
   "token": token,
   "instance_id" : "openshift",
   "url": os.environ['RUNTIME_ENV_APSX_URL'],
   "version": "4.6"
}

client = APIClient(wml_credentials)

### User Inputs

Enter the path to the CSV file with the raw data to be scored, and the list of products for which predictions are required.

In [2]:
# specify the location of the CSV file with the raw data that we would like to score
dataset_loc = '/project_data/data_asset/customerProductHistory.csv'

dataset_name = os.path.basename(dataset_loc)

product_ids = ['EDUCATION', 'CASH', 'BROKERAGE', 'FINANCIALPLAN', 'RETIREMENTPLAN']

### Set up Deployment Space, Deployments and Assets

The following code programmatically gets the deployment space and the model deployment details which were created in **2-model-training**. 
We use the space name and deployment names that were used when creating the deployments as specified below. 
If multiple deployments within the selected space have the same name, the most recently created deployment is used. 

Alternatively, the user can manually enter the space and deployment IDs.

The code also promotes some assets into the deployment space: the dataset with raw data for scoring, the Python script used for prepping the data, and the metadata that was stored when prepping the data. Promoting these assets into the deployment space makes them accessible to the deployed function. 

In [3]:
space_name = 'Customer Offer Affinity Space'

# loop through each product name to get the model and deployment names
dict_model = {}
dict_deployment = {}
for product_id in product_ids: 
    dict_model[product_id] = 'offer_affinity_' + product_id + '_model'
    dict_deployment[product_id] = 'offer_affinity_' + product_id + '_model_deployment'

Get the space we are working in, which is found using the name that was hardcoded in **2-model-training**. 
If the user would like to use a different space, manually set the **space_id**.

Set the space as the default space for working.

In [4]:
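space_id = None
for space_details in client.spaces.get_details()['resources']:
    if space_details['entity']['name'] == space_name:
        space_id = space_details['metadata']['id']

# fail early with a clear message if no space with this name exists
if space_id is None:
    raise ValueError("No deployment space named '" + space_name + "' was found")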

# set this space as default space
client.set.default_space(space_id)

'SUCCESS'
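
The space lookup can also be factored into a small helper that fails loudly when no space matches. This is a minimal sketch, assuming records shaped like `client.spaces.get_details()['resources']`; the helper name `find_space_id` and the sample records are hypothetical.

```python
def find_space_id(resources, space_name):
    """Return the id of the space called space_name, or raise if none exists."""
    matches = [r['metadata']['id'] for r in resources
               if r['entity']['name'] == space_name]
    if not matches:
        raise LookupError("No deployment space named " + repr(space_name))
    # Mirror the notebook loop, which keeps the last match it sees.
    return matches[-1]


# Hypothetical records shaped like client.spaces.get_details()['resources']
sample_spaces = [
    {'metadata': {'id': 'space-a'}, 'entity': {'name': 'Some Other Space'}},
    {'metadata': {'id': 'space-b'}, 'entity': {'name': 'Customer Offer Affinity Space'}},
]
print(find_space_id(sample_spaces, 'Customer Offer Affinity Space'))  # space-b
```

In the notebook, the first argument would be `client.spaces.get_details()['resources']`.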

Get the deployment IDs of all the models deployed in **2-model-training**. If there are multiple deployments with the same name in the same space, we take the latest.

In [5]:
model_deployments_dict = {}
for product_id, deployment_name in dict_deployment.items():
    # get the id of the deployments - 
    # if there are multiple deployments with the same name in the same space, we take the latest
    l_deployment_details = []
    l_deployment_details_created_times = []
                
    for deployment in client.deployments.get_details()['resources']:
        if deployment['entity']['name'] == deployment_name:
            l_deployment_details.append(deployment)
            l_deployment_details_created_times.append(datetime.datetime.strptime(deployment['metadata']['created_at'], '%Y-%m-%dT%H:%M:%S.%fZ'))

    # fail early if the expected deployment is missing from the space
    if not l_deployment_details:
        raise ValueError("No deployment named '" + deployment_name + "' was found in the space")

    # get the index of the latest created date from the list and use that to get the deployment_id
    list_latest_index = l_deployment_details_created_times.index(max(l_deployment_details_created_times))
    deployment_id = l_deployment_details[list_latest_index]['metadata']['id']
    model_deployments_dict[product_id] = deployment_id
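
The "latest deployment wins" rule can be expressed more compactly with `max` and a key function. The sketch below assumes records shaped like `client.deployments.get_details()['resources']` and the same `created_at` timestamp format used in the cell above; the helper name `latest_deployment_id` and the sample records are hypothetical.

```python
import datetime

CREATED_AT_FORMAT = '%Y-%m-%dT%H:%M:%S.%fZ'


def latest_deployment_id(resources, deployment_name):
    """Return the id of the most recently created deployment with this name."""
    matches = [d for d in resources if d['entity']['name'] == deployment_name]
    if not matches:
        raise LookupError("No deployment named " + repr(deployment_name))
    newest = max(
        matches,
        key=lambda d: datetime.datetime.strptime(
            d['metadata']['created_at'], CREATED_AT_FORMAT),
    )
    return newest['metadata']['id']


# Hypothetical records: two deployments share a name, one has a different name.
sample_deployments = [
    {'metadata': {'id': 'dep-old', 'created_at': '2022-12-01T10:00:00.000Z'},
     'entity': {'name': 'offer_affinity_CASH_model_deployment'}},
    {'metadata': {'id': 'dep-new', 'created_at': '2022-12-11T22:08:04.011Z'},
     'entity': {'name': 'offer_affinity_CASH_model_deployment'}},
    {'metadata': {'id': 'dep-other', 'created_at': '2022-12-12T00:00:00.000Z'},
     'entity': {'name': 'offer_affinity_EDUCATION_model_deployment'}},
]
print(latest_deployment_id(sample_deployments, 'offer_affinity_CASH_model_deployment'))  # dep-new
```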

Promote the assets into the deployment space. We use the prep script to get the raw data into the format required for scoring. We also need the prep metadata that was saved as JSON during the prep for training; this ensures that the user inputs specified when prepping the data for training are also used for scoring. Finally, we store the raw dataset in the deployment space.

In [6]:
asset_details_json = client.data_assets.create('training_user_inputs_and_prepped_column_names_and_means.json', file_path='/project_data/data_asset/training_user_inputs_and_prepped_column_names_and_means.json')
asset_details_script = client.data_assets.create('offer_affinity_prep.py', file_path='/project_data/data_asset/offer_affinity_prep.py')
asset_details_dataset = client.data_assets.create(dataset_name, file_path=dataset_loc)

Creating data asset...
SUCCESS
Creating data asset...
SUCCESS
Creating data asset...
SUCCESS
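
Each `client.data_assets.create` call reads a file from the given `file_path`, so it can be useful to confirm that all files exist before promoting anything. A minimal sketch using only the standard library; the helper name `missing_files` is hypothetical, and the paths are the ones used in this notebook.

```python
import os


def missing_files(paths):
    """Return the subset of paths that do not point to an existing file."""
    return [path for path in paths if not os.path.isfile(path)]


asset_paths = [
    '/project_data/data_asset/training_user_inputs_and_prepped_column_names_and_means.json',
    '/project_data/data_asset/offer_affinity_prep.py',
    '/project_data/data_asset/customerProductHistory.csv',
]
# An empty list means every file is present and can be promoted.
print(missing_files(asset_paths))
```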


### Create the Deployable Function
Functions can be deployed in Watson Machine Learning in the same way models can be deployed. The Python client or REST API can be used to send data to the deployed function. Deploying a function lets us prepare the data and pass it to the models for scoring in a single call.

We start off by creating the dictionary of default parameters to be passed to the function. We get the IDs of all assets that have been promoted into the deployment space. We also add the model deployment IDs and the space ID into the dictionary.

In [7]:
# get the assets that were stored in the space - in this version of the package we need to manually assign the id
metadata_id = asset_details_json['metadata']['guid']
prep_id = asset_details_script['metadata']['guid']
dataset_id = asset_details_dataset['metadata']['guid']

In [8]:
params = {'space_id' : space_id}
assets_dict = {'dataset_asset_id' : dataset_id, 'metadata_asset_id' : metadata_id, 
                   'prep_script_asset_id' : prep_id, 'dataset_name' : dataset_name}

In [9]:
# create the wml_credentials again. After already creating the client using the credentials, the instance_id gets updated to 999
# re-create the dictionary so that the correct instance_id is used

wml_credentials["instance_id"] = "openshift"
ai_parms = {'wml_credentials' : wml_credentials,'space_id' : space_id, 'assets' : assets_dict, 'model_deployment_id' : model_deployments_dict}

#### Scoring Pipeline Function

The function below takes new customers to be scored as a payload. It preps the customer raw data, loads the models, executes the model scoring and generates the predictions for each product. 

The following rules are required to make a valid deployable function:

* The deployable function must include a nested function named "score".
* The score function accepts the scoring payload, whose `input_data` entry is a list.
* Each item in that list must include an entry with the name "values".
* The score function must return a dictionary with the key "predictions", whose value is a list; each item of that list in turn contains an entry with the name "values". Example: ```{"predictions" : [{'values' : }]}```
* We pass default parameters into the function: the credentials and space details, the details of the assets that were promoted into the space, and the model deployment IDs. 
* The assets are downloaded from the deployment space and imported as variables. The raw data to be scored is then prepared and the function calls the model deployment endpoint to score and return predictions. 
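
The rules above can be seen in a stripped-down sketch that runs locally without any Watson Machine Learning dependency. The function name `minimal_deployable_function` and its `multiplier` parameter are hypothetical and only illustrate the closure pattern: an outer function that captures default parameters and returns a nested `score` function.

```python
def minimal_deployable_function(parms={'multiplier': 2}):
    # Default parameters are captured by the nested function (a closure),
    # in the same way ai_parms is captured by the scoring pipeline.

    def score(payload):
        # Each item of payload['input_data'] carries a 'values' entry.
        rows = payload['input_data'][0]['values']
        scaled = [[value * parms['multiplier'] for value in row] for row in rows]
        # The result must be wrapped as {'predictions': [{'values': ...}]}.
        return {'predictions': [{'values': scaled}]}

    return score


# Calling the outer function returns the scorer, which can be tested locally.
score = minimal_deployable_function()
print(score({'input_data': [{'values': [[1, 2], [3, 4]]}]}))
# {'predictions': [{'values': [[2, 4], [6, 8]]}]}
```

Exercising the closure locally like this is a quick way to check the payload and response shapes before storing and deploying a function.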

In [10]:
def scoring_pipeline(parms=ai_parms):
     
    import pandas as pd
    import requests
    import os
    import sys
    import json
    
    from ibm_watson_machine_learning import APIClient
    client = APIClient(parms["wml_credentials"])
    client.set.default_space(parms['space_id'])
    

    # call the function to download the stored dataset asset and return the path
    dataset_path = client.data_assets.download(parms['assets']['dataset_asset_id'], parms['assets']['dataset_name'])
    df_raw = pd.read_csv(dataset_path, infer_datetime_format=True, 
                             parse_dates=['CUSTOMER_EFFECTIVE_DATE', 'CUSTOMER_RELATIONSHIP_START_DATE', 
                                              'CUSTOMER_SUMMARY_END_DATE'])
    


    # call the function to download the prep script and return the path
    prep_script_path = client.data_assets.download(parms['assets']['prep_script_asset_id'], 'prep_data_script.py')
    # remove the rest of path and .py at end of file name to get the name of the script for importing
    script_name = os.path.basename(prep_script_path).replace('.py', '')
    
    
    # call the function to download the prep metadata and return the path
    metadata_path = client.data_assets.download(parms['assets']['metadata_asset_id'], 'user_inputs.json')
    def prep(cust_id, scoring_date):
        import requests
        import os
        # import the prep script that we downloaded into the deployment space
        prep_data_script = __import__(script_name)
    

        with open(metadata_path, 'r') as f:
            metadata = json.load(f)
        
        globals().update(metadata)
        
        input_df = df_raw[df_raw[customer_id_col] == cust_id]
        
        scoring_prep = prep_data_script.OfferAffinityPrep('score', product_id=product_list, effective_date_earliest=effective_date_earliest,
                                            effective_date_latest=effective_date_latest, nulls_threshold=nulls_threshold,
                                            max_num_cat_cardinality=max_num_cat_cardinality, customer_id_col=customer_id_col,
                                            customer_effective_date_col=customer_effective_date_col, customer_relationship_start_date_col=customer_relationship_start_date_col,
                                            customer_summary_end_date=customer_summary_end_date, customer_product_summary_end_date_col=customer_product_summary_end_date_col,
                                            required_product_attributes=required_product_attributes, default_attributes=default_attributes, scoring_date=scoring_date)
        
        prepped_data_dict = scoring_prep.prep_data(input_df, 'score')
        
        for product_id in product_list:
            if prepped_data_dict[product_id] is None:
                print("Data prep filtered out customer data. Unable to score.", file=sys.stderr)
                return None

            # handle empty data
            if prepped_data_dict[product_id].shape[0] == 0:
                print("Data prep filtered out customer data. Unable to score.", file=sys.stderr)
                return None
        
            # if a column does not exist in scoring but is in training, add the column to scoring dataset
            for col in cols_used_for_training[product_id]:
                if col not in list(prepped_data_dict[product_id].columns):
                    prepped_data_dict[product_id][col] = 0

            # if a column exists in scoring but not in training, delete it from scoring dataset
            for col in list(prepped_data_dict[product_id].columns):
                if col not in cols_used_for_training[product_id]:
                    prepped_data_dict[product_id].drop(col, axis=1, inplace=True)

            # make sure order of scoring columns is same as training dataset
            prepped_data_dict[product_id] = prepped_data_dict[product_id][cols_used_for_training[product_id]]
        
            # fill in any missing data - in our metadata we had a key called 'col_means' which was converted into
            # a variable called 'col_means' in the 'globals().update(metadata)' line of code above
            # this new variable is a dictionary with a key for each product 
            # in turn, the value associated with each of these keys is a dictionary with key for each column 
            # and value being the mean value for that column from the data used for training 
            for col, col_mean_value in col_means[product_id].items():
                # only update means if the product is in the list of training columns
                if col in cols_used_for_training[product_id]:
                    prepped_data_dict[product_id][col].fillna(col_mean_value, inplace=True)
        
            # if logistic regression is used as final model for any product, the last cell in 2-model-training appended
            # variable scaling data into the metadata json file
            # if any of the keys created in that step, cols_to_standardise, scaler_means and scaler_standard_dev, are present in the 
            # json file we need to standardise our data
            if 'scaler_means' in metadata:
                # if the above is True, next check if logistic regression was used for this product, ie is the product listed as a key in the nested dictionary
                # only if the product is listed as a key does it mean logistic regression was used
                # using globals().update created a new variable for each of cols_to_standardise, scaler_means and scaler_standard_dev
                if product_id in scaler_means:
                    # loop through each column to be standardised for the product, get the corresponding mean and standard deviation value calculated from training data
                    for i in range(0, len(cols_to_standardise[product_id])):
                        current_col = cols_to_standardise[product_id][i]
                        current_col_mean = scaler_means[product_id][i]
                        current_col_standard_dev = scaler_standard_dev[product_id][i]
                        # scale the variable 
                        prepped_data_dict[product_id][current_col] = (prepped_data_dict[product_id][current_col] - current_col_mean) / current_col_standard_dev
                                 
        return prepped_data_dict
    
    def score(payload):
        import json
        
        scoring_date = payload['input_data'][0]['values']
        cust_id = payload['input_data'][0]['cust_id']
        
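        prepped_data_dict = prep(cust_id, scoring_date)

        # prep returns None when the customer data was filtered out
        if prepped_data_dict is None:
            return {"predictions" : [{'values' : 'Data prep filtered out customer data. Unable to score.'}]}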
        
        result = {}
        for product_id, prepped_data in prepped_data_dict.items():
            # handle empty data
            if prepped_data is None:
                return {"predictions" : [{'values' : 'Data prep filtered out customer data. Unable to score.'}]}
            elif prepped_data.shape[0] == 0:
                return {"predictions" : [{'values' : 'Data prep filtered out customer data. Unable to score.'}]}
            else:
                #scoring_url = parms['wml_credentials']["url"] + "/v4/deployments/" + parms['model_deployment_id'][product_id] + "/predictions"
                scoring_payload = {"input_data":  [{ "values" : prepped_data.values.tolist()}]}
                
                response_scoring = client.deployments.score(parms['model_deployment_id'][product_id], scoring_payload)
                result[product_id] = response_scoring
                
        
        return {"predictions" : [{'values' : result}]}
    
    return score
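
The column handling inside `prep` has two parts: align the scoring columns with the training columns, then standardise selected columns with the training mean and standard deviation, i.e. `(x - mean) / standard_dev`. The sketch below illustrates both on a single row using plain Python instead of pandas. The helper names `align_row` and `standardise`, and the column names and values, are hypothetical.

```python
def align_row(row, training_cols, col_means):
    """Return the values of row ordered like training_cols.

    Columns missing from row become 0, columns not used in training are
    dropped, and None values are replaced by the training mean if one exists.
    """
    aligned = []
    for col in training_cols:
        value = row.get(col, 0)
        if value is None and col in col_means:
            value = col_means[col]
        aligned.append(value)
    return aligned


def standardise(values, means, standard_devs):
    """Apply (x - mean) / standard_dev element by element."""
    return [(v - m) / s for v, m, s in zip(values, means, standard_devs)]


training_cols = ['AGE', 'BALANCE', 'NUM_ACCOUNTS']
col_means = {'AGE': 40.0, 'BALANCE': 1000.0}
row = {'BALANCE': None, 'AGE': 30, 'UNUSED_COL': 'x'}

aligned = align_row(row, training_cols, col_means)
print(aligned)                                                   # [30, 1000.0, 0]
print(standardise(aligned[:2], [40.0, 1000.0], [10.0, 500.0]))   # [-1.0, 0.0]
```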

### Deploy the Function

The user can specify the name of the function and deployment in the code below. As we have previously seen, we use tags in the metadata to allow us to programmatically identify the deployed function. 

In [11]:
# store the function and deploy it 
function_name = 'offer_affinity_scoring_pipeline_function'
function_deployment_name = 'offer_affinity_scoring_pipeline_function_deployment'


The Software Specification refers to the runtime used in the Notebook, WML training and WML deployment. We use the software specification `runtime-22.2-py3.10` to store the function. We get the ID of the software specification and include it in the metadata when storing the function. Available Software specifications can be retrieved using `client.software_specifications.list()`.



In [12]:
software_spec_id = client.software_specifications.get_uid_by_name("runtime-22.2-py3.10")

In [13]:
# add the metadata for the function and deployment    
meta_data = {
    client.repository.FunctionMetaNames.NAME : function_name,
    client.repository.FunctionMetaNames.TAGS : [ 'offer_affinity_scoring_pipeline_function_tag'],
    client.repository.FunctionMetaNames.SOFTWARE_SPEC_UID: software_spec_id
}

function_details = client.repository.store_function(meta_props=meta_data, function=scoring_pipeline)

function_id = function_details["metadata"]["id"]

meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: function_deployment_name,
    client.deployments.ConfigurationMetaNames.TAGS : ['offer_affinity_scoring_pipeline_function_deployment_tag'],
    client.deployments.ConfigurationMetaNames.SERVING_NAME:function_name.replace("_","")[:30]+''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(6))
}

# deploy the stored function
function_deployment_details = client.deployments.create(artifact_uid=function_id, meta_props=meta_props)



#######################################################################################

Synchronous deployment creation for uid: 'fedd42b5-ca1b-4596-be01-886b56bf3467' started

#######################################################################################


initializing......
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='46d39dec-7981-4b8c-831c-7ade07a70853'
------------------------------------------------------------------------------------------------
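
The serving name in the deployment metadata is built from the function name with underscores removed, truncated to 30 characters, plus a random 6-character suffix, giving at most 36 characters. A minimal sketch of that construction as a reusable helper; the name `make_serving_name` and its keyword arguments are hypothetical.

```python
import random
import string


def make_serving_name(function_name, prefix_len=30, suffix_len=6):
    """Build a serving name: truncated name without underscores + random suffix."""
    prefix = function_name.replace('_', '')[:prefix_len]
    alphabet = string.ascii_lowercase + string.digits
    suffix = ''.join(random.choice(alphabet) for _ in range(suffix_len))
    return prefix + suffix


name = make_serving_name('offer_affinity_scoring_pipeline_function')
# The prefix is always 'offeraffinityscoringpipelinefu'; the suffix varies per run.
print(name, len(name))
```

The random suffix reduces the chance of a name clash when the notebook is run more than once.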




### Score New Data

Get the ID of the function deployment, create the payload and use the Python client to score the data. The deployed function returns the classification prediction along with the probabilities. 

The payload contains two values. The first is the effective date for scoring, that is, the date as of which the prediction is computed. The second is the ID of the customer for whom we would like to make the prediction. 
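
A small helper can build this payload and reject malformed dates before the request is sent. This is a sketch under two assumptions: the scoring date uses the `YYYY-MM-DD` form seen in the example, and the top-level key is `input_data`, which is how the `score` function reads the payload. The helper name `build_payload` is hypothetical.

```python
import datetime


def build_payload(cust_id, scoring_date):
    """Return a scoring payload for one customer and one scoring date."""
    # Raises ValueError if scoring_date is not a valid year-month-day date.
    datetime.datetime.strptime(scoring_date, '%Y-%m-%d')
    return {'input_data': [{'values': scoring_date, 'cust_id': cust_id}]}


print(build_payload(1000, '2018-09-30'))
# {'input_data': [{'values': '2018-09-30', 'cust_id': 1000}]}
```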

In [14]:
scoring_deployment_id = client.deployments.get_uid(function_deployment_details)
client.deployments.get_details(scoring_deployment_id)

{'entity': {'asset': {'id': 'fedd42b5-ca1b-4596-be01-886b56bf3467'},
  'custom': {},
  'deployed_asset_type': 'function',
  'hardware_spec': {'id': 'b128f957-581d-46d0-95b6-8af5cd5be580',
   'name': 'XXS',
   'num_nodes': 1},
  'name': 'offer_affinity_scoring_pipeline_function_deployment',
  'online': {'parameters': {'serving_name': 'offeraffinityscoringpipelinefubgk6g9'}},
  'space_id': '02c4eb81-18b1-4fa0-b6c4-a7b61d8fb8d9',
  'status': {'online_url': {'url': 'https://internal-nginx-svc.wkc.svc.cluster.local:12443/ml/v4/deployments/46d39dec-7981-4b8c-831c-7ade07a70853/predictions'},
   'serving_urls': ['https://internal-nginx-svc.wkc.svc.cluster.local:12443/ml/v4/deployments/46d39dec-7981-4b8c-831c-7ade07a70853/predictions',
    'https://internal-nginx-svc.wkc.svc.cluster.local:12443/ml/v4/deployments/offeraffinityscoringpipelinefubgk6g9/predictions'],
   'state': 'ready'}},
 'metadata': {'created_at': '2022-12-11T22:08:04.011Z',
  'id': '46d39dec-7981-4b8c-831c-7ade07a70853',
  'mod

In [15]:
cust_id = 1000

payload = [{'values' : "2018-09-30", 'cust_id' : cust_id}]

payload_metadata = {client.deployments.ScoringMetaNames.INPUT_DATA: payload}
# score
funct_output = client.deployments.score(scoring_deployment_id, payload_metadata)
funct_output

{'predictions': [{'values': {'EDUCATION': {'predictions': [{'fields': ['prediction',
        'probability'],
       'values': [[1, [0.49995585184126967, 0.5000441481587303]]]}]},
    'CASH': {'predictions': [{'fields': ['prediction', 'probability'],
       'values': [[0, [0.7363795188282993, 0.2636204811717006]]]}]},
    'BROKERAGE': {'predictions': [{'fields': ['prediction', 'probability'],
       'values': [[0, [0.824465882965883, 0.17553411703411703]]]}]},
    'FINANCIALPLAN': {'predictions': [{'fields': ['prediction', 'probability'],
       'values': [[1, [0.3177183770483799, 0.6822816229516201]]]}]},
    'RETIREMENTPLAN': {'predictions': [{'fields': ['prediction',
        'probability'],
       'values': [[0, [0.5609216309835223, 0.43907836901647773]]]}]}}}]}
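
The nested response can be flattened into a ranked list of products for a customer. The sketch below assumes the second entry of each probability pair is the probability of class 1, which is consistent with the predictions shown above but is not stated by the API. The helper name `rank_products` is hypothetical, and the sample is a two-product excerpt of the output above.

```python
def rank_products(funct_output):
    """Return (product_id, class-1 probability) pairs, highest probability first."""
    values = funct_output['predictions'][0]['values']
    if not isinstance(values, dict):
        # The pipeline returns a message string when prep filters the customer out.
        return []
    ranked = []
    for product_id, response in values.items():
        prediction, probabilities = response['predictions'][0]['values'][0]
        ranked.append((product_id, probabilities[1]))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


sample_output = {'predictions': [{'values': {
    'CASH': {'predictions': [{'fields': ['prediction', 'probability'],
                              'values': [[0, [0.7363795188282993, 0.2636204811717006]]]}]},
    'FINANCIALPLAN': {'predictions': [{'fields': ['prediction', 'probability'],
                                       'values': [[1, [0.3177183770483799, 0.6822816229516201]]]}]},
}}]}
print([product for product, _ in rank_products(sample_output)])
# ['FINANCIALPLAN', 'CASH']
```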

# Deploy Shiny App

In this section we will complete the steps to deploy a Shiny Dashboard in Cloud Pak for Data. The app can be deployed in a similar way to models and functions, using the `ibm-watson-machine-learning` package.

All of the files associated with the dashboard are contained in a zip file which is stored in data assets. If the user would like to make changes to the dashboard, they can download the zip from data assets and upload it in the RStudio IDE. 

In [16]:
r_shiny_deployment_name='Customer-Offer-Affinity-Shiny-App'

### Store the App

Create the associated metadata and store the dashboard zip file in the deployment space. 

In [17]:
# Meta_props to store assets in space 
meta_props = {
    client.shiny.ConfigurationMetaNames.NAME: "Offer_Affinity_Shiny_assets",
    client.shiny.ConfigurationMetaNames.DESCRIPTION: 'Store shiny assets in deployment space' # optional
}
app_details = client.shiny.store(meta_props, '/project_data/data_asset/customer-offer-affinity-analytics-dashboard.zip')

Creating Shiny asset...
SUCCESS


### Deploy the App

Create the metadata for the Shiny deployment by providing a name, description, R-Shiny options and hardware specification. The R-Shiny configuration controls who can access the dashboard: 1) anyone with the link, 2) authenticated users, or 3) collaborators in this deployment space.
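
The three sharing options can be kept in a lookup table so that the `R_SHINY` entry is built from a readable label. Only `anyone_with_url` appears in this notebook; the other two values, `any_valid_user` and `members_of_deployment_space`, are assumptions that should be checked against the Watson Machine Learning documentation for your release. The names `SHINY_AUTH_OPTIONS` and `r_shiny_config` are hypothetical.

```python
# 'anyone_with_url' is taken from this notebook; the other two values are
# assumptions and should be verified against the platform documentation.
SHINY_AUTH_OPTIONS = {
    'anyone with the link': 'anyone_with_url',
    'authenticated users': 'any_valid_user',
    'collaborators in this deployment space': 'members_of_deployment_space',
}


def r_shiny_config(audience):
    """Return the R_SHINY metadata entry for the chosen audience label."""
    try:
        return {'authentication': SHINY_AUTH_OPTIONS[audience]}
    except KeyError:
        raise ValueError('Unknown audience: ' + repr(audience)) from None


print(r_shiny_config('anyone with the link'))
# {'authentication': 'anyone_with_url'}
```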

In [18]:
# Deployment metadata.
deployment_meta_props = {
    client.deployments.ConfigurationMetaNames.NAME: r_shiny_deployment_name,
    client.deployments.ConfigurationMetaNames.DESCRIPTION: 'Deploy Customer Offer Affinity dashboard',
    client.deployments.ConfigurationMetaNames.R_SHINY: { 'authentication': 'anyone_with_url' },
    client.deployments.ConfigurationMetaNames.HARDWARE_SPEC: { 'name': 'S', 'num_nodes': 1}
}

# Create the deployment.
app_uid = client.shiny.get_uid(app_details)
rshiny_deployment = client.deployments.create(app_uid, deployment_meta_props)



#######################################################################################

Synchronous deployment creation for uid: 'e7977aa6-f12c-4105-a7c0-b9df238cd36b' started

#######################################################################################


initializing
Note: The asset is associated with one of the deprecated software specification and will be removed in future. For details, see https://www.ibm.com/support/producthub/icpdata/docs/content/SSQNUZ_latest/wsj/wmls/wmls-deploy-python-types.html
..........
ready


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='1cdb0f53-d7cc-465c-9235-fac9da67fc54'
------------------------------------------------------------------------------------------------




### Launch Shiny App
Now that the dashboard is deployed, it can be accessed through the web browser. The app URL can be found by navigating to the deployed app in the deployment space. 

Open the Navigation Menu, select **Deployments -> Spaces -> Customer Offer Affinity Space -> Deployments -> Customer-Offer-Affinity-Shiny-App** to find the dashboard URL.

Alternatively, the path for the app URL can be found from the deployment metadata created in the previous cell. This path should be appended to the user's Cloud Pak for Data hostname to get the complete app URL. To get the path, run the cell below:

In [19]:
print("{HOSTNAME}"+"/ml/v4/deployments/"+rshiny_deployment['metadata']['id'] + '/r_shiny')

{HOSTNAME}/ml/v4/deployments/1cdb0f53-d7cc-465c-9235-fac9da67fc54/r_shiny
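
To produce a complete URL rather than a path with a placeholder, the hostname can be joined with the deployment path in a small helper. This is a sketch: the helper name `shiny_app_url` and the hostname `https://cpd.example.com/` are hypothetical, and the path segments are the ones printed above.

```python
def shiny_app_url(hostname, deployment_id):
    """Join the Cloud Pak for Data hostname with the Shiny deployment path."""
    # Strip a trailing slash so the result never contains '//' before 'ml'.
    return hostname.rstrip('/') + '/ml/v4/deployments/' + deployment_id + '/r_shiny'


print(shiny_app_url('https://cpd.example.com/', '1cdb0f53-d7cc-465c-9235-fac9da67fc54'))
# https://cpd.example.com/ml/v4/deployments/1cdb0f53-d7cc-465c-9235-fac9da67fc54/r_shiny
```

In the notebook, the second argument would be `rshiny_deployment['metadata']['id']`.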


<hr>

**Sample Materials, provided under license. <br>
Licensed Materials - Property of IBM. <br>
© Copyright IBM Corp. 2019, 2022. All Rights Reserved. <br>
US Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. <br>**