# Build a product recommendation engine

In this notebook, you will:

* load historical shopping data
* structure and view that data in a table that displays customer information, product categories, and shopping history details
* use the *k*-means algorithm, a standard cluster-analysis method in data mining, to segment customers into clusters so that you can make in-store purchase recommendations based on shopping history
* deploy the model to the IBM Watson Machine Learning service in IBM Cloud to create your recommendation application 

By the end of the notebook, you will understand how to build a model to provide product recommendations for customers based on their purchase history.

This notebook runs on Python 3.x with Apache Spark 2.3.

## Table of contents

1. [Setup](#setup)<br>
    1.1. [Import libraries](#libraries)<br>
    1.2. [Configure IBM Watson Machine Learning credentials](#cred)<br>
    1.3. [Load sample data](#load)<br>
    1.4. [View data in a table](#view_table)<br>
2. [Create a *k*-means model](#kmeans)<br>
    2.1. [Prepare data](#prepare_data)<br>
    2.2. [Create clusters and define the model](#build_model)<br>
3. [Persist the model](#persist)<br>
4. [Deploy the model to the cloud](#deploy)<br>
    4.1. [Create deployment for the model](#create_deploy)<br>
    4.2. Retrieve the scoring endpoint<br>
    4.3. [Test the deployed model](#test_deploy)<br>
5. [Create product recommendations](#create_recomm)<br>
    5.1. [Test product recommendations model](#test_recomm)<br>
6. [Summary and next steps](#summary)<br>

<a id="setup"></a>
## 1. Setup

Install and import the required libraries and load the customer shopping data into this notebook.

[PixieDust](https://pixiedust.github.io/pixiedust/) is a Python helper library. We are using it in this notebook to load and visualize data.

In [4]:
# Uncomment the next line if importing pixiedust fails
# !pip install --upgrade pixiedust
!pip install watson-machine-learning-client --upgrade


<a id="libraries"></a>
### 1.1. Import libraries


In [1]:
import json
import pixiedust

Waiting for a Spark session to start...
Spark Initialization Done! ApplicationId = app-20190419144437-0000
KERNEL_ID = 03b0f3a5-637e-4639-a875-4637b1355f48
Pixiedust database opened successfully
Table VERSION_TRACKER created successfully
Table METRICS_TRACKER created successfully


Pixiedust runtime updated. Please restart kernel
Table SPARK_PACKAGES created successfully
Table USER_PREFERENCES created successfully
Table service_connections created successfully


The [Watson Machine Learning client](https://pypi.org/project/watson-machine-learning-client/) provides access to the [Watson Machine Learning Service](https://console.bluemix.net/catalog/services/machine-learning) on the IBM Cloud.

In [5]:
from watson_machine_learning_client import WatsonMachineLearningAPIClient

<a id="cred"></a>
### 1.2. Configure IBM Watson Machine Learning credentials
To access your machine learning repository programmatically, you need to copy in your credentials, which you can see in your **IBM Watson Machine Learning** service details in IBM Cloud.

1. Open your Machine Learning instance (`Project` -> `Settings` -> `Associated Service` -> `<Machine_Learning_Instance>`), or go to the [IBM Cloud Dashboard](https://console.bluemix.net/) and double-click the instance.
1. Open the _Service Credentials_ tab and view the credentials.
1. Copy your credentials and replace the `**URL**`, `**USERNAME**`, `**PASSWORD**` and `**INSTANCE_ID**` placeholders in the next cell.
1. Run the cell.


In [10]:
# The code was removed by Watson Studio for sharing.
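
Because the credentials cell was removed for sharing, here is a minimal sketch of what such a cell might look like. It assumes the client expects a dictionary whose keys (`url`, `username`, `password`, `instance_id`) mirror the placeholders listed above. The environment variable names are hypothetical and the values are placeholders, not real credentials.

```python
import os

# Placeholder values; replace each with the value from your Service Credentials tab.
# The environment variable names (WML_URL, ...) are hypothetical, chosen for this sketch.
wml_credentials = {
    "url": os.environ.get("WML_URL", "**URL**"),
    "username": os.environ.get("WML_USERNAME", "**USERNAME**"),
    "password": os.environ.get("WML_PASSWORD", "**PASSWORD**"),
    "instance_id": os.environ.get("WML_INSTANCE_ID", "**INSTANCE_ID**"),
}

# Warn early if any placeholder was left in place.
missing = [key for key, value in wml_credentials.items() if value.startswith("**")]
if missing:
    print("Still using placeholders for: {}".format(", ".join(missing)))
```

Reading the values from the environment keeps secrets out of a shared notebook; pasting them directly into the dictionary works just as well for a private copy.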

<a id="load"></a>
### 1.3. Load sample data

In this section you will load the data file that contains the customer shopping data using PixieDust's [`sampleData`](https://pixiedust.github.io/pixiedust/loaddata.html) method:

In [6]:
df = pixiedust.sampleData('https://raw.githubusercontent.com/krishnac7/Product_Recommendation_pixie_app/master/data/customers_orders1_opt.csv')

Downloading 'https://raw.githubusercontent.com/krishnac7/Product_Recommendation_pixie_app/master/data/customers_orders1_opt.csv' from https://raw.githubusercontent.com/krishnac7/Product_Recommendation_pixie_app/master/data/customers_orders1_opt.csv
Downloaded 5648773 bytes
Creating pySpark DataFrame for 'https://raw.githubusercontent.com/krishnac7/Product_Recommendation_pixie_app/master/data/customers_orders1_opt.csv'. Please wait...
Loading file using 'SparkSession'
Successfully created pySpark DataFrame for 'https://raw.githubusercontent.com/krishnac7/Product_Recommendation_pixie_app/master/data/customers_orders1_opt.csv'


<a id="view_table"></a>
### 1.4. View data in a table by using PixieDust

To better examine and visualize the data, run the following cell to view it in a table format. Note that PixieDust's `display` method can also render data using various chart types, such as pie charts, line graphs, and scatter plots.

In [8]:
display(df)

CUSTNAME,GenderCode,ADDRESS1,CITY,STATE,COUNTRY_CODE,POSTAL_CODE,POSTAL_CODE_PLUS4,ADDRESS2,EMAIL_ADDRESS,PHONE_NUMBER,CREDITCARD_TYPE,LOCALITY,SALESMAN_ID,NATIONALITY,NATIONAL_ID,CREDITCARD_NUMBER,DRIVER_LICENSE,CUST_ID,ORDER_ID,ORDER_DATE,ORDER_TIME,FREIGHT_CHARGES,ORDER_SALESMAN,ORDER_POSTED_DATE,ORDER_SHIP_DATE,AGE,ORDER_VALUE,T_TYPE,PURCHASE_TOUCHPOINT,PURCHASE_STATUS,ORDER_TYPE,GENERATION,Baby Food,Diapers,Formula,Lotion,Baby wash,Wipes,Fresh Fruits,Fresh Vegetables,Beer,Wine,Club Soda,Sports Drink,Chips,Popcorn,Oatmeal,Medicines,Canned Foods,Cigarettes,Cheese,Cleaning Products,Condiments,Frozen Foods,Kitchen Items,Meat,Office Supplies,Personal Care,Pet Supplies,Sea Food,Spices
Earl Bruner,Mr.,3155 Single Street,Alma,QC,CA,G8B 2W5,0,,Earl.M.Bruner@pookmail.com,613-353-4540,American Express,,NE172,ES,5873675G,377232558412092,,10079,3545,2016-03-31 00:00:00,2016-03-31 18:57:39.380000,6.37,SW142,2016-04-20 00:00:00,30/04/2016,19,20.37,Complete,Phone,Frequent,LowValue,Gen_Z,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
Quinn Perry,Master.,749 C Street,Amarillo,TX,US,79109,0,,Quinn.S.Perry@spambob.com,603-366-3347,American Express,,SC322,IT,FHGPMO74D11L254L,343640251981299,,10091,3359,2016-11-13 00:00:00,2016-11-13 21:59:36.250000,68.57,SC199,2016-12-19 00:00:00,27/12/2016,40,24.08,Complete,Phone,Occasional,LowValue,Gen_Y,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0
Michael Gordon,Mr.,388 Kelly Drive,Ancona,AN,IT,60123,0,,Michael.S.Gordon@dodgeit.com,0373 6095994,Diners Club,,RP385,U.S.,316290001,36205115370861,,10099,5481,2016-05-11 00:00:00,2016-05-11 05:11:26.258000,8.99,RP385,2016-06-25 00:00:00,20/06/2016,63,33.34,Complete,Phone,Occasional,LowValue,Baby_Boomers,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0
Scott Lawson,Mr.,3105 Spadafore Drive,Antioch,WI,US,60002,0,,Scott.M.Lawson@spambob.com,201-799-5873,JCB,,SE133,CA,518957246,3528449971671140,,10115,479,2016-03-28 00:00:00,2016-03-28 02:22:36.564000,18.99,WE421,2016-04-03 00:00:00,19/04/2016,,7.02,Complete,Desktop,Occasional,LowValue,Gen_Z,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Robert Bilbo,Mr.,2553 Clousson Road,Appiano Gentile,CO,IT,22070,0,,Robert.L.Bilbo@dodgeit.com,0390 9449254,JCB,,SC196,IT,QIEZKO91C65L851P,3528389090563465,,10119,155,2016-01-06 00:00:00,2016-01-06 18:23:54.564000,13.7,NW118,2016-01-26 00:00:00,07/02/2016,49,50.16,Complete,Phone,Occasional,LowValue,Gen_X,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Ahmed Richard,Mr.,1346 Carter Street,Appleton,WI,US,54911,0,,Ahmed.P.Richard@mailinator.com,510-517-7759,JCB,,SW214,U.S.,229990001,3528269063187288,,10123,7325,2016-04-02 00:00:00,2016-04-02 16:33:44.886000,11.45,NW112,2016-04-28 00:00:00,30/05/2016,20,227.61,Complete,Phone,FirstTime,HighValue,Gen_Z,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,1,1
Melba Whitehead,Mrs.,4985 Barnes Avenue,Asquith,NSW,AU,2077,0,,Melba.M.Whitehead@trashymail.com,(07) 4507 5357,American Express,,RP121,ES,0568157B,346804013228790,,10159,4170,2016-10-26 00:00:00,2016-10-26 20:07:39.380000,16.85,SC256,2016-11-10 00:00:00,14/11/2016,53,8.46,Cancelled,Phone,Occasional,LowValue,Baby_Boomers,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Mildred Carey,Mrs.,3362 Post Avenue,Aulnay-sous-bois,,FR,93600,0,,Mildred.M.Carey@mailinator.com,03.43.65.57.75,American Express,,NE373,ES,5876860S,370888743475158,,10187,2961,2016-09-12 00:00:00,2016-09-12 06:11:26.258000,19.25,SE136,2016-09-29 00:00:00,09/10/2016,75,260.84,Complete,Phone,Frequent,HighValue,Baby_Boomers,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0
John Riley,Mr.,4904 Hamilton Drive,Barsbittel,,DE,22885,0,,John.S.Riley@spambob.com,04329 69 96 01,Discover,,SE271,U.S.,441590001,6011004087203218,,10247,1401,2016-08-03 00:00:00,2016-08-03 17:27:54.986000,8.15,NE172,2016-09-02 00:00:00,10/08/2016,40,9999999.0,Abandoned,Phone,Occasional,LowValue,Gen_Y,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0
Jennifer Becker,Mrs.,4374 Lindale Avenue,Bloomington,IL,US,61701,0,,Jennifer.D.Becker@spambob.com,803-223-5484,JCB,,SE337,FR,1.69E+14,3528255936143442,,10351,1584,2016-11-03 00:00:00,2016-11-03 08:44:16.564000,19.05,NE178,2016-12-01 00:00:00,20/12/2016,24,62.51,Complete,Phone,Frequent,MediumValue,Gen_Y,1,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0


<a id="kmeans"></a>
## 2. Create a *k*-means model

In this section of the notebook you use the Spark MLlib *k*-means implementation to assign every customer to a cluster based on their shopping history.

First, import the Apache Spark Machine Learning packages ([MLlib](http://spark.apache.org/docs/2.2.0/api/python/pyspark.ml.html)) that you need in the subsequent steps:


In [11]:
from pyspark.ml import Pipeline
from pyspark.ml.clustering import KMeans
from pyspark.ml.clustering import KMeansModel
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.linalg import Vectors

<a id="prepare_data"></a>
### 2.1. Prepare data

Create a new data set that contains only the data you need: select the customer ID column and the product-related columns, and leave out every other column, since none of them is needed for aggregating the data or training the model:

In [12]:
# Here are the product cols. In a real world scenario we would query a product table, or similar.
product_cols = ['Baby Food', 'Diapers', 'Formula', 'Lotion', 'Baby wash', 'Wipes', 'Fresh Fruits', 'Fresh Vegetables', 'Beer', 'Wine', 'Club Soda', 'Sports Drink', 'Chips', 'Popcorn', 'Oatmeal', 'Medicines', 'Canned Foods', 'Cigarettes', 'Cheese', 'Cleaning Products', 'Condiments', 'Frozen Foods', 'Kitchen Items', 'Meat', 'Office Supplies', 'Personal Care', 'Pet Supplies', 'Sea Food', 'Spices']
# Here we get the customer ID and the products they purchased
df_filtered = df.select(['CUST_ID'] + product_cols)

Run the `display()` command again, this time to view the filtered information:

In [13]:
display(df_filtered)

CUST_ID,Baby Food,Diapers,Formula,Lotion,Baby wash,Wipes,Fresh Fruits,Fresh Vegetables,Beer,Wine,Club Soda,Sports Drink,Chips,Popcorn,Oatmeal,Medicines,Canned Foods,Cigarettes,Cheese,Cleaning Products,Condiments,Frozen Foods,Kitchen Items,Meat,Office Supplies,Personal Care,Pet Supplies,Sea Food,Spices
10019,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
10067,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
10107,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0
10115,1,0,0,0,1,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
10151,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,1,1,0,0,1,0,0,0
10211,1,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
10379,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,1
10555,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,1
10643,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,1,0,0,0,1
10655,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Now, aggregate the individual transactions for each customer to get a single score per product, per customer.

In [14]:
df_customer_products = df_filtered.groupby('CUST_ID').sum()  # Use customer IDs to group transactions by customer and sum them up
df_customer_products = df_customer_products.drop('sum(CUST_ID)')
display(df_customer_products)

CUST_ID,sum(Baby Food),sum(Diapers),sum(Formula),sum(Lotion),sum(Baby wash),sum(Wipes),sum(Fresh Fruits),sum(Fresh Vegetables),sum(Beer),sum(Wine),sum(Club Soda),sum(Sports Drink),sum(Chips),sum(Popcorn),sum(Oatmeal),sum(Medicines),sum(Canned Foods),sum(Cigarettes),sum(Cheese),sum(Cleaning Products),sum(Condiments),sum(Frozen Foods),sum(Kitchen Items),sum(Meat),sum(Office Supplies),sum(Personal Care),sum(Pet Supplies),sum(Sea Food),sum(Spices)
15004,0,0,0,0,0,0,0,0,0,1,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0
15162,1,3,1,0,0,2,0,0,3,3,0,0,0,0,1,3,0,2,2,2,0,0,0,1,0,2,1,1,4
14289,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
13898,1,1,0,1,2,1,2,0,1,0,2,0,1,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,0
10788,0,0,0,0,0,0,0,1,0,0,6,1,5,3,0,0,5,0,0,0,1,0,0,0,0,0,0,0,0
13923,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
14245,0,0,0,1,0,0,0,0,0,0,3,0,3,2,0,0,3,0,0,0,1,0,0,0,0,0,0,0,0
15161,0,0,0,0,0,0,0,0,0,0,2,0,2,1,0,0,2,0,0,1,1,0,0,0,0,0,1,0,0
10435,0,0,0,0,0,0,0,0,0,0,1,0,0,2,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0
15093,0,0,0,0,0,0,0,0,0,0,4,0,4,3,1,0,4,0,0,0,2,0,0,0,0,0,0,0,0
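
To make the aggregation step concrete, here is a small plain-Python sketch (no Spark) of the same group-and-sum idea. The three transactions and the two product columns are made up for illustration, and `sum_per_customer` is a hypothetical helper name.

```python
from collections import defaultdict

# Hypothetical transactions: (customer ID, {product: purchased flag})
transactions = [
    (10115, {"Baby Food": 1, "Diapers": 0}),
    (10115, {"Baby Food": 1, "Diapers": 1}),
    (10211, {"Baby Food": 0, "Diapers": 1}),
]

def sum_per_customer(rows):
    """Sum each product column per customer, mirroring a group-by-and-sum."""
    totals = defaultdict(lambda: defaultdict(int))
    for cust_id, products in rows:
        for product, count in products.items():
            # Use the same 'sum(<column>)' naming that appears in the table above
            totals[cust_id]["sum({})".format(product)] += count
    return {cust_id: dict(cols) for cust_id, cols in totals.items()}

customer_totals = sum_per_customer(transactions)
print(customer_totals[10115])
```

Each customer ends up with exactly one row of counts, which is the shape the clustering step needs.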


<a id="build_model"></a>
### 2.2. Create clusters and define the model 

Create 100 clusters with a *k*-means model based on the number of times each customer purchased each product.

First, create a feature vector by assembling the per-product purchase counts (the `sum(...)` columns) into a single `features` column:

In [15]:
assembler = VectorAssembler(inputCols=["sum({})".format(x) for x in product_cols],outputCol="features") # Assemble vectors using product fields

Next, create the *k*-means clusters and the pipeline to define the model:

In [16]:
kmeans = KMeans(maxIter=50, predictionCol="cluster").setK(100).setSeed(1)  # Initialize model
pipeline = Pipeline(stages=[assembler, kmeans])
model = pipeline.fit(df_customer_products)
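
As a rough illustration of what a fitted *k*-means model does at prediction time, the sketch below assigns a purchase-count vector to the nearest centroid by squared Euclidean distance. It is plain Python with two made-up centroids, not Spark's implementation, and it leaves out training entirely.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest_cluster(point, centroids):
    """Return the index of the centroid closest to the point."""
    distances = [squared_distance(point, c) for c in centroids]
    return distances.index(min(distances))

# Hypothetical centroids over three product columns
centroids = [
    [3.0, 0.0, 0.0],  # cluster 0
    [0.0, 2.0, 2.0],  # cluster 1
]

print(nearest_cluster([2, 0, 1], centroids))
print(nearest_cluster([0, 1, 3], centroids))
```

Customers with similar purchase counts land near the same centroid, which is why a cluster's popular products can serve as recommendations.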

Finally, calculate the cluster for each customer by running the aggregated customer data through the fitted pipeline:

In [17]:
df_customer_products_cluster = model.transform(df_customer_products)
display(df_customer_products_cluster)

CUST_ID,sum(Baby Food),sum(Diapers),sum(Formula),sum(Lotion),sum(Baby wash),sum(Wipes),sum(Fresh Fruits),sum(Fresh Vegetables),sum(Beer),sum(Wine),sum(Club Soda),sum(Sports Drink),sum(Chips),sum(Popcorn),sum(Oatmeal),sum(Medicines),sum(Canned Foods),sum(Cigarettes),sum(Cheese),sum(Cleaning Products),sum(Condiments),sum(Frozen Foods),sum(Kitchen Items),sum(Meat),sum(Office Supplies),sum(Personal Care),sum(Pet Supplies),sum(Sea Food),sum(Spices),features,cluster
10362,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,"(29,[10,13,19,22],[1.0,1.0,1.0,1.0])",1
14075,0,0,0,1,0,0,0,0,0,0,3,0,2,2,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,"(29,[3,10,12,13,16,20],[1.0,3.0,2.0,2.0,2.0,1.0])",50
11639,0,0,0,0,0,0,0,0,0,0,1,0,1,2,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,"(29,[10,12,13,16,20],[1.0,1.0,2.0,1.0,1.0])",12
11025,0,0,0,0,0,0,0,1,0,3,0,0,0,0,1,1,0,2,2,0,1,0,0,1,0,2,0,3,3,"(29,[7,9,14,15,17,18,20,23,25,27,28],[1.0,3.0,1.0,1.0,2.0,2.0,1.0,1.0,2.0,3.0,3.0])",68
10798,0,0,0,0,0,0,0,0,0,0,1,0,1,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,"(29,[10,12,13,16,20],[1.0,1.0,1.0,1.0,1.0])",12
14714,0,0,0,0,0,0,0,0,0,0,3,0,2,4,0,0,2,0,0,0,2,0,0,0,0,0,0,0,0,"(29,[10,12,13,16,20],[3.0,2.0,4.0,2.0,2.0])",43
11713,0,0,0,0,0,0,0,0,0,2,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,2,0,2,1,"(29,[9,15,17,25,27,28],[2.0,1.0,1.0,2.0,2.0,1.0])",30
11287,0,0,0,0,0,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,"(29,[10,11,13],[1.0,1.0,1.0])",37
14030,0,0,1,0,0,0,0,0,0,0,4,0,2,2,0,0,2,0,0,0,1,0,0,0,0,0,0,0,0,"(29,[2,10,12,13,16,20],[1.0,4.0,2.0,2.0,2.0,1.0])",5
12559,0,0,0,0,0,1,1,0,0,0,3,0,3,3,0,0,3,0,0,0,0,0,0,0,0,0,0,0,0,"(29,[5,6,10,12,13,16],[1.0,1.0,3.0,3.0,3.0,3.0])",64
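
The `features` column above is printed in sparse-vector notation: `(size, [indices], [values])`. The sketch below maps those indices back to product names, using the first row of the table (customer 10362) as input. `decode_sparse` is a hypothetical helper written for this illustration.

```python
product_cols = ['Baby Food', 'Diapers', 'Formula', 'Lotion', 'Baby wash', 'Wipes',
                'Fresh Fruits', 'Fresh Vegetables', 'Beer', 'Wine', 'Club Soda',
                'Sports Drink', 'Chips', 'Popcorn', 'Oatmeal', 'Medicines',
                'Canned Foods', 'Cigarettes', 'Cheese', 'Cleaning Products',
                'Condiments', 'Frozen Foods', 'Kitchen Items', 'Meat',
                'Office Supplies', 'Personal Care', 'Pet Supplies', 'Sea Food', 'Spices']

def decode_sparse(size, indices, values, names):
    """Map the non-zero entries of a sparse vector to their column names."""
    if size != len(names):
        raise ValueError("vector size does not match the number of columns")
    return {names[i]: v for i, v in zip(indices, values)}

# Features of customer 10362 as shown above: (29,[10,13,19,22],[1.0,1.0,1.0,1.0])
purchases = decode_sparse(29, [10, 13, 19, 22], [1.0, 1.0, 1.0, 1.0], product_cols)
print(purchases)
```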


<a id="persist"></a>
## 3. Persist the model 

In this section you will learn how to store your model in the Watson Machine Learning repository by using the [Watson Machine Learning Python client library](https://pypi.org/project/watson-machine-learning-client/). 


### 3.1. Save the model

Connect to the Watson Machine Learning service using the provided credentials.

In [18]:
client = WatsonMachineLearningAPIClient(wml_credentials)

#### Save the model to the Watson Machine Learning repository

You use the Watson Machine Learning client's [Repository class](http://wml-api-pyclient.mybluemix.net/#repository) to store and manage models in the Watson Machine Learning Repository. 

Note: You can also use Watson Studio to manage models, but this notebook uses only the client library.

In [19]:
train_data = df_customer_products.withColumnRenamed('CUST_ID', 'label')

model_name = 'Shopping History'
saved_model = client.repository.store_model(model=model, 
                                            meta_props={'name': model_name}, 
                                            training_data=train_data,
                                            pipeline=pipeline)

You can delete a model from the repository by calling `client.repository.delete`.

#### Display list of existing models in the Watson Machine Learning repository 

In [20]:
models_details = client.repository.list_models()

------------------------------------  ----------------  ------------------------  ---------
GUID                                  NAME              CREATED                   FRAMEWORK
61bdd09e-24d4-482f-8553-ab1a70a3c2a0  Shopping History  2019-04-19T14:51:01.933Z  mllib-2.3
------------------------------------  ----------------  ------------------------  ---------


#### Display information about the saved model

In [21]:
saved_model_uid = client.repository.get_model_uid(saved_model)
model_details = client.repository.get_model_details(saved_model_uid)

print(json.dumps(model_details, indent=2))

{
  "entity": {
    "label_col": "label",
    "runtime_environment": "spark-2.3",
    "input_data_schema": {
      "fields": [
        {
          "metadata": {},
          "type": "long",
          "name": "sum(Baby Food)",
          "nullable": true
        },
        {
          "metadata": {},
          "type": "long",
          "name": "sum(Diapers)",
          "nullable": true
        },
        {
          "metadata": {},
          "type": "long",
          "name": "sum(Formula)",
          "nullable": true
        },
        {
          "metadata": {},
          "type": "long",
          "name": "sum(Lotion)",
          "nullable": true
        },
        {
          "metadata": {},
          "type": "long",
          "name": "sum(Baby wash)",
          "nullable": true
        },
        {
          "metadata": {},
          "type": "long",
          "name": "sum(Wipes)",
          "nullable": true
        },
        {
          "metadata": {},
          "type": "long",
      

<a id="deploy"></a>
## 4. Deploy the model to IBM Cloud

You use the Watson Machine Learning client's [Deployments class](http://wml-api-pyclient.mybluemix.net/#deployments) to deploy and score models.

### 4.1. Create an online deployment for the model


In [22]:
created_deployment = client.deployments.create(saved_model_uid, 'Deployment of {}'.format(model_name))



#######################################################################################

Synchronous deployment creation for uid: '61bdd09e-24d4-482f-8553-ab1a70a3c2a0' started

#######################################################################################


INITIALIZING
DEPLOY_SUCCESS


------------------------------------------------------------------------------------------------
Successfully finished deployment creation, deployment_uid='1ae74817-94b4-4e2b-9b49-3affc3d9cf5f'
------------------------------------------------------------------------------------------------




### 4.2. Retrieve the scoring endpoint for this model

In [23]:
scoring_endpoint = client.deployments.get_scoring_url(created_deployment)
print(scoring_endpoint)

https://us-south.ml.cloud.ibm.com/v3/wml_instances/91b87722-13be-427c-85dc-06ad7083da66/deployments/1ae74817-94b4-4e2b-9b49-3affc3d9cf5f/online


<a id="test_deploy"></a>
### 4.3. Test the deployed model

To verify that the model was successfully deployed to the cloud, pick a customer ID, for example customer 12027, predict that customer's cluster with the Watson Machine Learning deployment, and check that it matches the cluster previously assigned to this customer ID.

In [24]:
customer = df_customer_products_cluster.filter('CUST_ID = 12027').collect()
print("Previously calculated cluster = {}".format(customer[0].cluster))

Previously calculated cluster = 23


To determine the customer's cluster using Watson Machine Learning, you need to load the customer's purchase history. The following function uses the local DataFrame to collect every product field and the number of times the given customer purchased each product.

In [25]:
def get_product_counts_for_customer(cust_id):
    # Fetch the aggregated purchase row for the requested customer
    cust = df_customer_products.filter('CUST_ID = {}'.format(cust_id)).take(1)
    fields = []
    values = []
    for row in cust:  # iterate over the row fetched above, not the global `customer`
        for product_col in product_cols:
            field = 'sum({})'.format(product_col)
            value = row[field]
            fields.append(field)
            values.append(value)
    return (fields, values)

This function takes the customer's purchase history and calls the scoring endpoint:

In [26]:
def get_cluster_from_watson_ml(fields, values):
    scoring_payload = {'fields': fields, 'values': [values]}
    predictions = client.deployments.score(scoring_endpoint, scoring_payload)   
    return predictions['values'][0][len(product_cols)+1]
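
The function above reads the cluster by position: the scored row holds the 29 input counts, then the assembled `features` vector, then the `cluster` prediction, so the cluster sits at index `len(product_cols)+1`. As a less position-dependent alternative, the sketch below looks the column up by name. It assumes the response carries a `fields` list parallel to `values`. The response shown is a shortened, made-up example and `extract_cluster` is a hypothetical helper.

```python
def extract_cluster(predictions, prediction_col="cluster"):
    """Return the prediction for the first scored row, located by field name."""
    position = predictions["fields"].index(prediction_col)
    return predictions["values"][0][position]

# Hypothetical response with two product fields; a real one would carry all 29.
fake_response = {
    "fields": ["sum(Baby Food)", "sum(Diapers)", "features", "cluster"],
    "values": [[0, 1, [0.0, 1.0], 23]],
}

print(extract_cluster(fake_response))
```

Looking the column up by name keeps working if the pipeline later gains or loses an intermediate column.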

Finally, call the functions defined above to get the purchase history, call the scoring endpoint, and get the cluster associated with customer 12027:

In [27]:
product_counts = get_product_counts_for_customer(12027)
fields = product_counts[0]
values = product_counts[1]
print("Cluster calculated by Watson ML = {}".format(get_cluster_from_watson_ml(fields, values)))

Cluster calculated by Watson ML = 23


<a id="create_recomm"></a>
## 5. Create product recommendations

Now you can create some product recommendations.

First, run this cell to create a function that queries the database and finds the most popular items for a cluster. In this case, the **df_customer_products_cluster** dataframe is the database.

In [27]:
# This function gets the most popular products in the given cluster by filtering on the cluster column and summing the purchase counts
def get_popular_products_in_cluster(cluster):
    df_cluster_products = df_customer_products_cluster.filter('cluster = {}'.format(cluster))
    df_cluster_products_agg = df_cluster_products.groupby('cluster').sum()
    row = df_cluster_products_agg.rdd.collect()[0]
    items = []
    for product_col in product_cols:
        field = 'sum(sum({}))'.format(product_col)
        items.append((product_col, row[field]))
    sortedItems = sorted(items, key=lambda x: x[1], reverse=True) # Sort by score
    popular = [x for x in sortedItems if x[1] > 0]
    return popular

Now, run this cell to create a function that will calculate the recommendations based on a given cluster. This function finds the most popular products in the cluster, filters out products already purchased by the customer or currently in the customer's shopping cart, and finally produces a list of recommended products.

In [28]:
# This function takes a cluster and the quantity of every product already purchased or in the user's cart
from pyspark.sql.functions import desc
def get_recommendations_by_cluster(cluster, purchased_quantities):
    # Existing customer products
    print('PRODUCTS ALREADY PURCHASED/IN CART:')
    customer_products = []
    for i in range(0, len(product_cols)):
        if purchased_quantities[i] > 0:
            customer_products.append((product_cols[i], purchased_quantities[i]))
    df_customer_products = sc.parallelize(customer_products).toDF(["PRODUCT","COUNT"])
    df_customer_products.show()
    # Get popular products in the cluster
    print('POPULAR PRODUCTS IN CLUSTER:')
    cluster_products = get_popular_products_in_cluster(cluster)
    df_cluster_products = sc.parallelize(cluster_products).toDF(["PRODUCT","COUNT"])
    df_cluster_products.show()
    # Filter out products the user has already purchased
    print('RECOMMENDED PRODUCTS:')
    df_recommended_products = df_cluster_products.alias('cl').join(df_customer_products.alias('cu'), df_cluster_products['PRODUCT'] == df_customer_products['PRODUCT'], 'leftouter')
    df_recommended_products = df_recommended_products.filter('cu.PRODUCT IS NULL').select('cl.PRODUCT','cl.COUNT').sort(desc('cl.COUNT'))
    df_recommended_products.show(10)
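
The core of this function is an anti-join: keep the cluster's popular products that the customer does not already have, ordered by popularity. Here is the same logic as a plain-Python sketch without Spark. `recommend` is a hypothetical helper, and the sample counts are a subset of the values from this notebook's shopping-cart example.

```python
def recommend(popular, purchased, limit=10):
    """Return popular (product, count) pairs the customer has not bought yet."""
    owned = {product for product, count in purchased if count > 0}
    candidates = [(product, count) for product, count in popular if product not in owned]
    # Most popular first, capped at `limit` like show(10) in the Spark version
    return sorted(candidates, key=lambda item: item[1], reverse=True)[:limit]

popular = [("Diapers", 80), ("Beer", 66), ("Condiments", 20),
           ("Baby wash", 14), ("Oatmeal", 11)]
purchased = [("Diapers", 1), ("Baby wash", 2), ("Oatmeal", 1)]

print(recommend(popular, purchased))
```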

Next, run this cell to create a function that produces a list of recommended items based on the products and quantities in a user's cart. This function uses Watson Machine Learning to calculate the cluster based on the shopping cart contents and then calls the **get_recommendations_by_cluster** function.

In [29]:
# This function would be used to find recommendations based on the products and quantities in a user's cart
def get_recommendations_for_shopping_cart(products, quantities):
    fields = []
    values = []
    for product_col in product_cols:
        field = 'sum({})'.format(product_col)
        if product_col in products:
            value = quantities[products.index(product_col)]
        else:
            value = 0
        fields.append(field)
        values.append(value)
    return get_recommendations_by_cluster(get_cluster_from_watson_ml(fields, values), values)
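
The loop above turns a shopping cart into a full-length vector with one entry per product column and zeros for products that are not in the cart. The sketch below shows the same transformation with a dictionary lookup. `cart_to_vector` and the shortened column list are hypothetical. Unlike `products.index`, a dictionary keeps the last quantity if a product appears twice in the cart.

```python
def cart_to_vector(products, quantities, product_cols):
    """Build scoring fields and values from a cart, defaulting to 0 per column."""
    cart = dict(zip(products, quantities))
    fields = ['sum({})'.format(col) for col in product_cols]
    values = [cart.get(col, 0) for col in product_cols]
    return fields, values

# Shortened, hypothetical column list; the notebook uses all 29 product columns
demo_cols = ['Baby Food', 'Diapers', 'Baby wash', 'Oatmeal']
fields, values = cart_to_vector(['Diapers', 'Baby wash', 'Oatmeal'], [1, 2, 1], demo_cols)
print(fields)
print(values)
```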

Run this cell to create a function that produces a list of recommended items based on the purchase history of a customer. This function uses Watson Machine Learning to calculate the cluster based on the customer's purchase history and then calls the **get_recommendations_by_cluster** function.

In [30]:
# This function is used to find recommendations based on the purchase history of a customer
def get_recommendations_for_customer_purchase_history(customer_id):
    product_counts = get_product_counts_for_customer(customer_id)
    fields = product_counts[0]
    values = product_counts[1]
    return get_recommendations_by_cluster(get_cluster_from_watson_ml(fields, values), values)

Now you can take customer 12027 and produce a recommendation based on that customer's purchase history:

In [31]:
get_recommendations_for_customer_purchase_history(12027)

PRODUCTS ALREADY PURCHASED/IN CART:
+-------------+-----+
|      PRODUCT|COUNT|
+-------------+-----+
|      Diapers|    1|
|    Baby wash|    1|
|         Beer|    1|
|         Wine|    3|
|    Medicines|    3|
|       Cheese|    3|
| Frozen Foods|    1|
|Kitchen Items|    1|
|     Sea Food|    1|
|       Spices|    2|
+-------------+-----+

POPULAR PRODUCTS IN CLUSTER:
+-----------------+-----+
|          PRODUCT|COUNT|
+-----------------+-----+
|        Medicines|   74|
|             Wine|   71|
|           Cheese|   67|
|           Spices|   52|
|         Sea Food|   45|
|       Cigarettes|   33|
|     Frozen Foods|   27|
|       Condiments|   25|
|Cleaning Products|   19|
|    Personal Care|   18|
|     Canned Foods|   17|
|          Diapers|   13|
| Fresh Vegetables|   11|
|          Formula|    9|
|        Baby wash|    9|
|            Wipes|    9|
|          Oatmeal|    8|
|  Office Supplies|    7|
|     Pet Supplies|    7|
|             Beer|    6|
+-----------------+-----+

Now, take a sample shopping cart and produce a recommendation based on the items in the cart:

In [32]:
get_recommendations_for_shopping_cart(['Diapers','Baby wash','Oatmeal'],[1,2,1])

PRODUCTS ALREADY PURCHASED/IN CART:
+---------+-----+
|  PRODUCT|COUNT|
+---------+-----+
|  Diapers|    1|
|Baby wash|    2|
|  Oatmeal|    1|
+---------+-----+

POPULAR PRODUCTS IN CLUSTER:
+-----------------+-----+
|          PRODUCT|COUNT|
+-----------------+-----+
|          Diapers|   80|
|             Beer|   66|
|       Condiments|   20|
|Cleaning Products|   19|
|     Sports Drink|   16|
|          Popcorn|   16|
|        Baby wash|   14|
|     Pet Supplies|   13|
|           Lotion|   12|
|        Club Soda|   12|
|          Formula|   11|
|            Wipes|   11|
|          Oatmeal|   11|
|    Kitchen Items|   11|
|  Office Supplies|   10|
|        Baby Food|    8|
|     Fresh Fruits|    8|
| Fresh Vegetables|    7|
|     Canned Foods|    5|
+-----------------+-----+

RECOMMENDED PRODUCTS:
+-----------------+-----+
|          PRODUCT|COUNT|
+-----------------+-----+
|             Beer|   66|
|       Condiments|   20|
|Cleaning Products|   19|
|     Sports Drink|   16|
|    

The next optional section outlines how you can easily expose recommendations to notebook users, for example for test purposes.

<a id="test_recomm"></a>
### 5.1. Test product recommendations model

You can interactively test your product recommendations model using a simple PixieApp. [PixieApps](https://ibm-watson-data-lab.github.io/pixiedust/pixieapps.html) encapsulate business logic and data visualizations, making it easy for notebook users to explore data without having to write any code. Typically these applications are pre-packaged and imported into a notebook. However, for illustrative purposes we've embedded the product recommendation source code in this notebook.

<img src="https://raw.githubusercontent.com/IBMCodeLondon/localcart-workshop/master/images/product_recommendation_app.png"></img>

Run this cell, add items to the shopping cart and click the _Refresh_ button to review the recommendation results.

In [33]:
# This function takes a cluster and the quantity of every product already purchased or in the user's cart & returns the data frame of recommendations for the PixieApp
from pyspark.sql.functions import desc
def get_recommendations_by_cluster_app(cluster, purchased_quantities):
    # Existing customer products
    customer_products = []
    for i in range(0, len(product_cols)):
        if purchased_quantities[i] > 0:
            customer_products.append((product_cols[i], purchased_quantities[i]))
    df_customer_products = sc.parallelize(customer_products).toDF(["PRODUCT","COUNT"])
    # Get popular products in the cluster
    cluster_products = get_popular_products_in_cluster(cluster)
    df_cluster_products = sc.parallelize(cluster_products).toDF(["PRODUCT","COUNT"])
    # Filter out products the user has already purchased
    df_recommended_products = df_cluster_products.alias('cl').join(df_customer_products.alias('cu'), df_cluster_products['PRODUCT'] == df_customer_products['PRODUCT'], 'leftouter')
    df_recommended_products = df_recommended_products.filter('cu.PRODUCT IS NULL').select('cl.PRODUCT','cl.COUNT').sort(desc('cl.COUNT'))
    return df_recommended_products


# PixieDust sample application

from pixiedust.display.app import *

@PixieApp
class RecommenderPixieApp:
    def setup(self):
        self.product_cols = product_cols
        
    def computeUserRecs(self, shoppingcart):   
        # Parse the "name:quantity" pairs from the shopping cart display data
        cart = {}
        for item in shoppingcart.split(","):
            name, quantity = item.split(":")
            cart[name] = int(quantity)
        # Build field names and values in the column order the model expects;
        # products that are not in the cart get a quantity of 0
        fields = ['sum({})'.format(item) for item in self.product_cols]
        values = [cart.get(item, 0) for item in self.product_cols]
        #call the run Model function
        recommendations_df = get_recommendations_by_cluster_app(get_cluster_from_watson_ml(fields, values), values)
        recs = [row["PRODUCT"] for row in recommendations_df.rdd.collect()]
        return recs[:5]
    
    @route(shoppingCart="*")
    def _recommendation(self, shoppingCart):
        recommendation = self.computeUserRecs(shoppingCart)
        self._addHTMLTemplateString(
        """
        <table style="width:100%"> {% for item in recommendation %} <tr> <td type="text" style="text-align:left">{{item}}</td> </tr> {% endfor %} </table>
        """, recommendation = recommendation)

        
    @route()
    def main(self):
        return """
        <script>
        function getValuesRec(){
            return $( "input[id^='prod']" )
            .filter(function( index ) {
                return parseInt($(this).val()) > 0;})
            .map(function(i, product) {
                return $(product).attr("name") + ":" + $(product).val();
            }).toArray().join(",");}
            
        function getValuesCart(){
            return $( "input[id^='prod']" )
            .filter(function( index ) {
                return parseInt($(this).val()) > 0; })
            .map(function(i, product) {
                return $(product).attr("name") + ":" + $(product).val();
            }).toArray(); }
        
        function populateCart(field) {
            var user_cart = getValuesCart();
            $("#user_cart{{prefix}}").html("");
            if (user_cart.length > 0) {
                for (var i in user_cart) {
                    var item = user_cart[i];
                    var item_arr = item.split(":")
                    $("#user_cart{{prefix}}").append('<tr><td style="text-align:left">'+item_arr[1]+" "+item_arr[0]+"</td></tr>"); } }
            else { $("#user_cart{{prefix}}").append('<tr><td style="text-align:left">'+ "Cart Empty" +"</td></tr>"); } }
        
        function increase_by_one(field) {
            var nr = parseInt(document.getElementById(field).value, 10);
            document.getElementById(field).value = nr + 1;
            populateCart(field); }
        
        function decrease_by_one(field) {
            var nr = parseInt(document.getElementById(field).value, 10);
            if (nr > 0) { document.getElementById(field).value = nr - 1; }
            populateCart(field); } 
        </script>
        
        <table> Products: {% for item in this.product_cols %}
            {% if loop.index0 is divisibleby 4 %} <tr> {% endif %}
                <div class="prod-quantity">
                <td class="col-md-3">{{item}}:</td><td><input size="2" id="prod{{loop.index}}{{prefix}}" class="prods" type="text" 
                    style="text-align:center" value="0" name="{{item}}" /></td>
                <td><button onclick="increase_by_one('prod{{loop.index}}{{prefix}}');">+</button></td>
                <td><button onclick="decrease_by_one('prod{{loop.index}}{{prefix}}');">-</button></td>
                </div>
            {% if ((not loop.first) and (loop.index0 % 4 == 3)) or (loop.last) %} </tr> {% endif %}
        {% endfor %} </table>
        
        <div class="row">
            <div class="col-sm-6"> Your Cart: </div>
            <div class="col-sm-6"> Your Recommendations: <button pd_options="shoppingCart=$val(getValuesRec)" pd_target="recs{{prefix}}"> 
                <pd_script type="preRun"> if (getValuesRec()==""){alert("Your cart is empty");return false;} return true;
                </pd_script>Refresh </button> 
            </div>
        </div>
        
        <div class="row">
        <div class="col-sm-3"> <table style="width:100%" id="user_cart{{prefix}}"> </table> </div> <div class="col-sm-3"> </div>
        <div class="col-sm-3" id="recs{{prefix}}" pd_loading_msg="Calling your model in Watson ML"></div> <div class="col-sm-3"> </div>
        </div>
        """
        
    

#run the app
RecommenderPixieApp().run(runInDialog='false')
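
The PixieApp above does two things before it shows results: it turns the cart string (for example `Beer:2,Sports Drink:1`) into one quantity per product column, and it drops products the customer already has from the cluster's popular products. The following is a minimal plain-Python sketch of that logic, without Spark or Watson Machine Learning, which may help when debugging. The function names `cart_to_quantities` and `filter_recommendations` are hypothetical, and the product list and counts are illustrative values, not the notebook's actual `product_cols` or cluster data.

```python
def cart_to_quantities(shoppingcart, product_cols):
    """Turn a cart string such as 'Beer:2,Sports Drink:1' into one
    quantity per product column (0 for products not in the cart)."""
    cart = {}
    for item in shoppingcart.split(","):
        name, quantity = item.split(":")
        cart[name] = int(quantity)
    return [cart.get(col, 0) for col in product_cols]


def filter_recommendations(cluster_products, quantities, product_cols, top_n=5):
    """Drop products the customer already has, then return the names of
    the top_n remaining products, ordered by descending count."""
    owned = {col for col, qty in zip(product_cols, quantities) if qty > 0}
    candidates = [(name, count) for name, count in cluster_products
                  if name not in owned]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in candidates[:top_n]]


# Illustrative sample data (assumed for this sketch only)
product_cols = ["Beer", "Condiments", "Cleaning Products", "Sports Drink"]
cluster_products = [("Beer", 66), ("Condiments", 20),
                    ("Cleaning Products", 19), ("Sports Drink", 16)]

quantities = cart_to_quantities("Beer:2,Sports Drink:1", product_cols)
print(quantities)  # [2, 0, 0, 1]
print(filter_recommendations(cluster_products, quantities, product_cols))
# ['Condiments', 'Cleaning Products']
```

In the notebook itself the filtering step is a Spark left outer join followed by a `cu.PRODUCT IS NULL` filter. The set-based version here is only a stand-in that gives the same result for small lists.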