# Building a Product Recommendation Engine with Amazon Personalize

In this lab, we use Amazon.com customer rating data to build a product recommendation plugin for our website. We use Amazon Personalize to train the recommender model and to host it for real-time inference. Finally, we test the inference by displaying the items a user rated alongside the items recommended for that user.

## Download and prepare sample dataset

In [1]:
!curl -o ./metadata.json.gz http://deepyeti.ucsd.edu/jianmo/amazon/metaFiles/meta_AMAZON_FASHION.json.gz
!curl -o ./ratings.json.gz http://deepyeti.ucsd.edu/jianmo/amazon/categoryFiles/AMAZON_FASHION.json.gz


Data source:  
Justifying recommendations using distantly-labeled reviews and fine-grained aspects. Jianmo Ni, Jiacheng Li, Julian McAuley. Empirical Methods in Natural Language Processing (EMNLP), 2019.

In [2]:
!gunzip -f ratings.json.gz
!gunzip -f metadata.json.gz

**Load the ratings into a pandas DataFrame**

In [3]:
import pandas as pd 
ratings_df = pd.read_json("./ratings.json", lines=True)
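Passing `lines=True` tells pandas that the file is in JSON Lines format: one complete JSON object per line rather than a single JSON array. A minimal sketch on two made-up records (hypothetical IDs, not taken from the dataset) shows that each line becomes one row and each key becomes one column:

```python
import io

import pandas as pd

# Two hypothetical review records in JSON Lines format (one object per line)
sample = io.StringIO(
    '{"reviewerID": "U1", "asin": "I1", "overall": 5, "unixReviewTime": 1413763200}\n'
    '{"reviewerID": "U2", "asin": "I1", "overall": 2, "unixReviewTime": 1411862400}\n'
)

# Each line is parsed as one row; the object keys become the columns
sample_df = pd.read_json(sample, lines=True)
print(sample_df.shape)  # (2, 4)
```

The same one-object-per-line layout is what S3 Select expects later when the metadata file is queried with `'Type': 'LINES'`.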

In [4]:
ratings_df = ratings_df.drop(columns=['reviewerName'])

In [5]:
ratings_df.head()

| | overall | verified | reviewTime | reviewerID | asin | reviewText | summary | unixReviewTime | vote | style | image |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 5 | True | 10 20, 2014 | A1D4G1SNUZWQOT | 7106116521 | Exactly what I needed. | perfect replacements!! | 1413763200 | | | |
| 1 | 2 | True | 09 28, 2014 | A3DDWDH9PX2YX2 | 7106116521 | I agree with the other review, the opening is ... | I agree with the other review, the opening is ... | 1411862400 | 3.0 | | |
| 2 | 4 | False | 08 25, 2014 | A2MWC41EW7XL15 | 7106116521 | Love these... I am going to order another pack... | My New 'Friends' !! | 1408924800 | | | |
| 3 | 2 | True | 08 24, 2014 | A2UH2QQ275NV45 | 7106116521 | too tiny an opening | Two Stars | 1408838400 | | | |
| 4 | 3 | False | 07 27, 2014 | A89F3LQADZBS5 | 7106116521 | Okay | Three Stars | 1406419200 | | | |


**Keep only the user, item, rating, and timestamp columns**

In [6]:
ratings_df = ratings_df[["reviewerID","asin","overall","unixReviewTime"]]

In [7]:
ratings_df.head()

| | reviewerID | asin | overall | unixReviewTime |
|---|---|---|---|---|
| 0 | A1D4G1SNUZWQOT | 7106116521 | 5 | 1413763200 |
| 1 | A3DDWDH9PX2YX2 | 7106116521 | 2 | 1411862400 |
| 2 | A2MWC41EW7XL15 | 7106116521 | 4 | 1408924800 |
| 3 | A2UH2QQ275NV45 | 7106116521 | 2 | 1408838400 |
| 4 | A89F3LQADZBS5 | 7106116521 | 3 | 1406419200 |


**Create the Amazon Personalize clients**

In [8]:
import boto3

personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

**Treat ratings of 4 and 5 as positive interactions**

In [9]:
ratings_df = ratings_df[ratings_df['overall'] > 3]     
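An interactions dataset in Amazon Personalize only records that a user engaged with an item, so this lab keeps ratings above 3 and treats every remaining row as a positive signal. A toy example with hypothetical users and items shows what the filter keeps; note that pandas keeps the original index labels, which is why the later `head()` output has gaps in its index:

```python
import pandas as pd

# Hypothetical star ratings for three users
toy = pd.DataFrame({
    "reviewerID": ["U1", "U1", "U2", "U3"],
    "asin": ["I1", "I2", "I1", "I3"],
    "overall": [5, 2, 4, 3],
})

# Ratings of 4 and 5 become positive interactions; ratings of 1 to 3 are dropped
positive = toy[toy["overall"] > 3]
print(positive[["reviewerID", "asin"]].values.tolist())  # [['U1', 'I1'], ['U2', 'I1']]
print(positive.index.tolist())  # [0, 2] (original index labels are kept)
```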

**Rename column headers for Personalize**

In [10]:
ratings_df = ratings_df.rename(columns={'reviewerID':'USER_ID', 'asin':'ITEM_ID', 'unixReviewTime':'TIMESTAMP'})

In [302]:
ratings_df = ratings_df.drop(columns=['overall'])
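After renaming and dropping `overall`, the DataFrame holds exactly the fields of a minimal Personalize interactions schema. Personalize schemas are written in Avro format; the sketch below shows one such schema as a Python dict, assuming string user and item IDs and a `long` Unix timestamp. The resulting JSON string could be passed to `personalize.create_schema(name=..., schema=schema_json)` or pasted into the console's schema editor. `matches_schema` is a hypothetical helper that only compares column names:

```python
import json

# Minimal interactions schema in Avro format; field names match the renamed columns
interactions_schema = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},  # Unix epoch seconds
    ],
    "version": "1.0",
}
schema_json = json.dumps(interactions_schema)


def matches_schema(columns, schema=interactions_schema):
    """Hypothetical check: True if the columns are exactly the schema's field names."""
    return set(columns) == {field["name"] for field in schema["fields"]}


print(matches_schema(["USER_ID", "ITEM_ID", "TIMESTAMP"]))  # True
```

In the notebook, `matches_schema(ratings_df.columns)` should return `True` at this point.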

In [303]:
ratings_df.head()

| | USER_ID | ITEM_ID | TIMESTAMP |
|---|---|---|---|
| 0 | A1D4G1SNUZWQOT | 7106116521 | 1413763200 |
| 2 | A2MWC41EW7XL15 | 7106116521 | 1408924800 |
| 5 | A29HLOUW0NS0EH | 7106116521 | 1405728000 |
| 6 | A7QS961ROI6E0 | 7106116521 | 1401494400 |
| 10 | A38NS6NF6WPXS | B00007GDFV | 1362787200 |


## Specify a bucket and data output location

In [304]:
import sagemaker
sess = sagemaker.Session()
bucket = sess.default_bucket()
prefix = 'recommendation-engine-with-personalize-fashion-console'
filename = "clean_product_ratings.csv"

In [305]:
ratings_df.to_csv(filename, index=False)

boto3.Session().resource('s3').Bucket(bucket).Object("{}/{}".format(prefix,filename)).upload_file(filename)



In [306]:
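# S3 URI of the interactions file, in the s3://bucket/key form the console's import step asks for
print("s3://{}/{}/{}".format(bucket, prefix, filename))

s3://sagemaker-ap-southeast-1-344028372807/recommendation-engine-with-personalize-fashion-console/clean_product_ratings.csv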


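## Build with Personalize using the AWS Console

Follow the console walkthrough at https://docs.aws.amazon.com/personalize/latest/dg/getting-started-console.html: create a dataset group, create an interactions dataset and import the CSV uploaded above from S3, train a solution, and deploy the solution version as a campaign. The campaign ARN is used in the inference step below.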
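The same console workflow can be scripted with the boto3 `personalize` client created earlier. This is a sketch, not the lab's method: `wait_until_active` and `build_campaign` are hypothetical helpers, the resource names derived from `name` are made up, and it assumes an IAM role (`role_arn`) that Amazon Personalize can assume plus a bucket policy that lets Personalize read the CSV. Personalize creates resources asynchronously, so each one is polled until it reports `ACTIVE` before the next step starts:

```python
import time


def wait_until_active(describe, key, poll_seconds=60, timeout_seconds=6 * 3600):
    """Poll a describe_* call until the resource reports ACTIVE.

    describe: zero-argument callable returning the describe_* response.
    key: top-level response key holding the resource, e.g. "campaign".
    """
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        status = describe()[key]["status"]
        if status == "ACTIVE":
            return
        if "FAILED" in status:
            raise RuntimeError("{} ended in status {}".format(key, status))
        time.sleep(poll_seconds)
    raise TimeoutError("{} did not become ACTIVE in time".format(key))


def build_campaign(personalize, name, schema_json, s3_uri, role_arn, recipe_arn):
    """Hypothetical helper mirroring the console steps; returns the campaign ARN."""
    # 1. Dataset group
    group_arn = personalize.create_dataset_group(name=name)["datasetGroupArn"]
    wait_until_active(lambda: personalize.describe_dataset_group(datasetGroupArn=group_arn), "datasetGroup")

    # 2. Interactions schema and dataset
    schema_arn = personalize.create_schema(name=name + "-schema", schema=schema_json)["schemaArn"]
    dataset_arn = personalize.create_dataset(
        name=name + "-interactions",
        schemaArn=schema_arn,
        datasetGroupArn=group_arn,
        datasetType="Interactions",
    )["datasetArn"]
    wait_until_active(lambda: personalize.describe_dataset(datasetArn=dataset_arn), "dataset")

    # 3. Import the CSV from S3
    job_arn = personalize.create_dataset_import_job(
        jobName=name + "-import",
        datasetArn=dataset_arn,
        dataSource={"dataLocation": s3_uri},
        roleArn=role_arn,
    )["datasetImportJobArn"]
    wait_until_active(
        lambda: personalize.describe_dataset_import_job(datasetImportJobArn=job_arn), "datasetImportJob"
    )

    # 4. Solution and a trained solution version
    solution_arn = personalize.create_solution(
        name=name + "-solution", datasetGroupArn=group_arn, recipeArn=recipe_arn
    )["solutionArn"]
    wait_until_active(lambda: personalize.describe_solution(solutionArn=solution_arn), "solution")
    version_arn = personalize.create_solution_version(solutionArn=solution_arn)["solutionVersionArn"]
    wait_until_active(
        lambda: personalize.describe_solution_version(solutionVersionArn=version_arn), "solutionVersion"
    )

    # 5. Campaign that serves get_recommendations requests
    campaign_arn = personalize.create_campaign(
        name=name + "-campaign", solutionVersionArn=version_arn, minProvisionedTPS=1
    )["campaignArn"]
    wait_until_active(lambda: personalize.describe_campaign(campaignArn=campaign_arn), "campaign")
    return campaign_arn
```

For example, `build_campaign(personalize, "fashion-test", schema_json, s3_uri, role_arn, "arn:aws:personalize:::recipe/aws-user-personalization")` would train with the User-Personalization recipe, where `schema_json` is the interactions schema (USER_ID, ITEM_ID, TIMESTAMP) as an Avro JSON string and `s3_uri` points at `clean_product_ratings.csv`. Training the solution version is usually the slowest step, which is why the default timeout is generous.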

## Upload product metadata

In [15]:
filename = 'metadata.json'
boto3.Session().resource('s3').Bucket(bucket).Object("{}/{}".format(prefix,filename)).upload_file(filename)
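The enrichment helper below queries this file with S3 Select using `'Type': 'LINES'`, which expects one JSON object per line, and it filters on the `asin` field. A quick local check on a sample of lines can catch a malformed file before querying it. `check_json_lines` is a hypothetical helper, and the two-line sample is made up:

```python
import io
import json


def check_json_lines(fileobj, required_key="asin", max_lines=1000):
    """Hypothetical pre-query check: each non-empty line must be a JSON object with required_key.

    Checks at most max_lines records and returns how many were checked.
    Raises ValueError (json.JSONDecodeError is a subclass) on the first bad line.
    """
    checked = 0
    for line_no, line in enumerate(fileobj, start=1):
        if checked >= max_lines:
            break
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        if not isinstance(record, dict) or required_key not in record:
            raise ValueError("line {} has no '{}' field".format(line_no, required_key))
        checked += 1
    return checked


# Hypothetical sample; on the real file: with open("metadata.json") as f: check_json_lines(f)
sample = io.StringIO('{"asin": "A1", "title": "x"}\n{"asin": "A2"}\n')
print(check_json_lines(sample))  # 2
```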

## Prepare for inference

**Define a helper that looks up the title and image URL of each item in the uploaded metadata using S3 Select**

In [38]:
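import json

def enrich_with_metadata(products):
    """Return image, asin and title for the given ASINs, read from the metadata file via S3 Select."""
    if not products:
        return []
    # Build an SQL list such as ('B00KA3R3M0', 'B00KA3UV0Q')
    asin_list = ", ".join("'{}'".format(asin) for asin in products)
    client = boto3.client('s3')
    r = client.select_object_content(
        Bucket=bucket,
        Key="{}/metadata.json".format(prefix),
        Expression="SELECT s.image, s.asin, s.title FROM S3Object s WHERE s.asin IN ({})".format(asin_list),
        ExpressionType='SQL',
        RequestProgress={'Enabled': False},
        InputSerialization={'JSON': {'Type': 'LINES'}},
        OutputSerialization={'JSON': {'RecordDelimiter': '\n'}},
    )
    # A record (or a multi-byte character) can be split across 'Records' events,
    # so join all payload bytes before decoding and parsing
    payload = b"".join(event['Records']['Payload'] for event in r['Payload'] if 'Records' in event)
    lines = payload.decode('utf-8').splitlines()
    return [json.loads(line) for line in lines if line.strip()]

# Keep the original (misspelled) name used by the cells below
encrich_with_metadata = enrich_with_metadata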

**Get a sample user to test**

In [440]:
# Pick a user who has a considerable number of positive reviews
user_id = 'A1V935MCODO9VA'
user_id

'A1V935MCODO9VA'
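The user ID above is hard-coded. One way to find users with many positive reviews is to count rows per `USER_ID`. The sketch uses a small hypothetical frame in the same shape as `ratings_df`; on the real data, `ratings_df["USER_ID"].value_counts().head(10)` would list candidate users:

```python
import pandas as pd

# Hypothetical positive interactions in the same shape as ratings_df
interactions = pd.DataFrame({
    "USER_ID": ["U1", "U2", "U1", "U3", "U1", "U2"],
    "ITEM_ID": ["I1", "I1", "I2", "I3", "I4", "I5"],
})

# Rows per user, sorted from most to fewest
counts = interactions["USER_ID"].value_counts()
print(counts.head(2).to_dict())  # {'U1': 3, 'U2': 2}
print(counts.idxmax())  # U1
```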

**Define a helper that renders items as an HTML gallery for viewing**

In [448]:
import html
import re

def display_items(items):
    """Render items (dicts with 'asin', 'title', 'image' and optionally 'score') as HTML figures."""
    image_string = ""
    for i, item in enumerate(items, start=1):
        title = html.escape(item.get('title', ''))
        if 'score' in item:
            caption = "{}---Score:{}---Name: {}---ASIN:{}".format(i, item['score'], title, item['asin'])
        else:
            caption = "{}---Name: {}---ASIN:{}".format(i, title, item['asin'])
        image = ""
        if item.get('image'):
            image = item['image'][0]
            # Rewrite Amazon image size/crop modifiers so each thumbnail is about 200 px
            image = re.sub(r'SR..,..', 'SR200,200', image)
            image = re.sub(r'US..', 'US200', image)
            image = re.sub(r'SS..', 'SS200', image)
            image = re.sub(r'SX..', 'SX200', image)
            image = re.sub(r'SY..', 'SY200', image)
            image = re.sub(r'CR,0,0,..,..', 'CR,0,0,200,200', image)
        image_string += ('<figure style="float:left;"><img src="{}" alt="" width="200"/>'
                         '<figcaption><center>{}</center></figcaption></figure><br>').format(image, caption)
    return image_string
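`display_items` patches each size token in the image URL separately. An alternative sketch replaces the whole modifier block at once. It assumes that Amazon's image CDN reads the modifier block between the image ID and the file extension (for example `._SX38_SY50_CR,0,0,38,50_.` in a hypothetical URL) and accepts a single `_SX<width>_` width modifier. `resize_image_url` is a hypothetical helper, not part of any Amazon API:

```python
import re

# Modifier block between the image ID and the extension, e.g. "._SX38_SY50_CR,0,0,38,50_."
# (assumed URL convention); [^/]* keeps the match inside the file name
MODIFIER = re.compile(r"\._[^/]*_\.(?=[A-Za-z]+$)")


def resize_image_url(url, width=200):
    """Replace any size/crop modifier with one width modifier (assumed CDN convention)."""
    if MODIFIER.search(url):
        return MODIFIER.sub("._SX{}_.".format(width), url)
    # No modifier present: insert one before the file extension
    base, dot, ext = url.rpartition(".")
    return "{}._SX{}_.{}".format(base, width, ext) if dot else url


# Hypothetical URL for illustration
print(resize_image_url("https://images-na.ssl-images-amazon.com/images/I/41abc._SX38_SY50_CR,0,0,38,50_.jpg"))
```

One regular expression over the whole block avoids the token-by-token replacements, where a typo in a single pattern can silently produce the wrong size.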

**Get the items this user rated 4 or 5 and enrich them with title and image URL**

In [449]:
actual_item_list = list(ratings_df[ratings_df["USER_ID"] == user_id]['ITEM_ID'])
print(actual_item_list)
actual_items = encrich_with_metadata(actual_item_list)

['B00KA3R3M0', 'B00KA3UV0Q', 'B00KW4LIJQ', 'B00KW4KR66', 'B00LMU7GHM', 'B00S1AQG3Q']

**These are the items the user rated 4 or 5**

In [450]:
from IPython.display import HTML
HTML(data=display_items(actual_items))

**Get recommendations for the user from the campaign (trained with the User-Personalization recipe), then enrich the recommended items with titles and images**


In [451]:
# ARN of the campaign created in the console; replace with your own
campaign_arn_up = 'arn:aws:personalize:ap-southeast-1:344028372807:campaign/my-fashion-test-campaign'

In [452]:
get_recommendations_response_up = personalize_runtime.get_recommendations(
    campaignArn=campaign_arn_up,
    userId=str(user_id),
    numResults=3,
    # Optional: apply a Personalize filter, e.g. one that excludes items the user already interacted with
    # filterArn='arn:aws:personalize:ap-southeast-1:344028372807:filter/fashion-test-exclude-purchases1'
)

recommended_item_list = list(map(lambda x: x['itemId'], get_recommendations_response_up['itemList']))
print(get_recommendations_response_up['itemList'])

[{'itemId': 'B00KA3VO3O', 'score': 0.0009517}, {'itemId': 'B00KA3VEG6', 'score': 0.0008883}, {'itemId': 'B00UDF11O6', 'score': 0.0007594}]
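Because the `filterArn` argument is commented out, nothing stops the campaign from returning items the user has already rated. A quick check compares the two ID lists. `already_rated` is a hypothetical helper; in this notebook it would be called as `already_rated(recommended_item_list, actual_item_list)`:

```python
def already_rated(recommended_ids, rated_ids):
    """Return the recommended item IDs that the user has already rated, in recommendation order."""
    rated = set(rated_ids)
    return [item_id for item_id in recommended_ids if item_id in rated]


# Hypothetical IDs for illustration
print(already_rated(["I9", "I2", "I7"], ["I1", "I2", "I3"]))  # ['I2']
```

A non-empty result suggests that an exclusion filter is worth enabling.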


**Enrich the recommended items and sort them by score**

In [453]:
recommended_items = encrich_with_metadata(recommended_item_list)
print(recommended_item_list)
# Attach the Personalize score to each enriched item, then sort by score (highest first)
items_dictionary = {item['itemId']: item['score'] for item in get_recommendations_response_up['itemList']}
for item in recommended_items:
    item['score'] = items_dictionary[item['asin']]
recommended_items.sort(key=lambda x: x['score'], reverse=True)

['B00KA3VO3O', 'B00KA3VEG6', 'B00UDF11O6']


**These are the items recommended for the user**

In [454]:
from IPython.display import HTML
HTML(data=display_items(recommended_items))