# Building a Product Recommendation Engine with Amazon Personalize

In this lab, we use customer rating data from the Amazon Fashion category of Amazon.com to build a product recommendation plugin for our website. We use Amazon Personalize to train the recommender model and to host recommendation inference. We then test inference by displaying the items a user rated and the items Personalize recommends for that user.

## Download and prepare sample dataset

In [None]:
!curl -o ./metadata.json.gz http://deepyeti.ucsd.edu/jianmo/amazon/metaFiles/meta_AMAZON_FASHION.json.gz
!curl -o ./ratings.json.gz http://deepyeti.ucsd.edu/jianmo/amazon/categoryFiles/AMAZON_FASHION.json.gz

Dataset source:  
Justifying recommendations using distantly-labeled reviews and fine-grained aspects. Jianmo Ni, Jiacheng Li, Julian McAuley. Empirical Methods in Natural Language Processing (EMNLP), 2019.

In [None]:
!gunzip -f ratings.json.gz
!gunzip -f metadata.json.gz
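As an alternative to decompressing on disk, the compressed files can be streamed line by line with Python's standard `gzip` and `json` modules. A minimal sketch, assuming each line holds one JSON object (as in `ratings.json.gz`); `iter_json_lines` is a hypothetical helper name:

```python
import gzip
import json


def iter_json_lines(path):
    """Yield one parsed record per non-empty line of a gzip-compressed JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines instead of failing in json.loads
            if line:
                yield json.loads(line)
```

pandas can also read the compressed file directly with `pd.read_json("./ratings.json.gz", lines=True)`, because its `compression` argument defaults to `'infer'`.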

**Load the ratings into a pandas DataFrame**

In [None]:
import pandas as pd 
ratings_df = pd.read_json("./ratings.json", lines=True)

In [None]:
ratings_df = ratings_df.drop(columns=['reviewerName'])

In [None]:
ratings_df.head()

**Keep only the columns we need**

In [None]:
ratings_df = ratings_df[["reviewerID","asin","overall","unixReviewTime"]]

In [None]:
ratings_df.head()

**The output above shows the first five rows of the ratings data. Next, create the Amazon Personalize clients**

In [None]:
import boto3

personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

**Treat ratings of 4 and 5 as positive interactions**

In [None]:
ratings_df = ratings_df[ratings_df['overall'] > 3]     

**Rename column headers for Personalize**

In [None]:
ratings_df = ratings_df.rename(columns={'reviewerID':'USER_ID', 'asin':'ITEM_ID', 'unixReviewTime':'TIMESTAMP'})
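The selection, filtering, and renaming steps above can be collected into one reusable function. This is a sketch: `to_personalize_interactions` and its `min_rating` parameter are hypothetical names, and the toy frame is made-up data. Unlike the cells above, it also drops `overall`, so the output has exactly the three columns of a minimal interactions dataset.

```python
import pandas as pd


def to_personalize_interactions(df, min_rating=4):
    """Turn raw Amazon review rows into a USER_ID/ITEM_ID/TIMESTAMP interactions frame.

    Only ratings >= min_rating are kept, treating them as positive interactions.
    """
    positive = df.loc[df["overall"] >= min_rating, ["reviewerID", "asin", "unixReviewTime"]]
    return positive.rename(
        columns={"reviewerID": "USER_ID", "asin": "ITEM_ID", "unixReviewTime": "TIMESTAMP"}
    ).reset_index(drop=True)


# Toy input with made-up IDs and timestamps
raw = pd.DataFrame(
    {
        "reviewerID": ["U1", "U1", "U2"],
        "asin": ["I1", "I2", "I1"],
        "overall": [5.0, 2.0, 4.0],
        "unixReviewTime": [1500000000, 1500000100, 1500000200],
    }
)
interactions = to_personalize_interactions(raw)
print(interactions)
```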

In [None]:
ratings_df.head()

## Specify a bucket and data output location

In [None]:
import sagemaker
sess = sagemaker.Session()
bucket = sess.default_bucket()
prefix = 'recommendation-engine-with-personalize-fashion-console'
filename = "clean_product_ratings.csv"

In [None]:
ratings_df.to_csv(filename, index=False)

boto3.Session().resource('s3').Bucket(bucket).Object("{}/{}".format(prefix,filename)).upload_file(filename)



In [None]:
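# Print the full S3 URI of the interactions file; you will enter it when importing data in the console
print("s3://{}/{}/{}".format(bucket, prefix, filename))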
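The upload and the printed path can also come from a single helper that returns the `s3://` URI. A sketch that takes any boto3 S3 client; `upload_to_s3` is a hypothetical name, and it relies on the client's `upload_file(Filename, Bucket, Key)` method.

```python
def upload_to_s3(s3_client, local_path, bucket, prefix, filename=None):
    """Upload local_path to s3://bucket/prefix/filename and return that URI."""
    # Default the object name to the local file's base name
    filename = filename or local_path.rsplit("/", 1)[-1]
    key = "{}/{}".format(prefix.strip("/"), filename)
    # boto3's S3 client exposes upload_file(Filename, Bucket, Key)
    s3_client.upload_file(local_path, bucket, key)
    return "s3://{}/{}".format(bucket, key)


# Usage in this notebook (assumes the bucket, prefix and filename variables defined above):
# print(upload_to_s3(boto3.client("s3"), filename, bucket, prefix))
```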

## Build the recommender in the Amazon Personalize console

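Follow the console walkthrough at https://docs.aws.amazon.com/personalize/latest/dg/getting-started-console.html to create a dataset group, import the interactions CSV uploaded above (using the S3 path printed by the previous cell), train a solution, and deploy it as a campaign.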
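When creating the interactions dataset you supply an Avro-format schema. The sketch below shows a minimal three-field schema, assuming it should cover the `USER_ID`, `ITEM_ID` and `TIMESTAMP` columns (not the extra `overall` column the CSV above still contains), plus an optional programmatic equivalent through the Personalize client's `create_schema(name=..., schema=...)`. `missing_schema_columns`, `create_interactions_schema` and the default schema name are hypothetical.

```python
import json

# Minimal interactions schema (assumption: only the three required fields)
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
    ],
    "version": "1.0",
}


def missing_schema_columns(csv_columns, schema=INTERACTIONS_SCHEMA):
    """Return schema field names that do not appear among the CSV columns."""
    return [f["name"] for f in schema["fields"] if f["name"] not in csv_columns]


def create_interactions_schema(personalize_client, name="fashion-interactions-schema"):
    """Register the schema with Amazon Personalize and return its ARN."""
    response = personalize_client.create_schema(name=name, schema=json.dumps(INTERACTIONS_SCHEMA))
    return response["schemaArn"]
```

For example, `missing_schema_columns(list(ratings_df.columns))` should return an empty list once the columns have been renamed.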

## Upload product metadata

In [None]:
filename = 'metadata.json'
boto3.Session().resource('s3').Bucket(bucket).Object("{}/{}".format(prefix,filename)).upload_file(filename)

## Prepare for inference

**Define a helper that enriches item IDs with their title and image URL**

In [None]:
import json

def encrich_with_metadata(products):
    """Look up title and image URLs for a list of ASINs with S3 Select on metadata.json."""
    if not products:
        return []
    # Format the IDs as an explicit SQL IN list such as ('B001', 'B002') instead of relying on Python's list repr
    in_list = "({})".format(", ".join("'{}'".format(p) for p in products))
    client = boto3.client('s3')
    r = client.select_object_content(
        Bucket=bucket,
        Key="{}/metadata.json".format(prefix),
        Expression="SELECT s.image, s.asin, s.title FROM S3Object s WHERE s.asin IN {}".format(in_list),
        ExpressionType='SQL',
        RequestProgress={'Enabled': False},
        InputSerialization={'JSON': {'Type': 'LINES'}},
        OutputSerialization={'JSON': {'RecordDelimiter': '\n'}},
    )
    # A record can be split across Payload events, so join all chunks before parsing
    chunks = []
    for event in r['Payload']:
        if 'Records' in event:
            chunks.append(event['Records']['Payload'])
    lines = b"".join(chunks).decode('utf-8').splitlines()
    return [json.loads(line) for line in lines if line.strip()]
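AWS closed S3 Select to new customers in 2024, so the helper above may not work in newer accounts. A hedged alternative is to scan the local `metadata.json` downloaded earlier. The sketch assumes one JSON object per line with `asin`, `title` and `image` fields; `enrich_from_local_metadata` is a hypothetical name.

```python
import json


def enrich_from_local_metadata(products, path="./metadata.json"):
    """Return image/asin/title records for the given ASINs by scanning a JSON-lines file."""
    wanted = set(products)
    output = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if record.get("asin") in wanted:
                # Keep the same three fields the S3 Select query returns, skipping absent ones
                output.append({k: record[k] for k in ("image", "asin", "title") if k in record})
    return output
```

This rescans the whole file on every call; for repeated lookups it would be cheaper to load the records into a dictionary keyed by ASIN once.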

**Get a sample user to test**

In [None]:
# Pick a user with a considerable number of positive reviews
my_list = ratings_df['USER_ID'].value_counts()[:10000].index.to_list()
# Keep the ID as a str: encoding it to bytes breaks the DataFrame comparison and get_recommendations below
user_id = my_list[6543]
user_id
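`value_counts()` orders users by how many positive ratings they have, so `my_list[6543]` is the 6,544th most active reviewer. The same ranking can be sketched with the standard library; `pick_user_by_rank` is a hypothetical helper, and in it ties are broken by first appearance (the documented behavior of `Counter.most_common`).

```python
from collections import Counter


def pick_user_by_rank(user_ids, rank):
    """Return the user at 0-based position `rank` when users are ordered by interaction count (descending)."""
    ranked = [user for user, _ in Counter(user_ids).most_common()]
    if not 0 <= rank < len(ranked):
        raise IndexError("rank {} out of range for {} users".format(rank, len(ranked)))
    return ranked[rank]


users = ["U2", "U1", "U1", "U3", "U1", "U2"]
print(pick_user_by_rank(users, 0))  # prints "U1", the user with 3 interactions
```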

**Define a helper that renders items as an HTML image gallery**

In [None]:
import html
import re

def display_items(items):
    """Render items as captioned 200-px thumbnails."""
    image_string = ""
    for i, item in enumerate(items, start=1):
        # Escape titles so characters such as < or & do not break the HTML
        title = html.escape(str(item.get('title', '')))
        if 'score' in item:
            caption = "{}---Score:{}---Name: {}---ASIN:{}".format(i, item['score'], title, item['asin'])
        else:
            caption = "{}---Name: {}---ASIN:{}".format(i, title, item['asin'])
        image = ""
        # Some metadata records have no 'image' field, so fall back to an empty list
        images = item.get('image') or []
        if images:
            image = images[0]
            # Rewrite Amazon's size tokens in the URL to request a ~200 px thumbnail
            image = re.sub(r'SR..,..', 'SR200,200', image)
            image = re.sub(r'US..', 'US200', image)
            image = re.sub(r'SS..', 'SS200', image)
            image = re.sub(r'SX..', 'SX200', image)
            image = re.sub(r'SY..', 'SX200', image)
            image = re.sub(r'CR,0,0,..,..', 'CR,0,0,200,200', image)
        image_string += '<figure style="float:left;"><img src="{}" alt="" width="200"/><figcaption><center>{}</center></figcaption></figure><br/>'.format(image, caption)
    return image_string
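The chain of `re.sub` calls above rewrites individual size tokens in Amazon image URLs. A more compact alternative replaces the whole modifier block with a single width token. This sketch assumes the tokens sit in one `._..._` block just before the file extension (for example `41abc._SX38_SY50_CR,0,0,38,50_.jpg`) and that Amazon's image service honors `_SX<width>_`, the token the original code also produces. URLs without such a block are returned unchanged; `resize_image_url` is hypothetical.

```python
import re

# Matches the modifier block in the last path segment, e.g. "._SX38_SY50_CR,0,0,38,50_.jpg"
_MODIFIER_BLOCK = re.compile(r"\._[^/]*_\.(\w+)$")


def resize_image_url(url, width=200):
    """Replace the Amazon image modifier block with a single _SX<width>_ token."""
    return _MODIFIER_BLOCK.sub(r"._SX{}_.\1".format(width), url)


print(resize_image_url("https://images-na.ssl-images-amazon.com/images/I/41abc._SX38_SY50_CR,0,0,38,50_.jpg"))
```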

**Get the items the user rated 4 or 5 and enrich them with title and image URL**

In [None]:
actual_item_list = list(ratings_df[ratings_df["USER_ID"] == user_id]['ITEM_ID'])
actual_items = encrich_with_metadata(actual_item_list)

In [None]:
from IPython.display import HTML
HTML(data=display_items(actual_items))

**Get recommendations from the campaign trained with the User-Personalization recipe and enrich them with titles and images**


In [None]:
# Replace with the ARN of the campaign you created in the console
campaign_arn_up = 'arn:aws:personalize:ap-southeast-1:344028372807:campaign/my-fashion-test-campaign'

In [None]:
get_recommendations_response_up = personalize_runtime.get_recommendations(
    campaignArn = campaign_arn_up,
    userId = str(user_id),
    numResults=3,
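    # Replace with the ARN of your own filter (as its name suggests, this one excludes items the user already purchased)
    filterArn='arn:aws:personalize:ap-southeast-1:344028372807:filter/fashion-test-exclude-purchases1'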
)

recommended_item_list = list(map(lambda x: x['itemId'], get_recommendations_response_up['itemList']))
print(get_recommendations_response_up['itemList'])
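The call above can be wrapped so the filter is optional and the response is reduced to `(itemId, score)` pairs. A sketch for the `personalize-runtime` client created earlier; `get_scored_recommendations` is a hypothetical name. It passes only the `campaignArn`, `userId`, `numResults` and `filterArn` parameters used above, and reads `itemId` and `score` from each `itemList` entry, defaulting the score to `None` in case a recipe does not return one.

```python
def get_scored_recommendations(runtime_client, campaign_arn, user_id, num_results=3, filter_arn=None):
    """Return a list of (item_id, score) tuples from an Amazon Personalize campaign."""
    params = {"campaignArn": campaign_arn, "userId": str(user_id), "numResults": num_results}
    # Only send filterArn when a filter is actually configured
    if filter_arn:
        params["filterArn"] = filter_arn
    response = runtime_client.get_recommendations(**params)
    return [(item["itemId"], item.get("score")) for item in response["itemList"]]


# Usage with the objects defined above:
# pairs = get_scored_recommendations(personalize_runtime, campaign_arn_up, user_id)
```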

**Enrich the recommendations with metadata and sort them by score**

In [None]:
recommended_items = encrich_with_metadata(recommended_item_list)
# Map each recommended ASIN to its Personalize score
items_dictionary = {item['itemId']: item['score'] for item in get_recommendations_response_up['itemList']}
for item in recommended_items:
    item['score'] = items_dictionary[item['asin']]
# Show the highest-scoring recommendations first
recommended_items.sort(key=lambda x: x['score'], reverse=True)
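Because the loop above iterates over the metadata results, any recommended ASIN missing from `metadata.json` silently disappears from the display. A sketch that keeps every recommendation and orders it by score, inserting a placeholder title when metadata is missing; `merge_recommendations` and the placeholder text are hypothetical.

```python
def merge_recommendations(item_list, metadata_records):
    """Combine Personalize itemList entries with metadata, keeping items that lack metadata."""
    by_asin = {record["asin"]: record for record in metadata_records}
    merged = []
    for entry in item_list:
        placeholder = {"asin": entry["itemId"], "title": "(no metadata)", "image": []}
        # Copy so the original metadata records are not mutated
        record = dict(by_asin.get(entry["itemId"], placeholder))
        record["score"] = entry.get("score", 0.0)
        merged.append(record)
    # Highest score first; sorted() is stable, so ties keep Personalize's order
    return sorted(merged, key=lambda r: r["score"], reverse=True)
```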

**These are the items recommended for the user**

In [None]:
from IPython.display import HTML
HTML(data=display_items(recommended_items))