# Model Evaluation

The purpose of this workbook will be to evaluate the performance of the user-item filtering and item-item filtering systems that we developed in the previous workbooks.

First, I will copy over the final version of the formulas that I developed in the previous two workbooks.

I will then use the `train_test_split` function from the `sklearn` package to select a random sample of users to serve as my test set.

Using this set of test users, I will generate a predicted rating for each of their existing reviews using each of the two filtering systems I developed, as well as weighted combinations of the two. I will use the Root Mean Squared Error (RMSE) to evaluate the performance of each filtering system. Note that a lower RMSE value indicates a better-performing model.

***

In [1]:
# Import Python libraries as needed
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
# import matplotlib.pyplot as plt ~ not utilized in this workbook

In [2]:
# Import ratings_matrix data-frame as created in a previous workbook
ratings_matrix = pd.read_pickle('data/user/ratings_matrix.pkl')

In [3]:
# Import the review table as created in the data processing workbook
review = pd.read_pickle('data/user/review.pkl')

In [4]:
# Load the lookup tables of unique user and business numbers used to build the ratings matrix, along with their corresponding original user_id and business_id values
unique_user = pd.read_pickle('data/user/unique_user.pkl')
unique_business = pd.read_pickle('data/user/unique_business.pkl')

In [5]:
# Review the ratings matrix
ratings_matrix

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,6548,6549,6550,6551,6552,6553,6554,6555,6556,6557
0,4.0,,,,,,,,,,...,,,,,,,,,,
1,5.0,,,,,,,,,,...,,,,,,,,,,
2,1.0,,,,,,,,,,...,,,,,,,,,,
3,4.0,,,,,,,,,,...,,,,,,,,,,
4,1.0,,,,,,,,,,...,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
96527,,,,,,,,,,,...,,,,,,,,,,
96528,,,,,,,,,,,...,,,,,,,,,,
96529,,,,,,,,,,,...,,,,,,,,,,5.0
96530,,,,,,,,,,,...,,,,,,,,,,


In [6]:
# Create a function with the user_id value of any two users as the two parameters for the function
def find_user_similarity(userA, userB, ratings_matrix):
    # Create a True/False list of businesses that were given a rating for each of the two users
    businesses_rated_by_userA = ~ratings_matrix.loc[userA, :].isna()
    businesses_rated_by_userB = ~ratings_matrix.loc[userB, :].isna()

    # Consolidate the two boolean lists into a single one which represents only those businesses rated by both users
    businesses_rated_by_both_users = businesses_rated_by_userA & businesses_rated_by_userB

    # Capture the rating values of both users for those businesses that were rated by both users
    # Also transform these values into a format suitable for the cosine_similarity function
    ratings_of_userA = ratings_matrix.loc[userA, businesses_rated_by_both_users].values.reshape(1, -1)
    ratings_of_userB = ratings_matrix.loc[userB, businesses_rated_by_both_users].values.reshape(1, -1)

    # Capture the similarity between the two users by comparing their ratings for the set of businesses that they have both rated
    similarity = cosine_similarity(ratings_of_userA, ratings_of_userB)[0][0]

    # Return the cosine similarity value as the output of this function
    return similarity
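
The masking step above can be illustrated on a tiny example. This is only a sketch: `toy_ratings` and its values are hypothetical, and the cosine similarity is computed by hand with NumPy rather than with `cosine_similarity`, so that each step is visible.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings matrix: rows are users, columns are businesses
toy_ratings = pd.DataFrame(
    [[5.0, 3.0, np.nan, 1.0],
     [4.0, np.nan, 2.0, 1.0]],
    index=[0, 1],
    columns=[0, 1, 2, 3],
)

# Boolean mask of the businesses that both users have rated
co_rated = toy_ratings.loc[0].notna() & toy_ratings.loc[1].notna()

# Ratings of each user restricted to the co-rated businesses
a = toy_ratings.loc[0, co_rated].to_numpy()
b = toy_ratings.loc[1, co_rated].to_numpy()

# Cosine similarity: dot product divided by the product of the vector norms
toy_similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(co_rated.tolist(), toy_similarity)
```

One property worth keeping in mind: when two users share only a single co-rated business, both vectors have length one and the cosine similarity is 1.0 regardless of the ratings themselves.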

In [7]:
# Create a function to calculate the user-item rating prediction based on cosine similarity, with the following parameters:
# target_user = user number of the user for whom the rating is being predicted
# target_business = business number of the business whose rating is being predicted
def user_item_rating_prediction(target_user, target_business, ratings_matrix):

    # Create empty lists to store the:
    # 1. Similarities with other users to our target user
    similarities_to_target_user = []
    # 2. Existing ratings provided to our target business
    ratings_given_to_target_business = []

    # Create a list of all users that have provided a rating for the target business
    list_of_users_rating_target_business = list(ratings_matrix[~ratings_matrix.iloc[:, target_business].isna()].index)

    # Loop over every user who has rated the target business
    # Each is referred to as 'other_user' on the assumption that our target user has not rated our target business and hence is not in this list
    for other_user in list_of_users_rating_target_business:
        # Guard against the ValueError raised when the two users being compared have no businesses that they have both rated
        try:
            # Capture the cosine similarity between our target user and the current user from the list of users we are looping over
            similarity = find_user_similarity(target_user, other_user, ratings_matrix)
            # Append this similarity value to our list of similarity values
            similarities_to_target_user.append(similarity)
            # Append the rating that the current 'other_user' gave to our target business
            ratings_given_to_target_business.append(ratings_matrix.loc[other_user, target_business])
        # If a ValueError is raised, we simply move on to the next user
        # Since nothing is appended to either list in that case, the final calculation is not affected
        except ValueError:
            pass

    # Use the cosine similarity value to calculate the weighted average of all ratings (for those users that have at least 1 business that they have rated together)
    return np.dot(ratings_given_to_target_business, similarities_to_target_user)/np.sum(similarities_to_target_user)
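
As a worked illustration of the similarity-weighted average in the return statement above, here is a minimal sketch with hypothetical numbers (three other users, with made-up ratings and similarities):

```python
import numpy as np

# Hypothetical ratings that three other users gave to the target business
ratings = np.array([5.0, 3.0, 4.0])
# Hypothetical cosine similarities between the target user and those users
similarities = np.array([0.9, 0.6, 0.5])

# Weighted average: (5*0.9 + 3*0.6 + 4*0.5) / (0.9 + 0.6 + 0.5) = 8.3 / 2.0
predicted = float(np.dot(ratings, similarities) / np.sum(similarities))
print(predicted)
```

Users who are more similar to the target user pull the prediction toward their own rating; here the result is 4.15, slightly above the unweighted mean of 4.0.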

In [8]:
# Create a function with the business_id value of any two businesses as the two parameters for the function
def find_business_similarity(businessA, businessB, ratings_matrix):
    
    # Create a True/False list of users that gave a rating for each of the two businesses
    users_who_rated_businessA = ~ratings_matrix.loc[:, businessA].isna()
    users_who_rated_businessB = ~ratings_matrix.loc[:, businessB].isna()
    
    # Consolidate the two boolean lists into a single one which represents only those users that rated both businesses
    users_who_rated_both_businesses = users_who_rated_businessA & users_who_rated_businessB
    
    # Capture the rating values of both businesses for those users that rated both businesses
    # Also transform these values into a format suitable for the cosine_similarity function
    ratings_of_businessA = ratings_matrix.loc[users_who_rated_both_businesses, businessA].values.reshape(1, -1)
    ratings_of_businessB = ratings_matrix.loc[users_who_rated_both_businesses, businessB].values.reshape(1, -1)
    
    # Capture the similarity between the two businesses by comparing their ratings from the set of users who rated both of them
    similarity = cosine_similarity(ratings_of_businessA, ratings_of_businessB)[0][0]
    
    # Return the cosine similarity value as the output of this function
    return similarity

In [9]:
# Create a function to calculate the item-item rating prediction based on cosine similarity, with the following parameters:
# target_user = user number of the user for whom the rating is being predicted
# target_business = business number of the business whose rating is being predicted
def item_item_rating_prediction(target_user, target_business, ratings_matrix):
   
    # Create empty lists to store the:
    # 1. Similarities of other businesses to our target business
    similarities_to_target_business = []
    # 2. Existing ratings given by our target user
    ratings_given_by_target_user = []
    
    # Create a list of all businesses that the target user has rated
    list_of_businesses_rated_by_target_user = list(ratings_matrix.loc[:, ~ratings_matrix.iloc[target_user, :].isna()].columns)
    
    # Loop over every business that the target user has rated
    # Each is referred to as 'other_business' on the assumption that our target user has not rated our target business and hence it is not in this list
    for other_business in list_of_businesses_rated_by_target_user:
        # Guard against the ValueError raised when the two businesses being compared have no users who rated both of them
        try:
            # Capture the cosine similarity between our target business and the current business from the list of businesses we are looping over
            similarity = find_business_similarity(target_business, other_business, ratings_matrix)
            # Append this similarity value to our list of similarity values
            similarities_to_target_business.append(similarity)
            # Append the rating that our target user gave to the current 'other_business'
            ratings_given_by_target_user.append(ratings_matrix.loc[target_user, other_business])
        # If a ValueError is raised, we simply move on to the next business
        # Since nothing is appended to either list in that case, the final calculation is not affected
        except ValueError:
            pass
    
    # Use the cosine similarity values to calculate the weighted average of the target user's ratings (for those businesses that share at least 1 rater with the target business)
    return np.dot(ratings_given_by_target_user, similarities_to_target_business)/np.sum(similarities_to_target_business)

In [10]:
# Confirm the formula is working as intended (as tested in the previous notebook)
user_item_rating_prediction(0, 1, ratings_matrix)

3.7220670351652867

In [11]:
# Confirm the formula is working as intended (as tested in the previous notebook)
item_item_rating_prediction(0, 1, ratings_matrix)

3.7917695651231935

***

## Create Train-Test Split of the Ratings Matrix to Evaluate the Performance of the Two Different Filtering Systems

In [12]:
# Split the ratings matrix into a train test split 
train_df, test_df = train_test_split(ratings_matrix, test_size = 0.0002)

In [13]:
# Review the shape of the training split
train_df.shape

(96512, 6558)

In [14]:
# Review the shape of the testing split
test_df.shape

(20, 6558)
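
Because `train_test_split` shuffles the rows randomly, a different set of test users is drawn on every run. A minimal sketch of how the split could be made repeatable, assuming a hypothetical `toy_matrix` and an arbitrary seed of 42 passed through the function's `random_state` parameter:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy matrix: 10 users (rows) by 3 businesses (columns)
toy_matrix = pd.DataFrame(np.arange(30, dtype=float).reshape(10, 3))

# random_state fixes the shuffle so the same users are held out on every run
toy_train, toy_test = train_test_split(toy_matrix, test_size=0.2, random_state=42)
toy_train_again, toy_test_again = train_test_split(toy_matrix, test_size=0.2, random_state=42)

print(toy_train.shape, toy_test.shape)
print(list(toy_test.index) == list(toy_test_again.index))
```

The split is made by row, so each user lands entirely in either the training set or the test set.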

In [15]:
# Create a list of the specific users included in the test set
test_users = pd.Series(list(test_df.index), name = 'user_num')

In [16]:
# Create a copy of the review df such that we can manipulate it for our testing and evaluation methods
review2 = review.copy()

In [17]:
# Merge the list of unique users and unique business numbers used for ratings matrix back into the review table
review2 = pd.merge(review2, unique_user, how = 'left')
review2 = pd.merge(review2, unique_business, how = 'left')

In [18]:
# Sort the df by the date values, such that the most recent reviews are at the top
# This is so that we keep the most recent reviews when deleting duplicates for each user-business combination
review2 = review2.sort_values('date', ascending = False)

In [19]:
# Review size of df before dropping duplicates
review2.shape

(378098, 11)

In [20]:
# Drop duplicates for each unique user-business combination
review2 = review2.drop_duplicates(subset = ['user_num', 'business_num'])

In [21]:
# Review size of df after dropping duplicates
review2.shape

# Note we now have exactly the same number of records that we used to originally create our ratings matrix

(367390, 11)
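
The sort-then-deduplicate pattern used above relies on `drop_duplicates` keeping the first occurrence of each combination by default. A small sketch with a hypothetical `toy_reviews` table shows the effect:

```python
import pandas as pd

# Hypothetical reviews: user 0 reviewed business 7 twice, on different dates
toy_reviews = pd.DataFrame({
    "user_num": [0, 0, 1],
    "business_num": [7, 7, 7],
    "stars": [2, 5, 4],
    "date": pd.to_datetime(["2018-01-01", "2019-06-01", "2017-03-15"]),
})

# Newest reviews first, then keep the first row of each user-business pair
latest = (
    toy_reviews.sort_values("date", ascending=False)
    .drop_duplicates(subset=["user_num", "business_num"])
)
print(latest)
```

Only the 2019 review (5 stars) survives for user 0, while the older 2-star review of the same business is dropped.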

In [22]:
# Drop all unnecessary columns, keeping only the 3 we will be using for our testing and evaluation purposes
review2 = review2[['user_num', 'business_num', 'stars']]

In [23]:
# Only keep those records in our df that match with the set of users included in our testing set
review2 = pd.merge(review2, test_users, on = 'user_num', how = 'inner')

In [24]:
# Review shape of df after dropping all records whose users do not tie back to our testing set
review2.shape

(39, 3)

In [25]:
# Create a new column to check that we have the right match of user-business star ratings to that of our ratings matrix
review2.insert(3, 'check', 0)

In [26]:
# Pull in the rating from the ratings matrix for each unique user-business combo
for record in review2.index:
    review2.loc[record, 'check'] = ratings_matrix.loc[review2.loc[record, 'user_num'], review2.loc[record, 'business_num']]

In [27]:
# Validate that there are no discrepancies between the star ratings in our review2 df and the values in our ratings matrix
(review2['stars'] - review2['check']).value_counts()

0.0    39
dtype: int64

In [28]:
# Drop the 'check' column since it has now served its purpose
review2 = review2.drop(columns = ['check'])

In [29]:
# Create new columns inside the review2 df where each value is a nan value to be filled in by the respective user-item and item-item rating predictions
review2.insert(3, 'user-item', np.nan)
review2.insert(4, 'item-item', np.nan)

In [30]:
%%time
# Print the cell execution time (for my laptop this was approx. 30 sec. for a test_size of 0.0002)

# Loop over every record in the review2 df
for record in review2.index:
    # Capture the user num and business num values for the record being looped over
    user_num = review2.loc[record, 'user_num']
    business_num = review2.loc[record, 'business_num']
    
    # Use the user num and business num to generate both the user-item and item-item rating predictions
    review2.loc[record, 'user-item'] = round(user_item_rating_prediction(user_num, business_num, ratings_matrix),2)
    review2.loc[record, 'item-item'] = round(item_item_rating_prediction(user_num, business_num, ratings_matrix),2)

CPU times: user 9.36 s, sys: 411 ms, total: 9.77 s
Wall time: 9.94 s


***

Now that we have made a prediction against each ratings value from our set of test data, we will evaluate the predictions by calculating the Root Mean Squared Error (RMSE) for each set of user-item, item-item, and weighted-average hybrid predictions against the original star rating.

Formula for RMSE taken from link below:

https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas

In [31]:
# Create a function to calculate the Root Mean Squared Error between the original rating and the predicted rating
# The two parameters will be a list or Series of predicted ratings vs the original target ratings
def rmse(predictions, targets):
    # Square the differences between the predicted and target ratings, take their mean, and then take the square root
    return np.sqrt(((predictions - targets) ** 2).mean())
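
To make the calculation concrete, here is a step-by-step sketch on three hypothetical prediction and target pairs (the values are made up for illustration):

```python
import numpy as np

# Hypothetical predicted and actual star ratings
toy_predictions = np.array([4.0, 3.0, 5.0])
toy_targets = np.array([5.0, 3.0, 3.0])

errors = toy_predictions - toy_targets   # [-1.0, 0.0, 2.0]
squared_errors = errors ** 2             # [1.0, 0.0, 4.0]
mean_squared_error = squared_errors.mean()  # 5 / 3
toy_rmse = float(np.sqrt(mean_squared_error))
print(toy_rmse)
```

Because the errors are squared before averaging, the single 2-star miss contributes four times as much as the 1-star miss.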

In [32]:
# Calculate the RMSE value of the user-item predictions
rmse(review2['user-item'], review2['stars'])

1.434158379123539

In [33]:
# Calculate the RMSE value of the item-item predictions
rmse(review2['item-item'], review2['stars'])

0.9370699013414101

In [34]:
# Add a series of 9 columns which each combine the user-item and item-item predictions using a weighted-average
# Each column represents a different weight ranging from using 10% of the user-item rating and 90% of the item-item rating (hybrid10)...
# ...to using 90% of the user-item rating and 10% of the item-item rating (hybrid90)
review2.insert(5, 'hybrid10', (review2['user-item']*0.1 + review2['item-item']*.9))
review2.insert(6, 'hybrid20', (review2['user-item']*0.2 + review2['item-item']*.8))
review2.insert(7, 'hybrid30', (review2['user-item']*0.3 + review2['item-item']*.7))
review2.insert(8, 'hybrid40', (review2['user-item']*0.4 + review2['item-item']*.6))
review2.insert(9, 'hybrid50', (review2['user-item']*0.5 + review2['item-item']*.5))
review2.insert(10, 'hybrid60', (review2['user-item']*0.6 + review2['item-item']*.4))
review2.insert(11, 'hybrid70', (review2['user-item']*0.7 + review2['item-item']*.3))
review2.insert(12, 'hybrid80', (review2['user-item']*0.8 + review2['item-item']*.2))
review2.insert(13, 'hybrid90', (review2['user-item']*0.9 + review2['item-item']*.1))

In [41]:
# Quick overview of the final version of the review2 df
review2.head(3)

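|  | user_num | business_num | stars | user-item | item-item | hybrid10 | hybrid20 | hybrid30 | hybrid40 | hybrid50 | hybrid60 | hybrid70 | hybrid80 | hybrid90 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 53732 | 2549 | 5 | 4.5 | 5.0 | 4.95 | 4.9 | 4.85 | 4.8 | 4.75 | 4.7 | 4.65 | 4.6 | 4.55 |
| 1 | 31640 | 5272 | 3 | 3.52 | 3.5 | 3.502 | 3.504 | 3.506 | 3.508 | 3.51 | 3.512 | 3.514 | 3.516 | 3.518 |
| 2 | 31640 | 809 | 4 | 4.26 | 3.5 | 3.576 | 3.652 | 3.728 | 3.804 | 3.88 | 3.956 | 4.032 | 4.108 | 4.184 |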


In [35]:
# Print the range of RMSE values for each of the rating prediction columns
# Ranging from using 100% of the item-item ratings to 100% of the user-item ratings
print('item-item:', rmse(review2['item-item'], review2['stars']))
print('hybrid10:', rmse(review2['hybrid10'], review2['stars']))
print('hybrid20:', rmse(review2['hybrid20'], review2['stars']))
print('hybrid30:', rmse(review2['hybrid30'], review2['stars']))
print('hybrid40:', rmse(review2['hybrid40'], review2['stars']))
print('hybrid50:', rmse(review2['hybrid50'], review2['stars']))
print('hybrid60:', rmse(review2['hybrid60'], review2['stars']))
print('hybrid70:', rmse(review2['hybrid70'], review2['stars']))
print('hybrid80:', rmse(review2['hybrid80'], review2['stars']))
print('hybrid90:', rmse(review2['hybrid90'], review2['stars']))
print('user-item:', rmse(review2['user-item'], review2['stars']))

item-item: 0.9370699013414101
hybrid10: 0.935326612363228
hybrid20: 0.9478874757410117
hybrid30: 0.9741993556841352
hybrid40: 1.0131915049194922
hybrid50: 1.0634700699801523
hybrid60: 1.123520839558874
hybrid70: 1.1918676466703368
hybrid80: 1.2671688208414933
hybrid90: 1.348259647442983
user-item: 1.434158379123539


***

Here we see that a pure item-item approach (RMSE of about 0.94) performs much better on this test sample than a pure user-item filtering system (RMSE of about 1.43).

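By using a hybrid approach which takes the weighted average of the two rating predictions, we are able to further reduce our Root Mean Squared Error (RMSE). In the output above, for example, the 'hybrid10' column (about 0.935) comes in slightly below the pure item-item approach (about 0.937).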

For the last random test sample on which I ran this workbook, a weighted average of 20% user-item rating and 80% item-item rating yielded the best results. Because the test sample is drawn at random, the best weighting varies between runs: having run this workbook multiple times, I noticed that the best hybrid approach ranges from using 20% to 40% of the user-item rating and 80% to 60% of the item-item rating.

However, having run this workbook multiple times, I noticed that the lowest RMSE came most consistently from the 'hybrid20' column. Therefore, my final collaborative recommender system will use a weighted average of 20% user-item rating and 80% item-item rating.