# Evaluation of Recommender Systems

Using the same dataset as in previous weeks, we evaluate the Collaborative Filtering (CF) models implemented last week.

## Exercise 1

1. Load the test set and the predictions made with both Collaborative Filtering models in the previous session. 
2. Detect the users who are in the training set but not in the test set, and remove their predictions before evaluating the systems.
3. Report the Root Mean Square Error (RMSE) for both CF models defined in the previous session.
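
As a reminder, RMSE is the square root of the mean squared difference between observed and estimated ratings, taken over the (user, item) pairs that appear in both the test set and the predictions. The following is a minimal sketch on made-up data. The column names `uid`, `iid`, `overall`, and `est` mirror the ones used in this notebook; the helper name `rmse` and the toy values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def rmse(preds: pd.DataFrame, test: pd.DataFrame) -> float:
    """RMSE over the (uid, iid) pairs present in both frames."""
    merged = preds.merge(test, how="inner", on=["uid", "iid"])
    return float(np.sqrt(np.mean((merged["overall"] - merged["est"]) ** 2)))

# Toy data (hypothetical values, not taken from the dataset).
toy_test = pd.DataFrame({"uid": ["u1", "u2"], "iid": ["i1", "i2"], "overall": [5.0, 3.0]})
toy_preds = pd.DataFrame({
    "uid": ["u1", "u2", "u3"],
    "iid": ["i1", "i2", "i9"],
    "est": [4.0, 3.0, 2.5],
})

# The errors are 1.0 and 0.0; the u3 prediction has no test rating and is dropped by the inner join.
toy_rmse = rmse(toy_preds, toy_test)
```

The inner join means that predictions without a matching test rating do not affect the score.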

In [1]:
import os
import sys
sys.path.append('../')
import pickle
import pandas as pd
import numpy as np

In [7]:

# TEST
df_test = pd.read_pickle("testset.pkl")[["reviewerID", "asin", "overall"]]
df_test = df_test.rename(columns={"reviewerID": "uid", "asin": "iid"})

# PREDICTIONS
nb = pd.read_pickle("preds_knn.pkl")
lf = pd.read_pickle("preds_svd.pkl")
pred_nb_list = list(nb.itertuples(index=False))
pred_lf_list = list(lf.itertuples(index=False))


# Detect users that appear in the predictions (i.e., the training set) but not in the test set
nb_users = {pred.uid for pred in pred_nb_list}
lf_users = {pred.uid for pred in pred_lf_list}
test_users = set(df_test['uid'])
nb_users_in_pred_but_not_in_test = nb_users - test_users
lf_users_in_pred_but_not_in_test = lf_users - test_users
# Compare as sets: lists built from sets have no guaranteed order
assert nb_users_in_pred_but_not_in_test == lf_users_in_pred_but_not_in_test
print(f"There are {len(lf_users_in_pred_but_not_in_test)} users in the training set that are not in the test set.")

# Remove these users' predictions for evaluation
print("Lengths before removing preds not in test set:", len(nb), len(lf))
nb = nb[~nb.uid.isin(nb_users_in_pred_but_not_in_test)]
lf = lf[~lf.uid.isin(nb_users_in_pred_but_not_in_test)]
print("After removing:", len(nb), len(lf))

nb_merge = nb.merge(df_test, how="inner", on=["uid", "iid"])
print("\nkNN RMSE:", np.sqrt(np.mean((nb_merge["overall"] - nb_merge["est"])**2)))

lf_merge = lf.merge(df_test, how="inner", on=["uid", "iid"])
print("SVD RMSE:", np.sqrt(np.mean((lf_merge["overall"] - lf_merge["est"])**2)))

There are 113 users in the training set that are not in the test set.
Lengths before removing preds not in test set: 1449029 1449029
After removing: 1359246 1359246

kNN RMSE: 1.1497250696788386
SVD RMSE: 1.0103011297598752


## Exercise 2
Define a general method to get the top-k recommendations for each user. Print the top-k recommendations, with k={5, 10}, and their estimated ratings for a given user. The exercise originally asked for user 'ARARUVZ8RUF5T'; user 'A100UD67AHFODS' is used here instead.

In [12]:
top5nb = nb.groupby(['uid']).apply(lambda x: x.nlargest(5,['est'])).reset_index(drop=True)[["uid", "iid", "est"]]
top5nb[top5nb["uid"]=="A100UD67AHFODS"][["iid","est"]]

Unnamed: 0,iid,est
0,B00003JAU7,5.0
1,B000050HEI,5.0
2,B00003IRBV,5.0
3,B00003IRBU,5.0
4,B00003JAU9,5.0
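
The cell above fixes k=5. A general helper that takes k as a parameter could look like the sketch below. It sorts once and keeps the first k rows per user, which is an alternative to `groupby().apply(nlargest)` rather than the approach used in this notebook. The function name `top_k` and the toy data are assumptions for illustration.

```python
import pandas as pd

def top_k(preds: pd.DataFrame, k: int) -> pd.DataFrame:
    """Return the k highest-estimated items per user, best first."""
    ranked = preds.sort_values(["uid", "est"], ascending=[True, False])
    # head(k) keeps the first k rows of each group in the sorted order
    return ranked.groupby("uid").head(k)[["uid", "iid", "est"]].reset_index(drop=True)

# Toy predictions (hypothetical values).
toy = pd.DataFrame({
    "uid": ["u1", "u1", "u1", "u2", "u2"],
    "iid": ["a", "b", "c", "a", "d"],
    "est": [3.0, 5.0, 4.0, 2.0, 4.5],
})

top2 = top_k(toy, 2)
u1_items = top2[top2["uid"] == "u1"]["iid"].tolist()
```

With the notebook's frames, `top_k(nb, 10)` would give the top-10 list per user, which can then be filtered by `uid` as in the cell above.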


## Exercise 3
Report Precision@k (P@k), MAP@k and MRR@k with k={5, 10, 20}, averaged across users, for both CF systems. An item counts as relevant if its observed rating in the test set is >= 4.0. Reflect on the differences obtained.
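
To make the three metrics concrete, here is a minimal pure-Python sketch that scores a single ranked list of binary relevance flags (1 = relevant, 0 = not relevant). The function names and the example list are illustrative assumptions. The AP normalisation shown (dividing by the number of relevant items found in the top-k) is one common convention; other definitions divide by min(k, total number of relevant items).

```python
def precision_at_k(rels, k):
    """Fraction of the top-k positions that hold a relevant item."""
    return sum(rels[:k]) / k

def average_precision_at_k(rels, k):
    """Mean of P@i over the ranks i <= k that hold a relevant item."""
    hits, total = 0, 0.0
    for i, rel in enumerate(rels[:k], start=1):
        if rel:
            hits += 1
            total += hits / i  # precision at rank i
    return total / hits if hits else 0.0

def reciprocal_rank_at_k(rels, k):
    """1 / rank of the first relevant item in the top-k, or 0 if there is none."""
    for i, rel in enumerate(rels[:k], start=1):
        if rel:
            return 1.0 / i
    return 0.0

ranked = [0, 1, 0, 1, 0]                  # relevant items at ranks 2 and 4
p5 = precision_at_k(ranked, 5)            # 2 / 5
ap5 = average_precision_at_k(ranked, 5)   # (1/2 + 2/4) / 2
rr5 = reciprocal_rank_at_k(ranked, 5)     # 1 / 2
```

MAP@k and MRR@k are the means of the per-user AP@k and reciprocal rank values. When every user has at most one relevant item in the top-k, AP@k and the reciprocal rank coincide.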

In [13]:
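def relevant_column(preds, df_test, k):
    """Return each user's top-k predictions with a binary `relevant` flag."""
    # Top-k predictions per user, ordered by estimated rating (highest first)
    topKpreds = preds.groupby(['uid']).apply(lambda x: x.nlargest(k,['est'])).reset_index(drop=True)[["uid", "iid", "est"]]
    # Left join: recommended items without a test rating get overall = NaN
    merged = topKpreds.merge(df_test[["uid", "iid", "overall"]], how="left", on=["uid", "iid"])
    # Relevant = observed test rating >= 4; NaN compares as False, so unrated items count as not relevant
    merged["relevant"] = (merged["overall"] >= 4) * 1
    return merged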

In [14]:
def PatK(preds, df_test, k):
    merged = relevant_column(preds, df_test, k)
    score  = merged[["uid", "iid", "relevant"]].groupby(by="uid")["relevant"].mean().mean()
    return score

PatK(nb, df_test, 5), PatK(lf, df_test, 5)

(0.0023378141437755697, 0.0015195791934541204)

In [15]:
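def MAPatK(preds, df_test, k):
    merged = relevant_column(preds, df_test, k)
    # Per user: sum of 1/rank over the relevant items in the top-k.
    # This coincides with the usual AP@k (normalised by the number of hits) only when
    # a user has at most one relevant item in the top-k.
    # Ranks are built from the group's own length, so users with fewer than k predictions also work.
    score  = merged[["uid", "iid", "relevant"]].groupby(by="uid")["relevant"].apply(lambda x: (x.to_numpy() / np.arange(1, len(x) + 1)).sum()).mean()
    return score

MAPatK(nb, df_test, 5), MAPatK(lf, df_test, 5)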

(0.007042665108123905, 0.0033216442626144557)

In [16]:
# Reciprocal rank of the first relevant item in a user's top-k (0 if there is none)
def first(x):
    for i in range(len(x)):
        if x.iloc[i].relevant == 1:
            return 1/(i+1)
    return 0

# See lecture 3, slide 47
def MRRatK(preds, df_test, k):
    merged = relevant_column(preds, df_test, k)
    score  = merged[["uid", "iid", "relevant"]].groupby(by="uid").apply(first).mean()
    return score

MRRatK(nb, df_test, 5), MRRatK(lf, df_test, 5)

(0.007042665108123905, 0.0033216442626144557)

In [17]:
def HRatK(preds, df_test, k):
    merged = relevant_column(preds, df_test, k)
    score = merged[["uid", "iid", "relevant"]].groupby(by="uid")["relevant"].apply(lambda x: x.any()*1).mean()
    return score
HRatK(nb, df_test, 5), HRatK(lf, df_test, 5)

(0.011689070718877849, 0.007597895967270602)

In [18]:
ks = [5, 10, 20]

print(12*" " + "NB | LF")

for k in ks:
    P_nb, P_lf = PatK(nb, df_test, k), PatK(lf, df_test, k)
    MAP_nb, MAP_lf = MAPatK(nb, df_test, k), MAPatK(lf, df_test, k)
    MRR_nb, MRR_lf = MRRatK(nb, df_test, k), MRRatK(lf, df_test, k)
    print(f"  P@{k:2g} = {P_nb  :.4f}|{P_lf  :.4f}")
    print(f"MAP@{k:2g} = {MAP_nb:.4f}|{MAP_lf:.4f}")
    print(f"MRR@{k:2g} = {MRR_nb:.4f}|{MRR_lf:.4f}\n")


            NB | LF
  P@ 5 = 0.0023|0.0015
MAP@ 5 = 0.0070|0.0033
MRR@ 5 = 0.0070|0.0033

  P@10 = 0.0021|0.0088
MAP@10 = 0.0083|0.0126
MRR@10 = 0.0083|0.0126

  P@20 = 0.0013|0.0074
MAP@20 = 0.0086|0.0174
MRR@20 = 0.0086|0.0174



## Exercise 4

Based on the top-5, top-10 and top-20 predictions from Exercise 2, compute the systems’ hit rate averaged over the total number of users in the test set.

In [19]:
merged = relevant_column(nb, df_test, 5)
scores = merged[["uid", "iid", "relevant"]].groupby(by="uid")["relevant"].apply(lambda x: x.any()*1)
scores.mean()

0.011689070718877849

In [20]:
ks = [5, 10, 20]

print(12*" " + "NB | LF")

for k in ks:
    MHR_nb, MHR_lf = HRatK(nb, df_test, k), HRatK(lf, df_test, k)
    print(f"MHR@{k:2g} = {MHR_nb:.4f}|{MHR_lf:.4f}")

            NB | LF
MHR@ 5 = 0.0117|0.0076
MHR@10 = 0.0210|0.0883
MHR@20 = 0.0257|0.1473
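
Note that `HRatK` averages over the users that have predictions. The exercise asks for an average over all users in the test set, which gives a different value whenever some test users receive no recommendations. The sketch below divides by the number of distinct test users instead. The function name `hit_rate_over_test_users`, its `threshold` parameter, and the toy data are assumptions for illustration.

```python
import pandas as pd

def hit_rate_over_test_users(topk: pd.DataFrame, test: pd.DataFrame, threshold: float = 4.0) -> float:
    """Share of test users with at least one relevant item in their top-k list."""
    relevant = test[test["overall"] >= threshold][["uid", "iid"]]
    # Keep only the recommended (uid, iid) pairs that are relevant in the test set
    hits = topk.merge(relevant, how="inner", on=["uid", "iid"])
    return hits["uid"].nunique() / test["uid"].nunique()

# Toy data (hypothetical values): u1 gets a hit, u2 does not, u3 has no recommendations.
toy_test = pd.DataFrame({
    "uid": ["u1", "u2", "u3"],
    "iid": ["a", "b", "c"],
    "overall": [5.0, 4.0, 2.0],
})
toy_topk = pd.DataFrame({"uid": ["u1", "u1", "u2"], "iid": ["a", "x", "y"]})

hr = hit_rate_over_test_users(toy_topk, toy_test)
```

In this toy case one of three test users has a hit, so the rate is 1/3, whereas averaging over the two users with recommendations would give 1/2.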


## Error analysis

In [42]:
dft = pd.read_pickle("testset.pkl")[["reviewerID", "asin", "overall", "unixReviewTime"]] \
        .sort_values("unixReviewTime") \
        .reset_index(drop=True)

uid_first, uid_last = dft.iloc[[0,-1]]["reviewerID"]
uid_first, uid_last


('A2G0O4Y8QE10AE', 'A2SACTIFMC5DXO')

In [61]:
df_train = pd.read_pickle("train.pkl")
df = df_train[["overall", "reviewerID", "asin"]]

df_with_nans = df.pivot_table(values="overall", index="reviewerID", columns="asin")
df = df_with_nans.fillna(0)

corrs = df.T.corr("pearson")
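
Filling missing ratings with 0 before computing Pearson correlations means unrated items are treated as ratings of 0, which can change the similarities considerably. The sketch below contrasts this with a NaN-aware correlation on a made-up rating matrix; `DataFrame.corr` skips missing values pairwise, and `min_periods` sets the minimum number of co-rated items a pair needs. All values and names here are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Toy user x item rating matrix (hypothetical values); NaN marks an unrated item.
toy = pd.DataFrame(
    {
        "i1": [5.0, 4.0, np.nan],
        "i2": [3.0, 2.0, 5.0],
        "i3": [np.nan, 1.0, 4.0],
        "i4": [4.0, np.nan, np.nan],
    },
    index=["u1", "u2", "u3"],
)

# Zero-filled, as in the cell above: unrated items act as ratings of 0.
zero_filled = toy.fillna(0).T.corr("pearson")

# NaN-aware: each user pair is correlated on co-rated items only;
# min_periods=2 leaves pairs with fewer than two co-rated items as NaN.
pairwise = toy.T.corr("pearson", min_periods=2)

r_zero = zero_filled.loc["u1", "u2"]
r_pair = pairwise.loc["u1", "u2"]
```

Here u1 and u2 agree perfectly on the two items they both rated, yet the zero-filled correlation is much lower. Correlations based on very few co-rated items are unreliable in their own way, so neither variant is strictly better for this analysis.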

In [99]:
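# Ten most similar users; [1:] drops the first entry, assumed to be the user's own self-correlation
first_nbs = corrs.loc[uid_first].nlargest(11)[1:]
last_nbs = corrs.loc[uid_last].nlargest(11)[1:]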
first_nbs

reviewerID
A3FY1GXS48WR8B    0.564608
A37MH7ICH80QOX    0.498747
A11JU33HMT5XPU    0.440051
ACQYIC13JXAOI     0.426855
ADY836HK6QSYR     0.414072
A2E1EFNIZL2FVA    0.406457
A1JMB7RDVEMN71    0.398215
A3L8XRYMLZZES6    0.362919
A1GNYV0RA0EQSS    0.362127
A2D66KSHQQHOSD    0.356498
Name: A2G0O4Y8QE10AE, dtype: float64

In [100]:
reviewed_asins_first = df_train[df_train["reviewerID"]==uid_first]["asin"]
reviewed_asins_last = df_train[df_train["reviewerID"]==uid_last]["asin"]

In [102]:
reviewed_asins_first

2492    B0017I8NQM
2407    B0014KJ6EQ
Name: asin, dtype: object

In [101]:
nb_reviews = []
for rid in first_nbs.index: 
    nb_reviews.append(df_train[df_train["reviewerID"]==rid][["overall", "reviewerID", "asin"]])
pd.concat(nb_reviews)

Unnamed: 0,overall,reviewerID,asin
2348,3.0,A3FY1GXS48WR8B,B0013O54OE
2486,4.0,A3FY1GXS48WR8B,B0017I8NQM
2862,5.0,A37MH7ICH80QOX,B000WCQCE4
2479,5.0,A37MH7ICH80QOX,B0017I8NQM
3307,4.0,A11JU33HMT5XPU,B0017I8NQM
3513,4.0,A11JU33HMT5XPU,B001AFFYSW
5934,3.0,A11JU33HMT5XPU,B005FIWT74
3836,3.0,ACQYIC13JXAOI,B001EJU9ZM
2480,5.0,ACQYIC13JXAOI,B0017I8NQM
4957,5.0,ACQYIC13JXAOI,B003YJ5ESM


Unnamed: 0,overall,verified,reviewTime,reviewerID,asin,style,reviewerName,reviewText,summary,unixReviewTime,vote,image
2492,3.0,False,"08 8, 2008",A2G0O4Y8QE10AE,B0017I8NQM,,Deborah Woehr,Unfortunately Corel stopped making WP for the ...,"File Conversion Issues, Otherwise Good",1218153600,,
2407,3.0,True,"10 5, 2010",A2G0O4Y8QE10AE,B0014KJ6EQ,,Deborah Woehr,I bought MacDictate after reading the many glo...,"Good Quality Software, Bad Quality Headphones",1286236800,,


In [98]:
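dff = df_train[df_train["reviewerID"]==uid_first]
# The groupby result is indexed by asin, while dff is indexed by row label, so the assignment finds no matching index and "count" is filled with NaN
dff["count"] = df_train[df_train["asin"].isin(reviewed_asins_first)].groupby("asin")["overall"].count()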
dff

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  dff["count"] = df_train[df_train["asin"].isin(reviewed_asins_first)].groupby("asin")["overall"].count()


Unnamed: 0,overall,verified,reviewTime,reviewerID,asin,style,reviewerName,reviewText,summary,unixReviewTime,vote,image,count
2492,3.0,False,"08 8, 2008",A2G0O4Y8QE10AE,B0017I8NQM,,Deborah Woehr,Unfortunately Corel stopped making WP for the ...,"File Conversion Issues, Otherwise Good",1218153600,,,
2407,3.0,True,"10 5, 2010",A2G0O4Y8QE10AE,B0014KJ6EQ,,Deborah Woehr,I bought MacDictate after reading the many glo...,"Good Quality Software, Bad Quality Headphones",1286236800,,,
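
The empty `count` column and the warning above both come from assigning a Series indexed by `asin` to a slice indexed by row label. One way to attach per-item review counts, sketched here on made-up data, is to take an explicit copy of the user's rows and map each `asin` to its count. The toy frame and variable names are assumptions for illustration.

```python
import pandas as pd

# Toy training data (hypothetical values).
toy_train = pd.DataFrame({
    "reviewerID": ["u1", "u2", "u3", "u1", "u4"],
    "asin": ["A", "A", "A", "B", "B"],
    "overall": [3.0, 5.0, 4.0, 3.0, 5.0],
})

# An explicit copy avoids the SettingWithCopyWarning on assignment
user_rows = toy_train[toy_train["reviewerID"] == "u1"].copy()

# Number of reviews per item, indexed by asin
counts = toy_train.groupby("asin")["overall"].count()

# Look up each row's asin in the counts instead of aligning on the row index
user_rows["count"] = user_rows["asin"].map(counts)
```

Applied to the notebook's frames, the same pattern would use `df_train` and `uid_first` in place of the toy names.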


In [90]:
df_train[df_train["asin"].isin(reviewed_asins_last)].groupby("asin")["overall"].count()

asin
B00CTTEKJW    64
B00EDSI8HW    15
B00EFRMECQ     6
B00FZ0E0HE    15
B00FZ0FETC    10
B01019T6O0    20
B015IHWAZW     7
Name: overall, dtype: int64

In [73]:
df_train[df_train["asin"]=='B0014KJ6EQ']

Unnamed: 0,overall,verified,reviewTime,reviewerID,asin,style,reviewerName,reviewText,summary,unixReviewTime,vote,image
3224,5.0,False,"01 27, 2009",A3JRW716H3AX14,B0014KJ6EQ,,Scott Lloyd,I am dictating this review using MacSpeech Dic...,Fantastic product,1233014400,33.0,
3223,5.0,False,"03 4, 2009",A1VLVWTLV3LVHR,B0014KJ6EQ,,Tim Robertson,"A LITTLE HISTORY\nAs a user of Dictate 1.0, I ...",Mark Rudd's MyMac.com Review,1236124800,11.0,
3222,4.0,False,"03 26, 2010",A3NM0RAYSL6PA8,B0014KJ6EQ,,Maine Writer,I was a little surprised to see an average rat...,This is really a great program.,1269561600,,
2407,3.0,True,"10 5, 2010",A2G0O4Y8QE10AE,B0014KJ6EQ,,Deborah Woehr,I bought MacDictate after reading the many glo...,"Good Quality Software, Bad Quality Headphones",1286236800,,
