# Collaborative Filtering
Xiaolan Li

This project uses user-based and item-based collaborative filtering to recommend movies that users have not yet seen.

For the similarity measures, I use Jaccard and Pearson, each applied both to the original 1-5 ratings and to ratings binarized to 0/1.

# 1. Reading Data

In [1]:
# reading rating data
import pandas as pd
df_rating = pd.read_csv('https://raw.githubusercontent.com/xiaolancara/Recommender-System/main/data/Movie_Survey/MovieSurvey_Rating.csv')
df_rating

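|   | userid | movieid | ratings |
|---|---|---|---|
| 0 | 1 | 1 | 5 |
| 1 | 1 | 2 | 5 |
| 2 | 1 | 3 | 3 |
| 3 | 1 | 4 | 5 |
| 4 | 2 | 1 | 3 |
| 5 | 2 | 2 | 4 |
| 6 | 2 | 3 | 1 |
| 7 | 2 | 4 | 5 |
| 8 | 2 | 5 | 5 |
| 9 | 2 | 6 | 4 |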


In [2]:
# reading movie data
df_movie = pd.read_csv('https://raw.githubusercontent.com/xiaolancara/Recommender-System/main/data/Movie_Survey/MovieSurvey_Tag.csv')
df_movie.rename(columns={"tag": "genres"},inplace = True)
df_movie

|   | movieid | movietiltle | genres |
|---|---|---|---|
| 0 | 1 | Forrest Gump | Romance |
| 1 | 2 | Joker | Crime, Thriller |
| 2 | 3 | Avengers: Endgame | Action, Adventure |
| 3 | 4 | Spirited Away | Crime, Animation |
| 4 | 5 | Parasite | Comedy, Thriller |
| 5 | 6 | Soul | Animation, Adventure, Comedy |


# 2. Building CF Recommender System

### Algorithm Similarity Methods
__Jaccard and Pearson__

The Jaccard similarity is defined as follows.

![JaccardSimilarity](https://wikimedia.org/api/rest_v1/media/math/render/svg/eaef5aa86949f49e7dc6b9c8c3dd8b233332c9e7)
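
In case the image does not render, the standard Jaccard index of two sets can be written as follows. The set names $A$ and $B$ are only illustrative:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

In this project the two sets are the distinct values found in two rows of the rating matrix.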

The Pearson correlation is defined as follows.

![PearsonSimilarity](https://miro.medium.com/max/481/1*qCdw27XS0Q9shX4-0pJ96w.png)
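
For reference, here is the Pearson correlation in the computational form that matches the sums used in `pearson_similarity`. The symbols are assumptions made for this note: $x$ and $y$ are two rows of the rating matrix and $n$ is their length.

$$r_{xy} = \frac{\sum_i x_i y_i - \frac{1}{n}\sum_i x_i \sum_i y_i}{\sqrt{\left(\sum_i x_i^2 - \frac{1}{n}\left(\sum_i x_i\right)^2\right)\left(\sum_i y_i^2 - \frac{1}{n}\left(\sum_i y_i\right)^2\right)}}$$

The value is undefined when either row is constant, because the denominator is then zero.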

In [3]:
# scipy is imported only to cross-check the manual implementations below; it is not used elsewhere in this project
import numpy as np
import scipy.stats
from scipy.spatial import distance

# Jaccard similarity between every pair of rows, computed manually.
# Each row is reduced to the set of its distinct values, so two rows only
# overlap where they contain exactly the same value.
def jaccard_similarity(matrix):
    rownum=len(matrix)
    jaccardScore=np.zeros((rownum,rownum))
    for i in range(rownum):
        for j in range(rownum):
            list1 = matrix.iloc[i]
            list2 = matrix.iloc[j]
            intersection = len(set(list1).intersection(set(list2)))
            union = len(set(list1).union(set(list2)))
            jaccardScore[i,j] = float(intersection) / union
            # cross-check (note that scipy's distance.jaccard returns a dissimilarity)
            #jaccardScore[i,j] = distance.jaccard(list1,list2)
    return jaccardScore

# Pearson correlation between every pair of rows, computed manually from running sums
def pearson_similarity(matrix):
    rownum=len(matrix)
    pearsonScore=np.zeros((rownum,rownum))
    for i in range(rownum):
        for j in range(rownum):
            x = matrix.iloc[i]
            y = matrix.iloc[j]
            n = len(x)
            sum_x = float(sum(x))
            sum_y = float(sum(y))
            sum_x_sq = sum(xi*xi for xi in x)
            sum_y_sq = sum(yi*yi for yi in y)
            psum = sum(xi*yi for xi, yi in zip(x, y))
            num = psum - (sum_x * sum_y/n)
            den = pow((sum_x_sq - pow(sum_x, 2) / n) * (sum_y_sq - pow(sum_y, 2) / n), 0.5)
            if den == 0:
                # a constant row has no defined correlation: store 0 for this pair
                # and continue, instead of returning 0 for the whole matrix
                pearsonScore[i,j] = 0
                continue
            pearsonScore[i,j] = num / den
            # cross-check
            #pearsonScore[i,j] = scipy.stats.pearsonr(x, y)[0]
    return pearsonScore
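
The comments above mention checking the manual formula against a library. A minimal sketch of such a check is given here, using `np.corrcoef` as the reference instead of scipy. The helper name `pearson_by_sums` and the two rating vectors are made up for this illustration and are not part of the project.

```python
import numpy as np

def pearson_by_sums(x, y):
    """Pearson correlation from running sums; returns 0.0 when a vector is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    num = np.sum(x * y) - np.sum(x) * np.sum(y) / n
    den = np.sqrt((np.sum(x * x) - np.sum(x) ** 2 / n) *
                  (np.sum(y * y) - np.sum(y) ** 2 / n))
    return 0.0 if den == 0 else num / den

# Made-up rating vectors, used only for the comparison
x = [5, 3, 4, 4, 1]
y = [4, 2, 5, 3, 1]

manual = pearson_by_sums(x, y)
reference = np.corrcoef(x, y)[0, 1]
print(manual, reference)
```

The two printed numbers should agree up to floating-point error. The constant-vector branch mirrors the `den == 0` guard in `pearson_similarity`.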

### Find Similar Users/Items with neighbours

In [4]:
# keep only the n most similar neighbours for each row
def find_n_neighbours(df,n):
    df = df.apply(lambda x: pd.Series(x.sort_values(ascending=False).iloc[:n].index, 
          index=['top{}'.format(i) for i in range(1, n+1)]), axis=1)
    return df

## (1) User Based CF
People with similar characteristics share similar taste.

### Data Preparation

- __Rating Scale: 1-5__

__Normalized ratings for users__

Normalizing the ratings removes the bias of users who tend to rate everything high or everything low.

In [5]:
# calculate mean ratings for each user
Mean = df_rating.groupby(by="userid",as_index=False)['ratings'].mean().rename(columns={'ratings':'avg_ratings'})
Mean

|   | userid | avg_ratings |
|---|---|---|
| 0 | 1 | 4.5 |
| 1 | 2 | 3.666667 |
| 2 | 3 | 3.2 |
| 3 | 4 | 3.6 |
| 4 | 5 | 3.75 |


In [6]:
# add an 'adg_rating' column: each rating minus the mean rating of the user who gave it
Rating_avg = pd.merge(df_rating,Mean,on='userid')
Rating_avg['adg_rating']=Rating_avg['ratings']-Rating_avg['avg_ratings']
df_newRatings=pd.pivot_table(Rating_avg,values='adg_rating',index='userid',columns='movieid')

# Replacing NaN by Movie Average
df_newRatings = df_newRatings.fillna(df_newRatings.mean(axis=0))
df_newRatings

| userid \ movieid | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0.5 | 0.5 | -1.5 | 0.5 | 0.511111 | -0.304167 |
| 2 | -0.666667 | 0.333333 | -2.666667 | 1.333333 | 1.333333 | 0.333333 |
| 3 | 0.8 | 0.077778 | 0.8 | -1.2 | 0.8 | -1.2 |
| 4 | 0.4 | -0.6 | 0.4 | 0.470833 | -0.6 | 0.4 |
| 5 | 1.25 | 0.077778 | -1.75 | 1.25 | 0.511111 | -0.75 |
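
To see what the mean-centring does, here is a small sketch on made-up ratings. It uses `groupby(...).transform('mean')` as an alternative to the merge used above. The `toy` frame and its values are assumptions for illustration only.

```python
import pandas as pd

# Made-up ratings: user 1 rates everything high, user 2 rates everything low,
# but both prefer movie 1 over movie 2 by the same margin
toy = pd.DataFrame({
    'userid':  [1, 1, 2, 2],
    'movieid': [1, 2, 1, 2],
    'ratings': [5, 4, 2, 1],
})

# transform('mean') broadcasts each user's mean back onto that user's rows
toy['avg_ratings'] = toy.groupby('userid')['ratings'].transform('mean')
toy['adg_rating'] = toy['ratings'] - toy['avg_ratings']
print(toy)
```

After centring, both users have the deviations 0.5 and -0.5, so their different rating levels no longer hide the fact that they rank the two movies the same way.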


### User Similarity Metric Rating Scale: 1-5

In [7]:
# Jaccard similarity between users on the unscaled (1-5) ratings
import numpy as np
jaccard = jaccard_similarity(df_newRatings)
np.fill_diagonal(jaccard, 0 )
df_jaccard_similarity =pd.DataFrame(jaccard,index=df_newRatings.index)
df_jaccard_similarity.columns=df_newRatings.index
df_jaccard_similarity.head()

| userid | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.125 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.142857 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.125 | 0.0 | 0.142857 | 0.0 | 0.0 |


The table shows that the Jaccard similarity between users is very low: once each user's mean has been subtracted, the rows share almost no identical values, so the intersections are nearly empty.
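
The 0.125 for users 1 and 5 can be reproduced by hand. The sketch below uses the centred ratings copied from the `df_newRatings` table; the variable names are only illustrative.

```python
# Centred ratings of users 1 and 5, copied from the df_newRatings table
user1 = [0.5, 0.5, -1.5, 0.5, 0.511111, -0.304167]
user5 = [1.25, 0.077778, -1.75, 1.25, 0.511111, -0.75]

set1, set5 = set(user1), set(user5)
shared = set1 & set5      # values that appear in both rows
combined = set1 | set5    # all distinct values across the two rows

score = len(shared) / len(combined)
print(shared, len(combined), score)
```

The only shared value is 0.511111, which is the movie average filled in for movie 5 because neither user rated it. One shared value out of eight distinct values gives 1/8 = 0.125.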

In [8]:
# Pearson similarity between users on the unscaled (1-5) ratings
pearson = pearson_similarity(df_newRatings)
np.fill_diagonal(pearson, 0 )
df_pearson_similarity =pd.DataFrame(pearson,index=df_newRatings.index)
df_pearson_similarity.columns=df_newRatings.index
df_pearson_similarity.head()

| userid | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0.0 | 0.824878 | -0.147106 | -0.42233 | 0.908184 |
| 2 | 0.824878 | 0.0 | -0.484324 | -0.396873 | 0.670795 |
| 3 | -0.147106 | -0.484324 | 0.0 | -0.36355 | -0.102493 |
| 4 | -0.42233 | -0.396873 | -0.36355 | 0.0 | -0.100183 |
| 5 | 0.908184 | 0.670795 | -0.102493 | -0.100183 | 0.0 |


### Find Similar Users with neighbours-Ratings 1-5

In [9]:
# top 2 neighbours for each user using pearson
sim_user_2_p = find_n_neighbours(df_pearson_similarity,2)
sim_user_2_p.head()

| userid | top1 | top2 |
|---|---|---|
| 1 | 5 | 2 |
| 2 | 1 | 5 |
| 3 | 3 | 5 |
| 4 | 4 | 5 |
| 5 | 1 | 2 |
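
Users 3 and 4 list themselves as their own nearest neighbour here: their diagonal entry was set to 0 and all of their other Pearson similarities are negative. One possible way around this, shown as a sketch only, is to push the diagonal to negative infinity before ranking. The function name `find_n_neighbours_excluding_self` is hypothetical and not part of the project; the similarity values are copied from the Pearson table.

```python
import numpy as np
import pandas as pd

def find_n_neighbours_excluding_self(sim, n):
    """Hypothetical variant of find_n_neighbours that never returns a row's own label."""
    values = sim.to_numpy(dtype=float, copy=True)
    np.fill_diagonal(values, -np.inf)  # self-similarity now ranks last in every row
    masked = pd.DataFrame(values, index=sim.index, columns=sim.columns)
    names = ['top{}'.format(i) for i in range(1, n + 1)]
    return masked.apply(
        lambda row: pd.Series(row.sort_values(ascending=False).iloc[:n].index.tolist(),
                              index=names),
        axis=1,
    )

users = [1, 2, 3, 4, 5]
# Pearson similarities between users (ratings 1-5), diagonal already set to 0
pearson = pd.DataFrame([
    [0.0, 0.824878, -0.147106, -0.42233, 0.908184],
    [0.824878, 0.0, -0.484324, -0.396873, 0.670795],
    [-0.147106, -0.484324, 0.0, -0.36355, -0.102493],
    [-0.42233, -0.396873, -0.36355, 0.0, -0.100183],
    [0.908184, 0.670795, -0.102493, -0.100183, 0.0],
], index=users, columns=users)

neighbours = find_n_neighbours_excluding_self(pearson, 2)
print(neighbours)
```

With this variant, user 3 gets users 5 and 1 as neighbours and user 4 gets users 5 and 3. Whether negatively correlated neighbours should be used at all is a separate design choice that this sketch does not address.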


In [10]:
# top 2 neighbours for each user using jaccard
sim_user_2_j = find_n_neighbours(df_jaccard_similarity,2)
sim_user_2_j.head()

| userid | top1 | top2 |
|---|---|---|
| 1 | 5 | 4 |
| 2 | 5 | 4 |
| 3 | 5 | 4 |
| 4 | 5 | 4 |
| 5 | 3 | 1 |


- __Rating Scale: 0 or 1__

0: ratings of 1, 2 or 3

1: ratings of 4 or 5

In [11]:
# copy rating data to prepare scale data
df_Ratings_scale = df_rating.copy()

# replace value to 0 or 1
df_Ratings_scale['ratings'] = np.where(df_Ratings_scale['ratings']>3, 1, 0)

# calculate mean ratings for each user
Mean_scale = df_Ratings_scale.groupby(by="userid",as_index=False)['ratings'].mean().rename(columns={'ratings':'avg_ratings'})
Mean_scale

|   | userid | avg_ratings |
|---|---|---|
| 0 | 1 | 0.75 |
| 1 | 2 | 0.666667 |
| 2 | 3 | 0.6 |
| 3 | 4 | 0.6 |
| 4 | 5 | 0.5 |


__Normalized ratings for users__

In [12]:
# add an 'adg_rating' column: each 0/1 rating minus the mean 0/1 rating of the user who gave it
Rating_avg_scale = pd.merge(df_Ratings_scale,Mean_scale,on='userid')
Rating_avg_scale['adg_rating']=Rating_avg_scale['ratings']-Rating_avg_scale['avg_ratings']

# pivot table with adg_rating value
df_newRatings_scale=pd.pivot_table(Rating_avg_scale,values='adg_rating',index='userid',columns='movieid')

# Replacing NaN by Movie Average
df_newRatings_scale = df_newRatings_scale.fillna(df_newRatings_scale.mean(axis=0))
df_newRatings_scale

| userid \ movieid | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0.25 | 0.25 | -0.75 | 0.25 | 0.044444 | -0.091667 |
| 2 | -0.666667 | 0.333333 | -0.666667 | 0.333333 | 0.333333 | 0.333333 |
| 3 | 0.4 | -0.005556 | 0.4 | -0.6 | 0.4 | -0.6 |
| 4 | 0.4 | -0.6 | 0.4 | 0.120833 | -0.6 | 0.4 |
| 5 | 0.5 | -0.005556 | -0.5 | 0.5 | 0.044444 | -0.5 |


### User Similarity Metric Rating Scale: 0/1

In [13]:
# Jaccard similarity between users on the 0/1 ratings
jaccard = jaccard_similarity(df_newRatings_scale)
np.fill_diagonal(jaccard, 0 )
df_jaccard_similarity_scale =pd.DataFrame(jaccard,index=df_newRatings_scale.index)
df_jaccard_similarity_scale.columns=df_newRatings_scale.index
df_jaccard_similarity_scale.head()

| userid | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.142857 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.5 | 0.166667 |
| 4 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 |
| 5 | 0.142857 | 0.0 | 0.166667 | 0.0 | 0.0 |


In [14]:
# Pearson similarity between users on the 0/1 ratings
pearson = pearson_similarity(df_newRatings_scale)
np.fill_diagonal(pearson, 0 )
df_pearson_similarity_scale =pd.DataFrame(pearson,index=df_newRatings_scale.index)
df_pearson_similarity_scale.columns=df_newRatings_scale.index
df_pearson_similarity_scale.head()

| userid | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 0.0 | 0.481061 | -0.291785 | -0.398172 | 0.769844 |
| 2 | 0.481061 | 0.0 | -0.633913 | -0.597591 | 0.011216 |
| 3 | -0.291785 | -0.633913 | 0.0 | -0.189917 | 0.016275 |
| 4 | -0.398172 | -0.597591 | -0.189917 | 0.0 | -0.148555 |
| 5 | 0.769844 | 0.011216 | 0.016275 | -0.148555 | 0.0 |


### Find Similar Users with neighbours-Ratings 0/1

In [15]:
# top 2 neighbours for each user using pearson
sim_user_2_p = find_n_neighbours(df_pearson_similarity_scale,2)
sim_user_2_p.head()

| userid | top1 | top2 |
|---|---|---|
| 1 | 5 | 2 |
| 2 | 1 | 5 |
| 3 | 5 | 3 |
| 4 | 4 | 5 |
| 5 | 1 | 3 |


In [16]:
# top 2 neighbours for each user using jaccard
sim_user_2_j = find_n_neighbours(df_jaccard_similarity_scale,2)
sim_user_2_j.head()

| userid | top1 | top2 |
|---|---|---|
| 1 | 5 | 4 |
| 2 | 5 | 4 |
| 3 | 4 | 5 |
| 4 | 3 | 5 |
| 5 | 3 | 1 |


### Predict ratings for users 

The rating for a specific user and item is predicted with the formula below, where p(a,i) is the prediction for the target (active) user a on item i, w(a,u) is the similarity between users a and u, and K is the neighbourhood of the users most similar to a.

![user_predict](https://miro.medium.com/max/701/1*MdEImGMBgGY_5xltOJJAQA.png)
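
Written out, the prediction as `predict_userbased` computes it is shown here. The symbols $\bar{r}_a$ (mean rating of user $a$) and $r_{u,i}$ (rating of neighbour $u$ on item $i$) are assumptions made for this note, since the text only names $p$, $w$ and $K$.

$$p_{a,i} = \bar{r}_a + \frac{\sum_{u \in K} (r_{u,i} - \bar{r}_u)\, w_{a,u}}{\sum_{u \in K} w_{a,u}}$$

Some formulations divide by $\sum_{u \in K} |w_{a,u}|$ instead, which behaves differently when similarities are negative.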

In [17]:
# Predict the rating for one user-item pair with the user-based approach.
# Users and items are looked up by position (id - 1), so ids are assumed to run from 1 to N without gaps.
def predict_userbased(user_id, item_id, similarity_method='pearson', scaleRating = False, k=2):
    prediction=0    
    product=1
    wtd_sum = 0 
    sum_wt = 0
    
    if scaleRating == False:
        df_avg = df_newRatings
        mean_rating = Rating_avg['avg_ratings'][(Rating_avg['userid'] == user_id)].iloc[0]
        if similarity_method == 'pearson':
            df_similarity = df_pearson_similarity
        else:
            df_similarity = df_jaccard_similarity
    elif scaleRating == True:
        df_avg = df_newRatings_scale
        mean_rating = Rating_avg_scale['avg_ratings'][(Rating_avg_scale['userid'] == user_id)].iloc[0]
        if similarity_method == 'pearson':
            df_similarity = df_pearson_similarity_scale
        else:
            df_similarity = df_jaccard_similarity_scale
    similarityUser = find_n_neighbours(df_similarity,k).iloc[user_id-1].tolist()
    for i in range(k):
        similarityScore = df_similarity.iloc[user_id-1,similarityUser[i]-1]
        ratings_diff = df_avg.iloc[similarityUser[i]-1,item_id-1]
        product = ratings_diff * (similarityScore)
        wtd_sum = wtd_sum + product
        sum_wt += similarityScore
    
    if sum_wt == 0:
        prediction = int(round(mean_rating))
    else:
        prediction = int(round(mean_rating + (wtd_sum/sum_wt)))
    #print('\nPredicted rating for user {0} -> item {1}: {2}'.format(user_id,item_id,prediction))

    return prediction

In [18]:
predict_userbased(1,1,'pearson', False)

5
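
The result of 5 can be traced by hand. The sketch below plugs in the values from the earlier tables for user 1 and movie 1; the variable names are illustrative, and the numbers are copied rather than recomputed.

```python
# Values copied from the tables above for user 1, movie 1 (Pearson, ratings 1-5)
mean_rating_user1 = 4.5
neighbours = {
    # neighbour id: (similarity to user 1, neighbour's centred rating of movie 1)
    5: (0.908184, 1.25),
    2: (0.824878, -0.666667),
}

wtd_sum = sum(sim * diff for sim, diff in neighbours.values())
sum_wt = sum(sim for sim, _ in neighbours.values())

raw = mean_rating_user1 + wtd_sum / sum_wt
prediction = int(round(raw))
print(raw, prediction)
```

The weighted deviation is positive because user 5, the closest neighbour, rated movie 1 well above their own average, so the prediction rises from the mean of 4.5 and rounds to 5.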

In [19]:
predict_userbased(2,1,'jaccard', True)

1

## (2) Items Based CF

### Prepare data

#### Rating Scale: 1-5

In [20]:
ItemsRating = pd.pivot_table(df_rating,values='ratings',index='movieid',columns='userid')
ItemsRating = ItemsRating.fillna(ItemsRating.mean(axis=0))
ItemsRating

| movieid \ userid | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 5.0 | 3.0 | 4.0 | 4.0 | 5.0 |
| 2 | 5.0 | 4.0 | 3.2 | 3.0 | 3.75 |
| 3 | 3.0 | 1.0 | 4.0 | 4.0 | 2.0 |
| 4 | 5.0 | 5.0 | 2.0 | 3.6 | 5.0 |
| 5 | 4.5 | 5.0 | 4.0 | 3.0 | 3.75 |
| 6 | 4.5 | 4.0 | 2.0 | 4.0 | 3.0 |


### Item Similarity Metric Rating Scale: 1-5

In [21]:
jaccard = jaccard_similarity(ItemsRating)
np.fill_diagonal(jaccard, 0 )
items_jaccard_similarity =pd.DataFrame(jaccard,index=ItemsRating.index)
items_jaccard_similarity.columns=ItemsRating.index
items_jaccard_similarity

| movieid | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.6 | 0.4 | 0.2 | 0.6 | 0.4 |
| 2 | 0.6 | 0.0 | 0.285714 | 0.142857 | 0.666667 | 0.285714 |
| 3 | 0.4 | 0.285714 | 0.0 | 0.166667 | 0.285714 | 0.6 |
| 4 | 0.2 | 0.142857 | 0.166667 | 0.0 | 0.142857 | 0.166667 |
| 5 | 0.6 | 0.666667 | 0.285714 | 0.142857 | 0.0 | 0.5 |
| 6 | 0.4 | 0.285714 | 0.6 | 0.166667 | 0.5 | 0.0 |


In [22]:
pearson = pearson_similarity(ItemsRating)
np.fill_diagonal(pearson, 0 )
items_pearson_similarity =pd.DataFrame(pearson,index=ItemsRating.index)
items_pearson_similarity.columns=ItemsRating.index
items_pearson_similarity

| movieid | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.364159 | 0.27501 | 0.197534 | -0.315244 | 0.0 |
| 2 | 0.364159 | 0.0 | -0.428408 | 0.685687 | 0.675939 | 0.579205 |
| 3 | 0.27501 | -0.428408 | 0.0 | -0.760532 | -0.682724 | -0.287612 |
| 4 | 0.197534 | 0.685687 | -0.760532 | 0.0 | 0.401226 | 0.713661 |
| 5 | -0.315244 | 0.675939 | -0.682724 | 0.401226 | 0.0 | 0.206056 |
| 6 | 0.0 | 0.579205 | -0.287612 | 0.713661 | 0.206056 | 0.0 |


#### Find Similar Items with neighbours Rating Scale: 1-5

In [23]:
# top 2 neighbours for each item
sim_items_2_p = find_n_neighbours(items_pearson_similarity,2)
sim_items_2_p.head()

| movieid | top1 | top2 |
|---|---|---|
| 1 | 2 | 3 |
| 2 | 4 | 5 |
| 3 | 1 | 3 |
| 4 | 6 | 2 |
| 5 | 2 | 4 |


#### Rating Scale: 0 or 1

In [24]:
ItemsRating_scale = pd.pivot_table(df_Ratings_scale,values='ratings',index='movieid',columns='userid')
ItemsRating_scale = ItemsRating_scale.fillna(ItemsRating_scale.mean(axis=0))
ItemsRating_scale

| movieid \ userid | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0 |
| 2 | 1.0 | 1.0 | 0.6 | 0.0 | 0.5 |
| 3 | 0.0 | 0.0 | 1.0 | 1.0 | 0.0 |
| 4 | 1.0 | 1.0 | 0.0 | 0.6 | 1.0 |
| 5 | 0.75 | 1.0 | 1.0 | 0.0 | 0.5 |
| 6 | 0.75 | 1.0 | 0.0 | 1.0 | 0.0 |


### Item Similarity Metric- Rating Scale: 0/1

In [25]:
jaccard = jaccard_similarity(ItemsRating_scale)
np.fill_diagonal(jaccard, 0 )
items_jaccard_similarity_scale =pd.DataFrame(jaccard,index=ItemsRating_scale.index)
items_jaccard_similarity_scale.columns=ItemsRating_scale.index
items_jaccard_similarity_scale

| movieid | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.5 | 1.0 | 0.666667 | 0.5 | 0.666667 |
| 2 | 0.5 | 0.0 | 0.5 | 0.75 | 0.6 | 0.4 |
| 3 | 1.0 | 0.5 | 0.0 | 0.666667 | 0.5 | 0.666667 |
| 4 | 0.666667 | 0.75 | 0.666667 | 0.0 | 0.4 | 0.5 |
| 5 | 0.5 | 0.6 | 0.5 | 0.4 | 0.0 | 0.75 |
| 6 | 0.666667 | 0.4 | 0.666667 | 0.5 | 0.75 | 0.0 |


In [26]:
pearson = pearson_similarity(ItemsRating_scale)
np.fill_diagonal(pearson, 0 )
items_pearson_similarity_scale =pd.DataFrame(pearson,index=ItemsRating_scale.index)
items_pearson_similarity_scale.columns=ItemsRating_scale.index
items_pearson_similarity_scale

| movieid | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| 1 | 0.0 | -0.512206 | 0.408248 | -0.357217 | -0.467707 | -0.49099 |
| 2 | -0.512206 | 0.0 | -0.704361 | 0.368689 | 0.842971 | 0.052945 |
| 3 | 0.408248 | -0.704361 | 0.0 | -0.875 | -0.327327 | -0.089087 |
| 4 | -0.357217 | 0.368689 | -0.875 | 0.0 | -0.122748 | 0.412028 |
| 5 | -0.467707 | 0.842971 | -0.327327 | -0.122748 | 0.0 | -0.262445 |
| 6 | -0.49099 | 0.052945 | -0.089087 | 0.412028 | -0.262445 | 0.0 |


### Find Similar Items with neighbours Rating Scale: 0/1

In [27]:
sim_items_2_j = find_n_neighbours(items_jaccard_similarity_scale,2)
sim_items_2_j.head()

| movieid | top1 | top2 |
|---|---|---|
| 1 | 3 | 6 |
| 2 | 4 | 5 |
| 3 | 1 | 6 |
| 4 | 2 | 3 |
| 5 | 6 | 2 |


### Predict ratings for items 

The rating for a specific user and item is predicted with the formula below, where K is the neighbourhood of the items most similar to item i, rated by the active user a, and w(i,j) is the similarity between items i and j.

![item_predict](https://miro.medium.com/max/451/1*4LhLv-MRP29aHESuaWwMAA.png)
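
Written out, the prediction as `predict_itembased` computes it is shown here. The symbol $r_{a,j}$, the rating of user $a$ on neighbouring item $j$, is an assumption made for this note.

$$p_{a,i} = \frac{\sum_{j \in K} r_{a,j}\, w_{i,j}}{\sum_{j \in K} w_{i,j}}$$

Unlike the user-based version, no mean is added back, because the item-based rating matrix is not mean-centred.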

In [28]:
# Predict the rating for one user-item pair with the item-based approach
def predict_itembased(user_id, item_id, similarity_method, scaleRating = False, k=2):
    prediction=0    
    product=1
    wtd_sum = 0 
    sum_wt = 0
    
    if scaleRating == False:
        df_avg = ItemsRating
        if similarity_method == 'pearson':
            df_similarity = items_pearson_similarity
        else:
            df_similarity = items_jaccard_similarity
    elif scaleRating == True:
        df_avg = ItemsRating_scale
        if similarity_method == 'pearson':
            df_similarity = items_pearson_similarity_scale
        else:
            df_similarity = items_jaccard_similarity_scale
            
    similarityItem = find_n_neighbours(df_similarity,k).iloc[item_id-1].tolist()
    for i in range(k):
        similarityScore = df_similarity.iloc[item_id-1,similarityItem[i]-1]
        ratings = df_avg.iloc[similarityItem[i]-1,user_id-1]
        product = ratings * (similarityScore)
        wtd_sum = wtd_sum + product
        sum_wt += similarityScore
    
    if sum_wt == 0:
        prediction = 0
    else:
        prediction = int(round(wtd_sum/sum_wt))
    #print('\nPredicted rating for user {0} -> item {1}: {2}'.format(user_id,item_id,prediction))

    return prediction

In [29]:
predict_itembased(user_id=1,item_id=3,similarity_method = 'pearson',scaleRating = False, k=2)

5
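
This result can also be traced by hand. The sketch below uses the values from the item tables for user 1 and movie 3; the variable names are illustrative, and the numbers are copied rather than recomputed.

```python
# Values copied from the tables above for user 1, movie 3 (Pearson, ratings 1-5)
neighbours = {
    # neighbour movie id: (similarity to movie 3, user 1's rating of that movie)
    1: (0.27501, 5.0),
    3: (0.0, 3.0),  # movie 3 itself, ranked second because its diagonal entry is 0
}

wtd_sum = sum(sim * rating for sim, rating in neighbours.values())
sum_wt = sum(sim for sim, _ in neighbours.values())

prediction = int(round(wtd_sum / sum_wt))
print(wtd_sum, sum_wt, prediction)
```

Only movie 1 carries a non-zero weight, so the prediction is simply user 1's rating of movie 1. All other movies correlate negatively with movie 3, which is why the item itself ends up in its own top 2.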

In [30]:
predict_itembased(user_id = 1,item_id = 6,similarity_method = 'jaccard',scaleRating = True, k=2)

0

# 3. Evaluation of the Recommendation System (RMSE)

In [31]:
from sklearn.metrics import mean_squared_error
from math import sqrt

# define evaluation for each system
def evaluateRS(similarity_method = 'pearson',Based_UserItem = 'user', scaleRating = False):
    prediction = []
    if scaleRating == False:
        df_test = df_rating
    else:
        df_test = df_Ratings_scale
    for i in range(len(df_test)):
        userid = df_test.iloc[i]['userid']
        itemid = df_test.iloc[i]['movieid']
        if Based_UserItem == 'user':
            prediction.append(predict_userbased(userid,itemid,similarity_method,scaleRating))
        else:
            prediction.append(predict_itembased(userid,itemid,similarity_method,scaleRating))
    MSE = mean_squared_error(prediction, df_test['ratings'].to_list())
    RMSE = round(sqrt(MSE),3)
    
    return RMSE
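
For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings. A minimal NumPy sketch follows; the helper name `rmse` and the two short lists are made up for illustration.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Made-up predictions and true ratings
predicted = [5, 4, 3]
actual = [5, 5, 1]

error = rmse(predicted, actual)
print(round(error, 3))
```

Note that `evaluateRS` scores predictions for the same ratings that were used to build the similarity matrices, so the values it reports are in-sample errors rather than errors on held-out data.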

__UserBased Not Scale__

In [32]:
UserPearson_RMSE = evaluateRS(similarity_method = 'pearson',Based_UserItem = 'user', scaleRating = False)
UserPearson_RMSE

1.155

In [33]:
UserJaccard_RMSE = evaluateRS(similarity_method = 'jaccard',Based_UserItem = 'user', scaleRating = False)
UserJaccard_RMSE

1.208

__UserBased Scale__

In [34]:
UserPearsonScale_RMSE = evaluateRS(similarity_method = 'pearson',Based_UserItem = 'user', scaleRating = True)
UserPearsonScale_RMSE

0.54

In [35]:
UserJaccardScale_RMSE = evaluateRS(similarity_method = 'jaccard',Based_UserItem = 'user', scaleRating = True)
UserJaccardScale_RMSE

0.612

__ItemBased Not Scale__

In [36]:
ItemPearson_RMSE = evaluateRS(similarity_method = 'pearson',Based_UserItem = 'item', scaleRating = False)
ItemPearson_RMSE

1.225

In [37]:
ItemJaccard_RMSE = evaluateRS(similarity_method = 'jaccard',Based_UserItem = 'item', scaleRating = False)
ItemJaccard_RMSE

1.225

__ItemBased Scale__

In [38]:
ItemPearsonScale_RMSE = evaluateRS(similarity_method = 'pearson',Based_UserItem = 'item', scaleRating = True)
ItemPearsonScale_RMSE

0.5

In [39]:
ItemJaccardScale_RMSE = evaluateRS(similarity_method = 'jaccard',Based_UserItem = 'item', scaleRating = True)
ItemJaccardScale_RMSE

0.645

# 4. Conclusion

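Comparing the RMSE of each system above, the best one is item-based CF with Pearson similarity on the 0/1 rating scale (RMSE 0.5).

- The 0/1 rating scale gives a lower prediction error than the 1-5 scale, although part of the gap comes from the narrower range of the 0/1 values, so the two scales are not directly comparable.

- Pearson has a lower error than Jaccard in every setting except item-based CF on the 1-5 scale, where the two tie at 1.225.

- Neither item-based nor user-based CF is consistently better on this dataset.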
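
The comparison can be made easier to read by collecting the eight RMSE values in one table. This is a sketch: the column names are made up for this summary and the numbers are copied from the outputs in section 3.

```python
import pandas as pd

# RMSE values copied from the evaluation cells above
results = pd.DataFrame(
    [
        ('user', 'pearson', '1-5', 1.155),
        ('user', 'jaccard', '1-5', 1.208),
        ('user', 'pearson', '0/1', 0.54),
        ('user', 'jaccard', '0/1', 0.612),
        ('item', 'pearson', '1-5', 1.225),
        ('item', 'jaccard', '1-5', 1.225),
        ('item', 'pearson', '0/1', 0.5),
        ('item', 'jaccard', '0/1', 0.645),
    ],
    columns=['based_on', 'similarity', 'scale', 'rmse'],
)

# Row with the lowest RMSE within each rating scale
best_per_scale = results.loc[results.groupby('scale')['rmse'].idxmin()]

print(results.sort_values('rmse'))
print(best_per_scale)
```

Within the 0/1 scale the best row is item-based Pearson, and within the 1-5 scale it is user-based Pearson, which is consistent with the last bullet point.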

# 5. Reference
https://medium.com/sfu-cspmp/recommendation-systems-user-based-collaborative-filtering-using-n-nearest-neighbors-bf7361dc24e0

https://towardsdatascience.com/collaborative-filtering-based-recommendation-systems-exemplified-ecbffe1c20b1

https://zhuanlan.zhihu.com/p/47025768

http://lijiancheng0614.github.io/scikit-learn/modules/generated/sklearn.neighbors.NearestNeighbors.html