<a href="https://colab.research.google.com/github/yordanovagabriela/recommendersys/blob/master/HW3_SVD_and_SVD%2B%2B.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Initial Setup

## Install Packages

In [0]:
!pip install surprise

Collecting surprise
  Downloading https://files.pythonhosted.org/packages/61/de/e5cba8682201fcf9c3719a6fdda95693468ed061945493dea2dd37c5618b/surprise-0.1-py2.py3-none-any.whl
Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/f5/da/b5700d96495fb4f092be497f02492768a3d96a3f4fa2ae7dea46d4081cfa/scikit-surprise-1.1.0.tar.gz (6.4MB)
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.0-cp36-cp36m-linux_x86_64.whl size=1678565 sha256=6c71af1517325cf827f7869ab960ad2ec241fcfdd96c7ac134ee5fd0ba416df1
  Stored in directory: /root/.cache/pip/wheels/cc/fa/8c/16c93fccce688ae1bde7d979ff102f7bee980d9cfeb8641bcf
Successfully built scikit-surprise
Installing collected packages: scikit-surprise, surprise
Successfully installed scikit-surprise-1.1.0 surprise-0.1


## Import Libraries

In [0]:
from surprise import SVD
from surprise import SVDpp
from surprise import KNNWithMeans
from surprise import NormalPredictor

from surprise import Dataset

from surprise import accuracy

from surprise.model_selection import GridSearchCV
from surprise.model_selection import LeaveOneOut
from surprise.model_selection import cross_validate
from surprise.model_selection import train_test_split

from collections import defaultdict
import pandas as pd

# Load Dataset

In [0]:
data = Dataset.load_builtin('ml-100k')
df = pd.DataFrame(data.raw_ratings)
df.head(10)

     0    1    2          3
0  196  242  3.0  881250949
1  186  302  3.0  891717742
2   22  377  1.0  878887116
3  244   51  2.0  880606923
4  166  346  1.0  886397596
5  298  474  4.0  884182806
6  115  265  2.0  881171488
7  253  465  5.0  891628467
8  305  451  3.0  886324817
9    6   86  3.0  883603013


# Split Data into Train and Test Sets

In [0]:
train_set, test_set = train_test_split(data, test_size=.25, random_state=1)

# Create Leave-One-Out (LOO) Train and Test Sets

In [0]:
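# Hold out one rating per user; with n_splits=1 the loop yields a single split
loo = LeaveOneOut(n_splits=1, random_state=1)
for train, test in loo.split(data):
    loo_train = train
    loo_test = test

# All (user, movie) pairs absent from loo_train; their predictions build each user's top-N list
loo_anti_testset = loo_train.build_anti_testset()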
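To make the two objects above more concrete, here is a minimal pure-Python sketch of the idea behind a leave-one-out split and an anti-testset. It is not surprise's implementation: the function names are hypothetical, ratings are plain `(user, item, rating)` tuples, and the anti-testset's rating slot is filled with `None` here, whereas surprise fills it with a placeholder value (by default the training set's global mean).

```python
import random
from collections import defaultdict

def leave_one_out_split(ratings, seed=1):
    """Hold out one randomly chosen rating per user (hypothetical helper).

    `ratings` is a list of (user, item, rating) tuples; returns (train, test).
    """
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((user, item, rating))

    train, test = [], []
    for user in sorted(by_user):
        user_ratings = by_user[user]
        held_out = rng.randrange(len(user_ratings))
        for i, triple in enumerate(user_ratings):
            # exactly one rating per user goes to the test set
            (test if i == held_out else train).append(triple)
    return train, test

def build_anti_testset(train, fill=None):
    """Every (user, item) pair from the training data that the user has NOT rated."""
    users = {u for u, _, _ in train}
    items = {i for _, i, _ in train}
    seen = {(u, i) for u, i, _ in train}
    return [(u, i, fill) for u in sorted(users) for i in sorted(items) if (u, i) not in seen]

ratings = [("u1", "a", 4.0), ("u1", "b", 3.0), ("u2", "a", 5.0), ("u2", "c", 2.0)]
train, test = leave_one_out_split(ratings)
anti = build_anti_testset(train)
print(train, test, anti)
```

The left-out rating is what `hit_rate` later looks for in the user's top-N list, and the anti-testset is the pool of candidates that top-N list is drawn from.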

# Define Evaluation Metrics

In [0]:
def mae(predictions):
    return accuracy.mae(predictions, verbose=False)

def rmse(predictions):
    return accuracy.rmse(predictions, verbose=False)

def hit_rate(top_n_predictions, left_out_predictions):
    hits = 0
    total = 0

    for left_out in left_out_predictions:
        user_id = left_out[0]
        left_out_movie_id = left_out[1]

        for movie_id, predicted_rating in top_n_predictions[int(user_id)]:
            if (int(left_out_movie_id) == int(movie_id)):
                hits += 1
                break

        total += 1

    return hits / total

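def average_reciprocal_hit_rank(top_n_predictions, left_out_predictions):
    reciprocal_rank_sum = 0  # renamed to avoid shadowing the built-in sum()
    total = 0

    for user_id, left_out_movie_id, actual_rating, estimated_rating, _ in left_out_predictions:
        hit_rank = 0
        rank = 0

        # Find the 1-based position of the left-out movie in the user's top-N list
        for movie_id, predicted_rating in top_n_predictions[int(user_id)]:
            rank = rank + 1
            if int(left_out_movie_id) == movie_id:
                hit_rank = rank
                break

        # A miss contributes 0; a hit at rank k contributes 1/k
        if hit_rank > 0:
            reciprocal_rank_sum += 1.0 / hit_rank

        total += 1

    return reciprocal_rank_sum / total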
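A small, hand-checkable example of the two ranking metrics. The user and movie ids below are made up, and the code mirrors the logic of `hit_rate` and `average_reciprocal_hit_rank` using plain tuples instead of surprise `Prediction` objects. It shows that both metrics count the same hits, but ARHR additionally rewards hits that appear near the top of the list.

```python
# Toy top-N lists: user -> [(movie_id, predicted_rating), ...], best first
top_n = {
    1: [(10, 4.8), (20, 4.5)],
    2: [(30, 4.9), (40, 4.2), (50, 4.1)],
    3: [(60, 4.7)],
}
# One left-out (user, movie) pair per user
left_out = [(1, 20), (2, 30), (3, 99)]

def rank_of(user, movie):
    """1-based rank of `movie` in the user's top-N list, or 0 on a miss."""
    for rank, (movie_id, _) in enumerate(top_n.get(user, []), start=1):
        if movie_id == movie:
            return rank
    return 0

ranks = [rank_of(u, m) for u, m in left_out]            # [2, 1, 0]
hr = sum(r > 0 for r in ranks) / len(ranks)              # 2 hits out of 3
arhr = sum(1 / r for r in ranks if r > 0) / len(ranks)   # (1/2 + 1/1 + 0) / 3
print(f"HR = {hr:.4f}, ARHR = {arhr:.4f}")               # HR = 0.6667, ARHR = 0.5000
```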

In [0]:
def get_top_n_predictions(predictions, n = 10, minimum_rating = 4.0):
    top_n = defaultdict(list)

    for user_id, movie_id, actual_rating, estimated_rating, _ in predictions:
        if (estimated_rating >= minimum_rating):
            top_n[int(user_id)].append((int(movie_id), estimated_rating))

    for user_id, ratings in top_n.items():
        ratings.sort(key=lambda x: x[1], reverse=True)
        top_n[int(user_id)] = ratings[:n]

    return top_n
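Since the anti-testset contains every unrated (user, movie) pair, each user can have many candidates above the 4.0 threshold. A possible variant of `get_top_n_predictions` (the name `top_n_with_heap` is hypothetical) keeps the same filtering but picks the `n` best per user with `heapq.nlargest`, which the Python docs describe as equivalent to `sorted(..., reverse=True)[:n]` while avoiding a full sort. It returns a `defaultdict(list)` so that, like the original, users without candidates get an empty list.

```python
import heapq
from collections import defaultdict

def top_n_with_heap(predictions, n=10, minimum_rating=4.0):
    """Same filtering as get_top_n_predictions, but selects the n best per user
    with heapq.nlargest instead of sorting each full candidate list."""
    candidates = defaultdict(list)
    for user_id, movie_id, _actual, estimated, _details in predictions:
        if estimated >= minimum_rating:
            candidates[int(user_id)].append((int(movie_id), estimated))
    return defaultdict(list, {
        user: heapq.nlargest(n, items, key=lambda pair: pair[1])
        for user, items in candidates.items()
    })

# Toy stand-ins shaped like surprise predictions: (uid, iid, r_ui, est, details)
preds = [
    ("1", "10", None, 4.2, {}),
    ("1", "20", None, 4.9, {}),
    ("1", "30", None, 3.1, {}),  # below the 4.0 threshold, dropped
    ("2", "10", None, 4.0, {}),
]
print(dict(top_n_with_heap(preds, n=1)))  # {1: [(20, 4.9)], 2: [(10, 4.0)]}
```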

# Evaluate

In [0]:
def evaluate_algorithm(algorithm, n = 10):
  metrics = {}

  print("\t> Training ...")
  algorithm.fit(train_set)

  print("\t> Evaluating accuracy...")
  predictions = algorithm.test(test_set)

  metrics["RMSE"] = rmse(predictions)
  metrics["MAE"] = mae(predictions)

  algorithm.fit(loo_train)
  print("\t> Evaluating top N with leave-one-out...")
  left_out_predictions = algorithm.test(loo_test) 

  all_predictions = algorithm.test(loo_anti_testset)
  top_n_predictions = get_top_n_predictions(all_predictions, n, minimum_rating = 4.0)

  print("\t> Computing hit-rate and rank metrics...")
  metrics["HR"] = hit_rate(top_n_predictions, left_out_predictions)
  metrics["ARHR"] = average_reciprocal_hit_rank(top_n_predictions, left_out_predictions)
  return metrics

In [0]:
def evaluate_algorithms(algorithms, n = 10):
  results = {}

  for name in algorithms:
    algorithm = algorithms[name]
    print(">> Evaluating {} ...".format(name))
    results[name] = evaluate_algorithm(algorithm, n)

  return results

In [0]:
def pretty_print(results):
  print("{:<30} {:<10} {:<10} {:<10} {:<10}".format("Algorithm", "RMSE", "MAE", "HR", "ARHR"))
  for (name, metrics) in results.items():
      print("{:30} {:<10.4f} {:<10.4f} {:<10.4f} {:<10.4f}".format(name, metrics["RMSE"], metrics["MAE"], metrics["HR"], metrics["ARHR"]))

## Untuned
### Default Parameters
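- SVD: `n_factors = 100`, `n_epochs = 20`, `lr_all = 0.005`, `reg_all = 0.02`
- SVD++: `n_factors = 20`, `n_epochs = 20`, `lr_all = 0.007`, `reg_all = 0.02`
- KNNWithMeans: the library default is `sim_options = {'name': 'MSD', 'user_based': True}`; below we override it with `pearson_baseline` similarity and evaluate both an item-based and a user-based variant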

*surprise.similarities.pearson_baseline()*:\
Compute the (shrunk) Pearson correlation coefficient between all pairs of users (or items), using baselines for centering instead of means. The shrinkage parameter helps to avoid overfitting when only a few ratings are available.
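As we read surprise's documentation, the shrunk similarity is the baseline-centered Pearson coefficient multiplied by (|I_uv| - 1) / (|I_uv| - 1 + shrinkage), where I_uv is the set of items both users rated and `shrinkage` defaults to 100. Below is a minimal pure-Python sketch of that formula for two users. The function name is hypothetical, the baseline estimates b_ui are assumed to be given (surprise estimates them internally), and returning 0 when the users share fewer than two items is a choice made for this sketch.

```python
import math

def shrunk_pearson_baseline(ratings_u, ratings_v, baseline_u, baseline_v, shrinkage=100):
    """Shrunk Pearson-baseline similarity between users u and v (sketch).

    ratings_x:  {item: actual rating r_xi}
    baseline_x: {item: baseline estimate b_xi}, assumed precomputed (e.g. mu + b_x + b_i)
    """
    common = set(ratings_u) & set(ratings_v)
    if len(common) < 2:
        return 0.0
    # Center each rating on its baseline estimate instead of on the user's mean
    du = {i: ratings_u[i] - baseline_u[i] for i in common}
    dv = {i: ratings_v[i] - baseline_v[i] for i in common}
    num = sum(du[i] * dv[i] for i in common)
    den = math.sqrt(sum(d * d for d in du.values())) * math.sqrt(sum(d * d for d in dv.values()))
    if den == 0:
        return 0.0
    rho = num / den
    # Shrink toward 0 when the two users have few items in common
    n = len(common)
    return (n - 1) / (n - 1 + shrinkage) * rho

ru, bu = {1: 5, 2: 3, 3: 4}, {1: 4, 2: 4, 3: 4}
rv, bv = {1: 4, 2: 2, 3: 3}, {1: 3, 2: 3, 3: 3}
print(shrunk_pearson_baseline(ru, rv, bu, bv))               # about 2/102
print(shrunk_pearson_baseline(ru, rv, bu, bv, shrinkage=0))  # about 1.0
```

With only three co-rated items, a perfect correlation of 1.0 is shrunk to roughly 0.02, which is exactly the protection against overfitting on sparse overlaps that the description refers to.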



In [0]:
algorithms = {}
algorithms["SVD"] = SVD(verbose=False)
algorithms["SVD++"] = SVDpp(verbose=False)
algorithms["KNNWithMeans Item-Based"] = KNNWithMeans(sim_options = {'name': 'pearson_baseline', 'user_based': False}, verbose=False)
algorithms["KNNWithMeans User-Based"] = KNNWithMeans(sim_options = {'name': 'pearson_baseline', 'user_based': True}, verbose=False)
algorithms["Random"] = NormalPredictor()

In [0]:
results = evaluate_algorithms(algorithms)

>> Evaluating SVD ...
	> Training ...
	> Evaluating accuracy...
	> Evaluating top N with leave-one-out...
	> Computing hit-rate and rank metrics...
>> Evaluating SVD++ ...
	> Training ...
	> Evaluating accuracy...
	> Evaluating top N with leave-one-out...
	> Computing hit-rate and rank metrics...
>> Evaluating KNNWithMeans Item-Based ...
	> Training ...
	> Evaluating accuracy...
	> Evaluating top N with leave-one-out...
	> Computing hit-rate and rank metrics...
>> Evaluating KNNWithMeans User-Based ...
	> Training ...
	> Evaluating accuracy...
	> Evaluating top N with leave-one-out...
	> Computing hit-rate and rank metrics...
>> Evaluating Random ...
	> Training ...
	> Evaluating accuracy...
	> Evaluating top N with leave-one-out...
	> Computing hit-rate and rank metrics...


In [0]:
pretty_print(results)

Algorithm                      RMSE       MAE        HR         ARHR      
SVD                            0.9432     0.7422     0.0445     0.0162    
SVD++                          0.9222     0.7237     0.0498     0.0186    
KNNWithMeans Item-Based        0.9265     0.7235     0.0032     0.0019    
KNNWithMeans User-Based        0.9454     0.7358     0.0032     0.0018    
Random                         1.5287     1.2311     0.0223     0.0061    


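As expected, SVD++ outperforms SVD on all four metrics: lower RMSE (0.9222 vs 0.9432) and MAE (0.7237 vs 0.7422), and higher HR (0.0498 vs 0.0445) and ARHR (0.0186 vs 0.0162).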

The results of item- and user-based CF are more interesting. Their accuracy is comparable to that of SVD and SVD++ (item-based is slightly better than user-based, with RMSE 0.9265 vs 0.9454), but they almost completely fail at hit rate and average reciprocal hit rank (HR of only 0.0032). In other words, they are fairly good at predicting ratings but very bad at producing top-N recommendations. So, if we had to choose between item-based CF, user-based CF and random for top-N recommendations, random would arguably be the better choice, since its HR (0.0223) and ARHR (0.0061) are several times higher.




## Tuned
Now, let's tune SVD and SVD++ with a 3-fold cross-validated grid search and see whether we can achieve better results.

In [0]:
param_grid = {'n_factors':[50,100,150],
              'n_epochs':[20,30],
              'lr_all':[0.005,0.01],
              'reg_all':[0.02,0.1]}
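`GridSearchCV` evaluates every combination of the values listed in `param_grid`, once per cross-validation fold. The sketch below only enumerates that Cartesian product with `itertools.product` to show how large the search is; it is not surprise's code. This is worth knowing before launching the SVD++ search, since SVD++ is considerably slower to train than SVD.

```python
from itertools import product

param_grid = {'n_factors': [50, 100, 150],
              'n_epochs': [20, 30],
              'lr_all': [0.005, 0.01],
              'reg_all': [0.02, 0.1]}

# Every combination of the listed values, as keyword-argument dicts
names = list(param_grid)
combinations = [dict(zip(names, values)) for values in product(*param_grid.values())]

cv_folds = 3
print(len(combinations))             # 24 parameter settings
print(len(combinations) * cv_folds)  # 72 model fits per algorithm
print(combinations[0])               # {'n_factors': 50, 'n_epochs': 20, 'lr_all': 0.005, 'reg_all': 0.02}
```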

In [0]:
gsSVD = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)
gsSVD.fit(data)

params = gsSVD.best_params['rmse']

In [0]:
print("Best RMSE score achieved for SVD: {}".format(gsSVD.best_score['rmse']))
print("Parameters that gave the best RMSE score for SVD: {}".format(params))

Best RMSE score achieved for SVD: 0.921710712966808
Parameters that gave the best RMSE score for SVD: {'n_factors': 150, 'n_epochs': 30, 'lr_all': 0.01, 'reg_all': 0.1}


In [0]:
gsSVDpp = GridSearchCV(SVDpp, param_grid, measures=['rmse', 'mae'], cv=3)
gsSVDpp.fit(data)

params = gsSVDpp.best_params['rmse']

In [0]:
print("Best RMSE score achieved for SVD++: {}".format(gsSVDpp.best_score['rmse']))
print("Parameters that gave the best RMSE score for SVD++: {}".format(params))

In [0]:
algorithms = {}
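# Best parameters found by the SVD grid search above
algorithms["SVD Tuned"] = SVD(n_factors=150, n_epochs=30, lr_all=0.01, reg_all=0.1, verbose=False)
# Note: SVD++ reuses SVD's best parameters here rather than gsSVDpp.best_params['rmse']
algorithms["SVD++ Tuned"] = SVDpp(n_factors=150, n_epochs=30, lr_all=0.01, reg_all=0.1, verbose=False)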
results = evaluate_algorithms(algorithms)

>> Evaluating SVD Tuned ...
	> Training ...
	> Evaluating accuracy...
	> Evaluating top N with leave-one-out...
	> Computing hit-rate and rank metrics...
>> Evaluating SVD++ Tuned ...
	> Training ...
	> Evaluating accuracy...
	> Evaluating top N with leave-one-out...
	> Computing hit-rate and rank metrics...


In [0]:
pretty_print(results)

Algorithm                      RMSE       MAE        HR         ARHR      
SVD Tuned                      0.9185     0.7251     0.0403     0.0109    
SVD++ Tuned                    0.9135     0.7205     0.0403     0.0113    


The tuned versions of SVD and SVD++ achieve better accuracy (RMSE 0.9185 and 0.9135, down from 0.9432 and 0.9222), but their HR and ARHR are now lower, which means they are worse at producing top-N recommendations.
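The trade-off is easier to see as differences. The snippet below simply copies the numbers from the two result tables above and subtracts them (negative is better for RMSE and MAE, worse for HR and ARHR). One caveat: `SVD` and `SVDpp` were created without a `random_state`, so their random initialisation is not seeded, and small differences may partly be run-to-run noise.

```python
# Metrics copied from the untuned and tuned result tables
untuned = {"SVD":   {"RMSE": 0.9432, "MAE": 0.7422, "HR": 0.0445, "ARHR": 0.0162},
           "SVD++": {"RMSE": 0.9222, "MAE": 0.7237, "HR": 0.0498, "ARHR": 0.0186}}
tuned   = {"SVD":   {"RMSE": 0.9185, "MAE": 0.7251, "HR": 0.0403, "ARHR": 0.0109},
           "SVD++": {"RMSE": 0.9135, "MAE": 0.7205, "HR": 0.0403, "ARHR": 0.0113}}

# tuned minus untuned, rounded to the 4 decimals shown in the tables
deltas = {
    name: {metric: round(tuned[name][metric] - untuned[name][metric], 4)
           for metric in untuned[name]}
    for name in untuned
}
for name, changes in deltas.items():
    print(name, changes)
```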

It would be useful to evaluate other metrics as well, such as diversity, since it may turn out that, despite their better accuracy, the tuned models recommend overly obscure items.
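One common way to quantify diversity is intra-list diversity: the average pairwise dissimilarity (1 - similarity) of the items in a user's top-N list, averaged over users. Here is a minimal sketch for a single list. The function name is hypothetical, and the genre sets and Jaccard similarity are invented stand-ins for a real item-similarity measure (MovieLens 100k's item file does include genre flags that could play this role, but loading them is not shown).

```python
from itertools import combinations

def intra_list_diversity(recommended, similarity):
    """Average pairwise dissimilarity (1 - similarity) within one top-N list.

    recommended: list of item ids
    similarity:  function (item_a, item_b) -> value in [0, 1], assumed given
    """
    pairs = list(combinations(recommended, 2))
    if not pairs:
        return 0.0
    return sum(1 - similarity(a, b) for a, b in pairs) / len(pairs)

# Hypothetical genre sets standing in for real item metadata
genres = {10: {"Drama"}, 20: {"Drama", "Romance"}, 30: {"Sci-Fi"}}

def jaccard(a, b):
    ga, gb = genres[a], genres[b]
    return len(ga & gb) / len(ga | gb)

# Pair dissimilarities: (10, 20) -> 0.5, (10, 30) -> 1.0, (20, 30) -> 1.0
print(intra_list_diversity([10, 20, 30], jaccard))  # 2.5 / 3
```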