<a href="https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/Notebook/7_Matrix_Factorization_Methods.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 7.

<a name="top"></a>
## Matrix Factorization Methods

### Table of Contents

Note: The internal links work in Google Colab.

1. **[Preface](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/MovieLens.ipynb#preface)**
2. **[Introduction](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/MovieLens.ipynb#introduction)**
3. **[Exploratory Data Analysis](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/Notebook/3_Exploratory_Data_Analysis.ipynb.ipynb#eda)**
4. **[Framework](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/Notebook/4_Framework.ipynb#framework)**
5. **[Content Based Recommenders](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/Notebook/5_Content_Based_Recommenders.ipynb#content)**
6. **[Collaborative Based Recommenders](#collaborative)**
    - 6.1 - [Introduction](#introduction)
    - 6.2 - [Import Files](#import)
    - 6.3 - [Models](#models)
    - 6.4 - [Results](#results)

***

<a name="introduction"></a>
### 6.1 - Introduction

In the last notebook, I ran some content-based models that recommended movies similar in attributes to movies I liked. By comparison, this notebook will try some neighborhood-based (KNN) collaborative filtering. Essentially, it means finding other people like me and recommending movies they liked. Or it might mean recommending movies watched by people who also watched the stuff that I liked. Either way, the idea is taking cues from people like me, my neighborhood, and recommending movies based on the things they like that I haven't seen yet. That's why it's called collaborative filtering: it's recommending stuff based on other people's collaborative behavior.

There are two types of collaborative filtering: user-based and item-based. The idea behind user-based collaborative filtering is to find other users similar to myself, based on their ratings history, and then recommend movies they liked that I haven't seen yet. Item-based collaborative filtering is essentially flipping the problem on its head. Instead of looking for other people similar to myself, and recommending stuff they liked, I instead look at the things I liked, and recommend stuff that's similar to those things.
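
To make the two directions concrete, here is a minimal sketch in plain Python. The ratings, user names, and helper functions (`cosine`, `transpose`, `most_similar`) are all made up for illustration and are not part of surprise or the framework used in this notebook. The same similarity function is applied first to rows keyed by user, then to rows keyed by movie.

```python
from math import sqrt

# Toy ratings, invented for illustration: user -> {movie: rating}
ratings = {
    "ann":  {"Alien": 5.0, "Brazil": 4.0, "Casablanca": 1.0},
    "bob":  {"Alien": 4.0, "Brazil": 5.0, "Dune": 4.0},
    "cara": {"Alien": 1.0, "Casablanca": 5.0, "Dune": 5.0},
}

def cosine(a, b):
    """Cosine similarity over the keys that two rating dicts share."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = sqrt(sum(a[k] ** 2 for k in common))
    norm_b = sqrt(sum(b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b)

def transpose(user_ratings):
    """Flip user -> {movie: rating} into movie -> {user: rating}."""
    by_item = {}
    for user, items in user_ratings.items():
        for movie, rating in items.items():
            by_item.setdefault(movie, {})[user] = rating
    return by_item

def most_similar(target, table):
    """Return the other row of `table` that is most similar to `target`."""
    others = (key for key in table if key != target)
    return max(others, key=lambda key: cosine(table[target], table[key]))

# User-based: find the user who rates most like ann,
# then suggest what that user rated and ann has not seen.
nearest_user = most_similar("ann", ratings)
user_based_recs = [m for m in ratings[nearest_user] if m not in ratings["ann"]]

# Item-based: find the movie whose ratings look most like Alien's.
nearest_item = most_similar("Alien", transpose(ratings))

print(nearest_user, user_based_recs, nearest_item)
```

With this toy data, bob is ann's nearest neighbor, so the user-based suggestion is the one movie bob rated that ann has not. The item-based lookup runs the identical code on the transposed table, which is all "flipping the problem on its head" amounts to here.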

Thankfully, [surpriselib](https://surprise.readthedocs.io/en/stable/knn_inspired.html) has models I can use to run both item-based and user-based KNN collaborative recommenders. With surpriselib and Frank's framework, it's actually really easy to try a whole slew of models. Here is an example:

```
# User-based KNN - cosine
UserKNNcosine = KNNBasic(sim_options = {'name': 'cosine', 'user_based': True})
evaluator.AddAlgorithm(UserKNNcosine, "User KNN cosine")
```
`name` denotes the type of similarity measure. `user_based : True` essentially tells the model that it is a user-based filter. Setting it to `False` means it is an item-based filter. 
 
Here is a quick recap on the different similarity measures. Cosine similarity is a good jack of all trades. It's almost always a reasonable thing to start with. Adjusted cosine and Pearson are closely related: both are essentially mean-centered cosine similarity. The ratings are centered on either a user's average across all the items they rated or an item's average across all users, depending on which way I flip it. The main idea is to correct for unusual rating behavior that deviates from the mean, such as a user who rates everything high.
 
Spearman rank correlation is the same idea as Pearson but using rankings instead of raw ratings. MSD is mean squared difference.
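
As a rough illustration of how these measures differ, the sketch below computes three of them on two made-up rating vectors. The function names are my own and this is not surprise's code. It assumes both users rated the same movies, that ratings are not all identical (otherwise the Pearson denominator is zero), and it turns MSD into a similarity as 1 / (MSD + 1), which is the convention surprise documents.

```python
from math import sqrt

def cosine_sim(x, y):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

def pearson_sim(x, y):
    """Cosine similarity after subtracting each vector's own mean."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine_sim([a - mean_x for a in x], [b - mean_y for b in y])

def msd_sim(x, y):
    """Mean squared difference, mapped to a similarity in (0, 1]."""
    msd = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return 1.0 / (msd + 1.0)

# Two users rating the same four movies (made-up numbers).
# The tough rater gives every movie exactly two stars less.
easy_rater = [5.0, 4.0, 5.0, 3.0]
tough_rater = [3.0, 2.0, 3.0, 1.0]

cos = cosine_sim(easy_rater, tough_rater)
pear = pearson_sim(easy_rater, tough_rater)
msd = msd_sim(easy_rater, tough_rater)
print(round(cos, 3), round(pear, 3), round(msd, 3))
```

Because the two users differ only by a constant offset, mean-centering removes the difference and Pearson treats them as identical in taste, while MSD penalizes the two-star gap on every movie and reports a low similarity. Cosine lands in between, close to but below 1.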

Because it is relatively easy to code the different models, I'll run a bunch of them.
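
One way to set up such a batch is to generate every combination of similarity measure and filtering direction. This is only a sketch: the labels are my own naming, the list of similarity names is limited to three that surprise accepts in `sim_options`, and the registration call is left as a comment because it needs the `evaluator` object and `KNNBasic` from the example above.

```python
from itertools import product

# Similarity names accepted by surprise's sim_options (a subset)
similarities = ["cosine", "msd", "pearson"]

configs = []
for name, user_based in product(similarities, [True, False]):
    label = f"{'User' if user_based else 'Item'} KNN {name}"
    sim_options = {"name": name, "user_based": user_based}
    configs.append((label, sim_options))

for label, sim_options in configs:
    print(label, sim_options)
    # In the notebook, each model would be registered here, for example:
    # evaluator.AddAlgorithm(KNNBasic(sim_options=sim_options), label)
```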

***

**[Back to Top](#top)**

***

<a name="import"></a>
### 6.2 - Import Files

In [None]:
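import os

# exist_ok=True lets this cell be re-run without failing if the folder already exists
os.makedirs('/content/matrix', exist_ok=True)
print('Folder created!')
os.chdir('/content/matrix')
print('Files are in this folder!')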

Folder created!
Files are in this folder!


In [None]:
pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
     |████████████████████████████████| 11.8MB 5.1MB/s 
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp37-cp37m-linux_x86_64.whl size=1617559 sha256=ff1dfb62ff87a82fd1e2c29a17c49e44070ac0b293a8beb7ad528c9863038035
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.1


In [None]:
!python "MovieLens.py"
print('1 of 5: Done')
!python "RecommenderMetrics.py"
print('2 of 5: Done')
!python "EvaluationData.py"
print('3 of 5: Done')
!python "EvaluatedAlgorithm.py"
print('4 of 5: Done')
!python "Evaluator.py"
print('5 of 5: All Scripts Loaded!')

1 of 5: Done
2 of 5: Done
3 of 5: Done
4 of 5: Done
5 of 5: All Scripts Loaded!


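I was able to tune the SVD hyperparameters with surprise's `RandomizedSearchCV`, which samples a fixed number of parameter combinations instead of trying every one. The same cell that runs the search then evaluates the tuned SVD against an untuned SVD and a random baseline.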
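
As a rough picture of what a randomized search does, here is a small sketch in plain Python. It is not surprise's implementation. The value lists are shortened versions of the ones used in this notebook, `fake_cv_rmse` is a hypothetical stand-in for cross-validating a real model, and for simplicity the sketch samples with replacement.

```python
import random
from itertools import product

# Same parameter names as the notebook's search, with shorter value lists
param_grid = {
    "n_epochs": [10, 20, 30],
    "lr_all": [0.001, 0.01, 0.1],
    "n_factors": [10, 50, 90],
    "reg_all": [0.01, 0.05, 0.1],
}

def sample_params(grid, n_iter, seed):
    """Draw n_iter random combinations, one value per parameter."""
    rng = random.Random(seed)
    return [
        {name: rng.choice(values) for name, values in grid.items()}
        for _ in range(n_iter)
    ]

def fake_cv_rmse(params):
    """Hypothetical score; a real search would cross-validate a model here."""
    return abs(params["lr_all"] - 0.01) + abs(params["reg_all"] - 0.05)

# A full grid search would have to score every combination
full_grid_size = len(list(product(*param_grid.values())))

# A randomized search scores only a fixed budget of them
candidates = sample_params(param_grid, n_iter=10, seed=29)
best = min(candidates, key=fake_cv_rmse)

print(f"{len(candidates)} of {full_grid_size} combinations scored")
print("best sampled combination:", best)
```

The trade-off is coverage for time: the best sampled combination is not guaranteed to be the best in the grid, but the cost is fixed by `n_iter` rather than by the size of the grid.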

In [None]:
# -*- coding: utf-8 -*-
"""
Created on Sun Mar 14 22:35:07 2021

@author: Maribel
"""

from MovieLens import MovieLens
from surprise import SVD, SVDpp
from surprise import NormalPredictor
from Evaluator import Evaluator
from surprise.model_selection import GridSearchCV
from surprise.model_selection import RandomizedSearchCV

import random
import numpy as np

def LoadMovieLensData():
    ml = MovieLens()
    print("Loading movie ratings...")
    data = ml.loadMovieLensLatestSmall()
    print("\nComputing movie popularity ranks so we can measure novelty later...")
    rankings = ml.getPopularityRanks()
    return (ml, data, rankings)

np.random.seed(29)
random.seed(29)

# Load up common data set for the recommender algorithms
(ml, evaluationData, rankings) = LoadMovieLensData()

print("Searching for best parameters...")
#param_grid = {'n_epochs': [20, 30], 'lr_all': [0.005, 0.010],
#              'n_factors': [50, 100]}
#gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)

#gs.fit(evaluationData)

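# Search space: RandomizedSearchCV samples n_iter combinations from these values
param_grid = {'n_epochs': range(10, 31),                # 10 to 30 training epochs
              'lr_all': np.linspace(0.001, 0.10, 10),   # learning rate for all parameters
              'n_factors': range(10, 100, 20),          # 10, 30, 50, 70 or 90 latent factors
              'reg_all': np.linspace(0.01, 0.1, 10)}    # regularization for all parameters

# 50 sampled combinations x 5 folds = 250 model fits, spread over all CPU cores
rs = RandomizedSearchCV(SVD, param_distributions=param_grid, measures=['rmse', 'mae'],
                        cv=5, random_state=29, n_jobs=-1, n_iter=50)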

rs.fit(evaluationData)

# best RMSE score
#print("Best RMSE score attained: ", gs.best_score['rmse'])
print("Best RMSE score attained: ", rs.best_score['rmse'])

# combination of parameters that gave the best RMSE score
#print(gs.best_params['rmse'])
print(rs.best_params['rmse'])

# Construct an Evaluator to, you know, evaluate them
evaluator = Evaluator(evaluationData, rankings)

#params = gs.best_params['rmse']
#SVDtuned = SVD(n_epochs = params['n_epochs'], lr_all = params['lr_all'], n_factors = params['n_factors'])

params = rs.best_params['rmse']
SVDtuned = SVD(n_epochs = params['n_epochs'], lr_all = params['lr_all'], 
               n_factors = params['n_factors'], reg_all = params['reg_all'])
evaluator.AddAlgorithm(SVDtuned, "SVD - Tuned")

SVDUntuned = SVD()
evaluator.AddAlgorithm(SVDUntuned, "SVD - Untuned")

# Just make random recommendations
Random = NormalPredictor()
evaluator.AddAlgorithm(Random, "Random")

# Fight!
evaluator.Evaluate(True)

evaluator.SampleTopNRecs(ml)


Loading movie ratings...

Computing movie popularity ranks so we can measure novelty later...
Searching for best parameters...
Best RMSE score attained:  0.8553894737323182
{'n_epochs': 11, 'lr_all': 0.034, 'n_factors': 90, 'reg_all': 0.06000000000000001}
Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating  SVD - Tuned ...
Evaluating accuracy...
Evaluating top-N with leave-one-out...
Computing hit-rate and rank metrics...
Computing recommendations with full data set...
Analyzing coverage, diversity, and novelty...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Analysis complete.
Evaluating  SVD - Untuned ...
Evaluating accuracy...
Evaluating top-N with leave-one-out...
Computing hit-rate and rank metrics...
Computing recommendations with full data set...
Analyzing coverage, diversity, and novelty...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Analysis complete.
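
To show what the four tuned parameters control, here is a minimal sketch of a biased matrix factorization model trained with stochastic gradient descent. It follows the update rules that surprise documents for `SVD`, but it is my own toy code, not the library's implementation, and the ratings are invented. `n_factors` sets the length of each user and item vector, `n_epochs` the number of passes over the ratings, `lr_all` the step size, and `reg_all` how strongly parameters are pulled back toward zero.

```python
import random

def train_mf(ratings, n_factors=2, n_epochs=20, lr_all=0.01, reg_all=0.05, seed=29):
    """Fit r_hat(u, i) = mu + b_u + b_i + p_u . q_i by SGD; return a predict function."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu = {u: 0.0 for u in users}                        # user biases
    bi = {i: 0.0 for i in items}                        # item biases
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    def predict(u, i):
        return mu + bu[u] + bi[i] + sum(a * b for a, b in zip(p[u], q[i]))

    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - predict(u, i)
            # Move each parameter along the error, shrink it by the regularizer
            bu[u] += lr_all * (err - reg_all * bu[u])
            bi[i] += lr_all * (err - reg_all * bi[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr_all * (err * qif - reg_all * puf)
                q[i][f] += lr_all * (err * puf - reg_all * qif)
    return predict

def rmse(predict, ratings):
    """Root mean squared error of predict() over (user, item, rating) triples."""
    return (sum((r - predict(u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5

# Invented (user, movie, rating) triples
toy = [
    ("ann", "Alien", 5.0), ("ann", "Brazil", 4.0),
    ("bob", "Alien", 4.0), ("bob", "Dune", 5.0),
    ("cara", "Brazil", 1.0), ("cara", "Dune", 2.0),
]

before = rmse(train_mf(toy, n_epochs=0), toy)
after = rmse(train_mf(toy, n_epochs=200), toy)
print(f"training RMSE before: {before:.3f}, after: {after:.3f}")
```

The numbers printed here are training error on six ratings, so they only show that the loop learns something. The RMSE reported by the search above comes from cross-validation on held-out ratings, which is the figure that matters when comparing parameter settings.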

In [None]:
# -*- coding: utf-8 -*-
"""
Created on Thu May  3 11:11:13 2018

@author: Frank
"""

from MovieLens import MovieLens
from surprise import SVD, SVDpp
from surprise import NormalPredictor
from Evaluator import Evaluator

import random
import numpy as np

def LoadMovieLensData():
    ml = MovieLens()
    print("Loading movie ratings...")
    data = ml.loadMovieLensLatestSmall()
    print("\nComputing movie popularity ranks so we can measure novelty later...")
    rankings = ml.getPopularityRanks()
    return (ml, data, rankings)

np.random.seed(29)
random.seed(29)

# Load up common data set for the recommender algorithms
(ml, evaluationData, rankings) = LoadMovieLensData()

# Construct an Evaluator to, you know, evaluate them
evaluator = Evaluator(evaluationData, rankings)

# SVD++, reusing the best parameters found by the SVD randomized search:
# {'n_epochs': 11, 'lr_all': 0.034, 'n_factors': 90, 'reg_all': 0.06000000000000001}
SVDPlusPlusTuned = SVDpp(n_epochs=11, lr_all=0.034, n_factors=90, reg_all=0.06)
evaluator.AddAlgorithm(SVDPlusPlusTuned, "SVD++Tuned")

# SVD++ Untuned
SVDPlusPlus = SVDpp()
evaluator.AddAlgorithm(SVDPlusPlus, "SVD++Untuned")

# Just make random recommendations
Random = NormalPredictor()
evaluator.AddAlgorithm(Random, "Random")

# Fight!
evaluator.Evaluate(True)

evaluator.SampleTopNRecs(ml)


Loading movie ratings...

Computing movie popularity ranks so we can measure novelty later...
Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating  SVD++Untuned ...
Evaluating accuracy...
Evaluating top-N with leave-one-out...
Computing hit-rate and rank metrics...
Computing recommendations with full data set...
Analyzing coverage, diversity, and novelty...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Analysis complete.
Evaluating  SVD++Tuned ...
Evaluating accuracy...
Evaluating top-N with leave-one-out...
Computing hit-rate and rank metrics...
Computing recommendations with full data set...
Analyzing coverage, diversity, and novelty...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Analysis complete.
Evaluating  SVD++Untuned ...
Evaluating accuracy...
Evaluating top-N with leave-one-out...
Computing hit-rate and rank metrics...
Computing recommendations with fu

***

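This would not run to completion. I tried the same randomized search directly on SVD++, but it never got past the parameter search and I interrupted it (note the `KeyboardInterrupt` at the end of the output).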
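
A likely reason, offered here as an assumption rather than something I measured, is that the SVD++ search that follows is much heavier per model than the SVD search. As documented for surprise, SVD predicts a rating as

$$
\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top p_u
$$

while SVD++ adds an implicit-feedback term built from every item the user has rated:

$$
\hat{r}_{ui} = \mu + b_u + b_i + q_i^\top \Big( p_u + |I_u|^{-1/2} \sum_{j \in I_u} y_j \Big)
$$

Here $\mu$ is the global mean rating, $b_u$ and $b_i$ are the user and item biases, $p_u$ and $q_i$ are the user and item factor vectors, $I_u$ is the set of items rated by user $u$, and $y_j$ is an extra factor vector for item $j$. Each training step for SVD++ has to touch the $y_j$ of all items in $I_u$, so one epoch costs far more than for SVD, and even the reduced search settings (`cv=3`, `n_iter=10`) still require 30 full model fits.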

In [None]:
# -*- coding: utf-8 -*-
"""
Created on Sun Mar 14 22:35:07 2021

@author: Maribel
"""

from MovieLens import MovieLens
from surprise import SVD, SVDpp
from surprise import NormalPredictor
from Evaluator import Evaluator
from surprise.model_selection import GridSearchCV
from surprise.model_selection import RandomizedSearchCV

import random
import numpy as np

def LoadMovieLensData():
    ml = MovieLens()
    print("Loading movie ratings...")
    data = ml.loadMovieLensLatestSmall()
    print("\nComputing movie popularity ranks so we can measure novelty later...")
    rankings = ml.getPopularityRanks()
    return (ml, data, rankings)

np.random.seed(29)
random.seed(29)

# Load up common data set for the recommender algorithms
(ml, evaluationData, rankings) = LoadMovieLensData()

print("Searching for best parameters...")
#param_grid = {'n_epochs': [20, 30], 'lr_all': [0.005, 0.010],
#              'n_factors': [50, 100]}
#gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)

#gs.fit(evaluationData)

param_grid = {'n_epochs': range(10, 31), 
              'lr_all': np.linspace(0.001, 0.20, 20),
              'n_factors': range(10, 100, 20),
              'reg_all': np.linspace(0.01, 0.1, 10)}

rsplus = RandomizedSearchCV(SVDpp, param_distributions=param_grid, measures=['rmse', 'mae'], 
                        cv=3, random_state=29, n_jobs = -1, n_iter=10)

rsplus.fit(evaluationData)

# best RMSE score
#print("Best RMSE score attained: ", gs.best_score['rmse'])
print("Best RMSE score attained: ", rsplus.best_score['rmse'])

# combination of parameters that gave the best RMSE score
#print(gs.best_params['rmse'])
print(rsplus.best_params['rmse'])

# Construct an Evaluator to, you know, evaluate them
evaluator = Evaluator(evaluationData, rankings)

#params = gs.best_params['rmse']
#SVDtuned = SVD(n_epochs = params['n_epochs'], lr_all = params['lr_all'], n_factors = params['n_factors'])

params = rsplus.best_params['rmse']
SVDPlusTuned = SVDpp(n_epochs = params['n_epochs'], lr_all = params['lr_all'], 
               n_factors = params['n_factors'], reg_all = params['reg_all'])
evaluator.AddAlgorithm(SVDPlusTuned, "SVD++ - Tuned")

SVDPlusUntuned = SVDpp()
evaluator.AddAlgorithm(SVDPlusUntuned, "SVD++ - Untuned")

# Just make random recommendations
Random = NormalPredictor()
evaluator.AddAlgorithm(Random, "Random")

# Fight!
evaluator.Evaluate(False)

evaluator.SampleTopNRecs(ml)


Loading movie ratings...

Computing movie popularity ranks so we can measure novelty later...
Searching for best parameters...


KeyboardInterrupt: ignored