<a href="https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/Notebook/6_Collaborative_Based_Recommenders.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 6.

<a name="top"></a>
## Collaborative Based Recommenders

### Table of Contents

Note: The internal links work in Google Colab.

1. **[Preface](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/MovieLens.ipynb#preface)**
2. **[Introduction](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/MovieLens.ipynb#introduction)**
3. **[Exploratory Data Analysis](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/Notebook/3_Exploratory_Data_Analysis.ipynb.ipynb#eda)**
4. **[Framework](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/Notebook/4_Framework.ipynb#framework)**
5. **[Content Based Recommenders](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/Notebook/5_Content_Based_Recommenders.ipynb#content)**
6. **[Collaborative Based Recommenders](#collaborative)**
    - 6.1 - [Introduction](#introduction)
    - 6.2 - [Import Files](#import)
    - 6.3 - [Models](#models)
    - 6.4 - [Results](#results)

***

<a name="introduction"></a>
### 6.1 - Introduction

In the last notebook, I ran some content-based models that recommended movies with attributes similar to the movies a user liked. In comparison, this notebook tries neighborhood-based (KNN) collaborative filtering. Essentially, it means finding other people like me and recommending movies they liked. Or it might mean recommending movies watched by people who also watched the stuff that I liked. Either way, the idea is to take cues from people like me, my neighborhood, and recommend movies based on the things they like that I haven't seen yet. That's why it's called collaborative filtering: it recommends stuff based on other people's collaborative behavior.

There are two types of collaborative filtering: user-based and item-based. The idea behind user-based collaborative filtering is to find other users similar to myself, based on their ratings history, and then recommend movies they liked that I haven't seen yet. Item-based collaborative filtering is essentially flipping the problem on its head. Instead of looking for other people similar to myself, and recommending stuff they liked, I instead look at the things I liked, and recommend stuff that's similar to those things.
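To make the difference concrete, here is a minimal sketch on a toy ratings dictionary. The users, movies, ratings, and the helper names `cosine`, `user_based`, and `item_based` are all made up for illustration. It is not how surpriselib implements KNN internally; it only shows which way the problem gets flipped.

```python
from math import sqrt

# Hypothetical toy data: user -> {movie: rating}
ratings = {
    "me":  {"Alien": 5, "Heat": 2},
    "ann": {"Alien": 5, "Heat": 1, "Brazil": 5, "Clue": 1},
    "bob": {"Alien": 1, "Heat": 5, "Brazil": 2, "Clue": 5},
}

def cosine(a, b):
    """Cosine similarity over the keys that two rating dicts share."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = sqrt(sum(a[k] ** 2 for k in common))
    norm_b = sqrt(sum(b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b)

def user_based(target):
    """Find the most similar user, then return their best-rated unseen movie."""
    others = [u for u in ratings if u != target]
    neighbor = max(others, key=lambda u: cosine(ratings[target], ratings[u]))
    unseen = {m: r for m, r in ratings[neighbor].items()
              if m not in ratings[target]}
    return max(unseen, key=unseen.get)

def item_based(target):
    """Take the target's favorite movie, then return the most similar unseen movie."""
    # Flip the data: movie -> {user: rating}
    by_item = {}
    for user, movies in ratings.items():
        for movie, rating in movies.items():
            by_item.setdefault(movie, {})[user] = rating
    favorite = max(ratings[target], key=ratings[target].get)
    candidates = [m for m in by_item if m not in ratings[target]]
    return max(candidates, key=lambda m: cosine(by_item[favorite], by_item[m]))

print(user_based("me"))  # Brazil
print(item_based("me"))  # Brazil
```

Both approaches land on the same movie here, but they get there differently: the first compares rows of the ratings data (users), the second compares columns (movies).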

Thankfully, [surpriselib](https://surprise.readthedocs.io/en/stable/knn_inspired.html) has models I can use to run both item-based and user-based KNN collaborative recommenders. With surpriselib and Frank's framework, it's actually really easy to try a whole slew of models. Here is an example:

```
# User-based KNN - cosine
UserKNNcosine = KNNBasic(sim_options = {'name': 'cosine', 'user_based': True})
evaluator.AddAlgorithm(UserKNNcosine, "User KNN cosine")
```
`name` denotes the type of similarity measure. `user_based : True` essentially tells the model that it is a user-based filter. Setting it to `False` means it is an item-based filter. 
 
Here is a quick recap of the different similarity measures. Cosine similarity is a good jack-of-all-trades, and it's almost always a reasonable place to start. Adjusted cosine and Pearson are closely related: both are essentially mean-centered cosine similarity. They factor in average rating behavior, either the average across all of a user's item ratings or the average rating of an item across all users, depending on which way I flip the problem. The main idea is to correct for unusual rating behavior, such as a user who consistently rates higher or lower than everyone else.
 
Spearman rank correlation is the same idea as Pearson but using rankings instead of raw ratings. MSD is mean squared difference: the average squared gap between two sets of ratings, so a smaller value means a closer match.
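As a rough illustration of how these measures differ, here is a small sketch in plain Python. The function names and the two rating vectors are hypothetical, and the vectors are assumed to hold ratings for the same co-rated items in the same order. Turning MSD into a similarity with `1 / (msd + 1)` follows the formula surpriselib documents for its `msd` option.

```python
from math import sqrt

def cosine(x, y):
    """Cosine similarity of two equal-length rating vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

def pearson(x, y):
    """Mean-centered cosine. Undefined if either vector has zero variance."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return cosine([a - mean_x for a in x], [b - mean_y for b in y])

def msd_similarity(x, y):
    """Mean squared difference, turned into a similarity (higher is closer)."""
    msd = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return 1 / (msd + 1)

# Two hypothetical users with the same taste but different rating scales
harsh = [1, 2, 3]
generous = [3, 4, 5]

print(round(cosine(harsh, generous), 3))          # 0.983
print(round(pearson(harsh, generous), 3))         # 1.0
print(round(msd_similarity(harsh, generous), 3))  # 0.2
```

Pearson treats the two users as a perfect match because it removes each user's average first, while MSD penalizes the constant two-point gap between them.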

Because it is relatively easy to code the different models, I'll run a bunch of them.
 
***

**[Back to Top](#top)**

***

<a name="import"></a>
### 6.2 - Import Files

In [1]:
import os
os.makedirs('/content/collaborative', exist_ok=True)  # safe to re-run if the folder already exists
print('Folder created!')
os.chdir('/content/collaborative')
print('Folder ready for upload!')

Folder created!
Folder ready for upload!


In [5]:
pip install scikit-surprise

Collecting scikit-surprise
  Downloading https://files.pythonhosted.org/packages/97/37/5d334adaf5ddd65da99fc65f6507e0e4599d092ba048f4302fe8775619e8/scikit-surprise-1.1.1.tar.gz (11.8MB)
     |████████████████████████████████| 11.8MB 11.3MB/s
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py) ... done
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.1-cp37-cp37m-linux_x86_64.whl size=1617563 sha256=a77fcb96a5626faf80213e8f000ea2faf1a4fc3d772660160c754aacb2a205ef
  Stored in directory: /root/.cache/pip/wheels/78/9c/3d/41b419c9d2aff5b6e2b4c0fc8d25c538202834058f9ed110d0
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.1


In [6]:
print("Loading Framework...")
!python "MovieLens.py"
print('1 of 5: Done')
!python "RecommenderMetrics.py"
print('2 of 5: Done')
!python "EvaluationData.py"
print('3 of 5: Done')
!python "EvaluatedAlgorithm.py"
print('4 of 5: Done')
!python "Evaluator.py"
print('5 of 5: Core Framework Loaded!')

Loading Framework...
1 of 5: Done
2 of 5: Done
3 of 5: Done
4 of 5: Done
5 of 5: Core Framework Loaded!


***

**[Back to Top](#top)**

***

<a name="models"></a>
### 6.3 - Models

I ran all the item-based and user-based models at the same time, using different similarity metrics and KNN techniques. All of them are available from [Surpriselib](https://surprise.readthedocs.io/en/stable/knn_inspired.html). The Python scripts I used are available [here](https://github.com/villafue/Capstone_2_MovieLens/tree/main/Python%20Scripts/CollaborativeBased/CollaborativeFilteringWorked).
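The `KNNBasic` variants differ only in their `sim_options`, so the grid could also be generated instead of written out by hand. The sketch below is one possible way to do that, not the script I actually ran. `SIMILARITIES` and `knn_basic_configs` are hypothetical names, and the labels it produces spell `pearson_baseline` in full.

```python
from itertools import product

# The four similarity names used with KNNBasic in this notebook
SIMILARITIES = ["cosine", "msd", "pearson", "pearson_baseline"]

def knn_basic_configs():
    """Yield (label, sim_options) for every user-based and item-based variant."""
    for user_based, name in product([True, False], SIMILARITIES):
        side = "User" if user_based else "Item"
        yield f"{side} KNN {name}", {"name": name, "user_based": user_based}

configs = list(knn_basic_configs())
for label, sim_options in configs:
    print(label, sim_options)
    # With surprise imported and the framework loaded, each pair could be
    # registered the same way as in the explicit version:
    # evaluator.AddAlgorithm(KNNBasic(sim_options=sim_options), label)
```

I kept the explicit version in the cell that follows because it makes each model easy to comment out individually.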

In [None]:
# -*- coding: utf-8 -*-
"""
Created on Thu May  3 11:11:13 2018

@author: Frank
"""

from MovieLens import MovieLens
from surprise import KNNBasic
from surprise import KNNWithZScore
from surprise import KNNWithMeans
from surprise import KNNBaseline
from surprise import NormalPredictor
from Evaluator import Evaluator

import random
import numpy as np

def LoadMovieLensData():
    ml = MovieLens()
    print("Loading movie ratings...")
    data = ml.loadMovieLensLatestSmall()
    print("\nComputing movie popularity ranks so we can measure novelty later...")
    rankings = ml.getPopularityRanks()
    return (ml, data, rankings)

np.random.seed(29)
random.seed(29)

# Load up common data set for the recommender algorithms
(ml, evaluationData, rankings) = LoadMovieLensData()

# Construct an Evaluator to, you know, evaluate them
evaluator = Evaluator(evaluationData, rankings)

# User-based KNN - cosine
UserKNNcosine = KNNBasic(sim_options = {'name': 'cosine', 'user_based': True})
evaluator.AddAlgorithm(UserKNNcosine, "User KNN cosine")

# User-based KNN - msd
UserKNNmsd = KNNBasic(sim_options = {'name': 'msd', 'user_based': True})
evaluator.AddAlgorithm(UserKNNmsd, "User KNN msd")

# User-based KNN - pearson
UserKNNpearson = KNNBasic(sim_options = {'name': 'pearson', 'user_based': True})
evaluator.AddAlgorithm(UserKNNpearson, "User KNN pearson")

# User-based KNN - pearson_baseline
UserKNNpb = KNNBasic(sim_options = {'name': 'pearson_baseline', 'user_based': True})
evaluator.AddAlgorithm(UserKNNpb, "User KNN pearson_basline")

# User-based KNNZScore 
UserKNNzscore = KNNWithZScore(sim_options = {'name': 'cosine', 'user_based': True})
evaluator.AddAlgorithm(UserKNNzscore, "User KNNZScore")

# User-based KNNMeans
UserKNNmeans = KNNWithMeans(sim_options = {'name': 'cosine', 'user_based': True})
evaluator.AddAlgorithm(UserKNNmeans, "User KNNMeans")

# User-based KNNBaseline
UserKNNbline = KNNBaseline(sim_options = {'name': 'cosine', 'user_based': True})
evaluator.AddAlgorithm(UserKNNbline, "User KNNZBaseline")

# Item-based KNN - cosine
ItemKNNcosine = KNNBasic(sim_options = {'name': 'cosine', 'user_based': False})
evaluator.AddAlgorithm(ItemKNNcosine, "Item KNN cosine")

# Item-based KNN - msd
ItemKNNmsd = KNNBasic(sim_options = {'name': 'msd', 'user_based': False})
evaluator.AddAlgorithm(ItemKNNmsd, "Item KNN msd")

# Item-based KNN - pearson
ItemKNNpearson = KNNBasic(sim_options = {'name': 'pearson', 'user_based': False})
evaluator.AddAlgorithm(ItemKNNpearson, "Item KNN pearson")

# Item-based KNN - pearson_baseline
ItemKNNpb = KNNBasic(sim_options = {'name': 'pearson_baseline', 'user_based': False})
evaluator.AddAlgorithm(ItemKNNpb, "Item KNN pearson_basline")

# Item-based KNNZScore 
ItemKNNzscore = KNNWithZScore(sim_options = {'name': 'cosine', 'user_based': False})
evaluator.AddAlgorithm(ItemKNNzscore, "Item KNNZScore")

# Item-based KNNMeans
ItemKNNmeans = KNNWithMeans(sim_options = {'name': 'cosine', 'user_based': False})
evaluator.AddAlgorithm(ItemKNNmeans, "Item KNNMeans")

# Item-based KNNBaseline
ItemKNNbline = KNNBaseline(sim_options = {'name': 'cosine', 'user_based': False})
evaluator.AddAlgorithm(ItemKNNbline, "Item KNNZBaseline")


# Just make random recommendations
Random = NormalPredictor()
evaluator.AddAlgorithm(Random, "Random")

# Fight!
evaluator.Evaluate(True)

evaluator.SampleTopNRecs(ml)


Loading movie ratings...

Computing movie popularity ranks so we can measure novelty later...
Estimating biases using als...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating  User KNN cosine ...
Evaluating accuracy...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Evaluating top-N with leave-one-out...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Computing hit-rate and rank metrics...
Computing recommendations with full data set...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Analyzing coverage, diversity, and novelty...
Computing the cosine similarity matrix...
Done computing similarity matrix.
Analysis complete.
Evaluating  User KNN msd ...
Evaluating accuracy...
Computing the msd similarity matrix...
Done computing similarity matrix.
Evaluating top-N with leave-one-out...
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing hi

<a name="results"></a>
### 6.4 - Results

| Algorithm | RMSE | MAE | HR | cHR | ARHR | Coverage | Diversity | Novelty |
|--|--|--|--|--|--|--|--|--|   
| User KNN cosine | 0.9787 | 0.7547 | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 0.8882 | 6126.9782 |
| User KNN msd | 0.9554 | 0.7330 | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 0.8853 | 6119.0357 |
| User KNN pearson | 0.9802 | 0.7558 | 0.0000 | 0.0000 | 0.0000 | 0.9984 | 0.8299 | 4877.7677 |
| User KNN pearson_basline | 0.9827 | 0.7560 | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 0.7412 | 4014.1784 |
| User KNNZScore | 0.9048 | 0.6867 | 0.0049 | 0.0049 | 0.0023 | 0.9951 | 0.5826 | 6121.5700 |
| User KNNMeans | 0.9066 | 0.6931 | 0.0016 | 0.0016 | 0.0002 | 0.9984 | 0.7363 | 5953.5645 |
| User KNNZBaseline | 0.8845 | 0.6768 | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 0.8228 | 6256.0928 |
| Item KNN cosine | 0.9788 | 0.7610 | 0.0000 | 0.0000 | 0.0000 | 0.9885 | 0.7204 | 6933.1300 |
| Item KNN msd | 0.9149 | 0.7031 | 0.0000 | 0.0000 | 0.0000 | 0.9918 | 0.7368 | 6969.8043 |
| Item KNN pearson | 0.9755 | 0.7550 | 0.0000 | 0.0000 | 0.0000 | 0.9984 | 0.7737 | 4432.8886 |
| Item KNN pearson_basline | 0.9248 | 0.7002 | 0.0000 | 0.0000 | 0.0000 | 0.9984 | 0.7734 | 4649.0240 |
| Item KNNZScore | 0.9155 | 0.6973 | 0.0016 | 0.0016 | 0.0003 | 1.0000 | 0.7944 | 6107.1179 |
| Item KNNMeans | 0.9087 | 0.6939 | 0.0000 | 0.0000 | 0.0000 | 1.0000 | 0.7878 | 5792.4466 |
| Item KNNZBaseline | 0.8931 | 0.6866 | 0.0246 | 0.0246 | 0.0094 | 0.9525 | 0.5054 | 4512.4656 |
| Random | 1.4206 | 1.1336 | 0.0180 | 0.0180 | 0.0072 | 1.0000 | 0.0522 | 854.7062 | 

Based on the table above, I would pick `Item KNNZBaseline` as my chosen algorithm. It has the highest `HR` and the second-lowest `RMSE` (0.8931, just behind `User KNNZBaseline` at 0.8845). However, I'm not a fan of the high `Novelty` score. Even the `Random` algorithm picks, on average, more popular movies. Let's see the list of the top 10 recommendations:
```
Using recommender  Item KNNZBaseline

We recommend:
Awfully Big Adventure, An (1995) 4.925100503405479
What Happened Was... (1994) 4.925100503405479
Alice (2009) 4.869229031846505
Librarian: Quest for the Spear, The (2004) 4.840694735562784
Librarian, The: The Curse of the Judas Chalice (2008) 4.840694735562784
FairyTale: A True Story (1997) 4.82377448639196
Dead Like Me: Life After Death (2009) 4.82377448639196
Librarian: Return to King Solomon's Mines, The (2006) 4.799028068896117
Merlin (1998) 4.778319940937414
Last Legion, The (2007) 4.778319940937414
```
Unfortunately, I have no idea what any of these movies are. Furthermore, none of the other KNN models recommended a better set of movies. Even `Random` had a much better selection.

If I had to choose, I would pick `Item KNNZBaseline` purely by the metrics. However, given its top 10 movies, I would not use this algorithm.
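To make the `HR`, `ARHR`, and `Novelty` columns more concrete, here is a minimal sketch of how such metrics are commonly computed from leave-one-out top-N lists. It follows the usual definitions and is not a copy of the framework's `RecommenderMetrics.py`. The function names, the data shapes, and the toy values are assumptions for illustration.

```python
def hit_rate(top_n, left_out):
    """Fraction of users whose held-out movie appears in their top-N list."""
    hits = sum(1 for user, movie in left_out.items()
               if movie in top_n.get(user, []))
    return hits / len(left_out)

def average_reciprocal_hit_rank(top_n, left_out):
    """Like hit rate, but a hit at rank r only counts as 1/r."""
    total = 0.0
    for user, movie in left_out.items():
        recs = top_n.get(user, [])
        if movie in recs:
            total += 1.0 / (recs.index(movie) + 1)
    return total / len(left_out)

def novelty(top_n, popularity_rank):
    """Mean popularity rank of everything recommended (1 = most popular)."""
    ranks = [popularity_rank[m] for recs in top_n.values() for m in recs]
    return sum(ranks) / len(ranks)

# Hypothetical toy data: two users, top-2 lists, one held-out movie each
top_n = {"u1": ["A", "B"], "u2": ["C", "D"]}
left_out = {"u1": "B", "u2": "Z"}
popularity_rank = {"A": 1, "B": 10, "C": 100, "D": 1000}

print(hit_rate(top_n, left_out))                     # 0.5
print(average_reciprocal_hit_rank(top_n, left_out))  # 0.25
print(novelty(top_n, popularity_rank))               # 277.75
```

Under this definition a higher `Novelty` value means the recommended movies sit further down the popularity ranking, which matches the obscure titles in the list above.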

***

**[Next Section](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/Notebook/6_Collaborative_Based_Recommenders.ipynb)**

***

**[Back to Top](#top)**

***

**[Back to Main](https://colab.research.google.com/github/villafue/Capstone_2_MovieLens/blob/main/MovieLens.ipynb)**

***