# Recommender Systems
## Assignment 2: Content Based Recommendations 

**By:**  
Group #16

<br><br>

**The goal of this assignment is to:**
- Understand the details of content based recommender systems
- Understand their pros & cons compared to other recommender system approaches
- Practice recommender system training and evaluation.

**Instructions:**
- Students will form teams of two people each, and submit a single homework for each team.
- The same score for the homework will be given to each member of the team.
- Submit your solution as a Jupyter notebook file (with the extension .ipynb).
- Images/Graphs/Tables should be submitted inside the notebook.
- The notebook should be runnable and properly documented. 
- Please answer all the questions and include all your code.
- English only.

**Submission:**
- Submission of the homework will be done via Moodle by uploading a Jupyter notebook.
- The homework needs to be entirely in English.
- The deadline for submission is on Moodle.

**Requirements:**  
- Python 3.6 or later should be used.  
- You should implement the recommender system by yourself using only basic Python libraries (such as numpy).


**Grading:**
- Q1 - 10 points - Data exploration
- Q2 - 30 points - Item similarity
- Q3 - 40 points - Content based recommendation  
- Q4 - 20 points - Content based vs. matrix factorization comparison

`Total: 100`

**Prerequisites**

In [None]:
!pip install --quiet zipfile36
!pip install --quiet wordcloud

**Imports**

In [None]:
# general
import time
import random
import zipfile
import requests
import warnings

# ml
import numpy as np
import scipy as sp
import pandas as pd

# visual
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.ticker as tck
from wordcloud import WordCloud, STOPWORDS

# metrics - do not use these metrics directly except for validating your work
from sklearn.metrics import mean_squared_error, ndcg_score

# distance
from scipy.spatial.distance import cosine
from sklearn.metrics.pairwise import cosine_similarity

# notebook
from IPython.display import FileLink, display

**Hide Warnings**

In [None]:
warnings.filterwarnings('ignore')

**Disable Autoscrolling**

In [None]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) {
    return false;
}


**Set Random Seed**

In [None]:
random.seed(123)

# Question 1: 
# Data exploration 

In this exercise you will use the same dataset as in exercise #1, the [MovieLens 100K rating dataset](https://grouplens.org/datasets/movielens/100k/). Use Fold #2.

Include additional exploration that is relevant for content based recommendation:

Explore at least 3 features.  
For example, explore the genres, titles, and you may also use [IMDB's API](https://developer.imdb.com/documentation/api-documentation/) to include additional features.  
 
Use plots and discuss your insights and possible challenges related to the dataset.
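
As a starting point for the exploration, the following is a minimal sketch of how genre frequency, genres per movie, and release year could be summarized before plotting. It assumes titles follow the MovieLens `Title (YYYY)` pattern; the `sample_movies` rows and the `release_year` helper are illustrative names, not part of the dataset or the assignment.

```python
import re
from collections import Counter

# Illustrative rows in the MovieLens style: (title, list of genres).
sample_movies = [
    ("Toy Story (1995)", ["Animation", "Children's", "Comedy"]),
    ("GoldenEye (1995)", ["Action", "Adventure", "Thriller"]),
    ("Get Shorty (1995)", ["Action", "Comedy", "Drama"]),
    ("Fargo (1996)", ["Crime", "Drama", "Thriller"]),
]

def release_year(title):
    """Return the 4-digit year in trailing parentheses, or None if absent."""
    match = re.search(r"\((\d{4})\)\s*$", title)
    return int(match.group(1)) if match else None

# How often each genre appears across all movies.
genre_counts = Counter(g for _, genres in sample_movies for g in genres)
# How many movies were released in each year.
year_counts = Counter(release_year(title) for title, _ in sample_movies)
# Number of genres attached to each movie (multi-label check).
genres_per_movie = [len(genres) for _, genres in sample_movies]

print(genre_counts.most_common(3))
print(year_counts)
```

Counts like these can be passed directly to a bar plot. Skewed genre or year distributions are worth discussing, since dominant feature values tend to make many items look alike to a content based model.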


<br><br><br><br><br><br>
# Question 2:
# Item Similarity

The following blog posts ([link1](https://medium.com/@bindhubalu/content-based-recommender-system-4db1b3de03e7), [link2](https://towardsdatascience.com/movie-recommendation-system-based-on-movielens-ef0df580cd0e)) will be helpful for answering questions 2 and 3. 

Please provide code and explanations for your answer.  
In case you don't have a clear answer, please provide your best hypothesis.

### Build a movie profile vector based on the item features of your choice. 
Select at least two features from the dataset. Describe the potential contribution of each feature to a content based model.

Tip: In the MovieLens dataset, the feature vector can include `genres`, `title`, etc.

In [None]:
# example
# irrelevant_cols = []
# movies_profile = df_items.drop(irrelevant_cols,axis=1)
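
One possible way to build such a profile, sketched under assumptions: concatenate binary genre flags with a bag-of-words encoding of the title, and L2-normalize each block so that neither feature dominates the other. The toy `titles` and `genre_flags` inputs and the helper names `tokenize` and `l2_normalize` are hypothetical. Other encodings (for example TF-IDF) are equally valid choices.

```python
import re
import numpy as np

# Hypothetical toy input: titles plus a binary genre matrix (rows = movies).
titles = ["Toy Story (1995)", "Jungle Book, The (1994)", "Scream (1996)"]
genre_names = ["Animation", "Comedy", "Horror"]
genre_flags = np.array([[1, 1, 0],
                        [1, 0, 0],
                        [0, 0, 1]], dtype=float)

def tokenize(title):
    """Lowercase the title, drop the '(year)' suffix, and split into words."""
    no_year = re.sub(r"\(\d{4}\)", "", title)
    return re.findall(r"[a-z]+", no_year.lower())

# Vocabulary of all title words, mapped to column positions.
vocab = sorted({word for title in titles for word in tokenize(title)})
word_index = {word: col for col, word in enumerate(vocab)}

# Bag-of-words matrix: one row per movie, one column per vocabulary word.
title_bow = np.zeros((len(titles), len(vocab)))
for row, title in enumerate(titles):
    for word in tokenize(title):
        title_bow[row, word_index[word]] += 1.0

def l2_normalize(block):
    """Scale each row to unit length; all-zero rows are left unchanged."""
    norms = np.linalg.norm(block, axis=1, keepdims=True)
    return block / np.where(norms == 0, 1.0, norms)

# Final profile: normalized genre block next to normalized title block.
movies_profile = np.hstack([l2_normalize(genre_flags), l2_normalize(title_bow)])
print(movies_profile.shape)
```

Normalizing per block is a design choice: without it, a movie with many genres or a long title would carry more weight in any dot-product based similarity.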



### Build a function which provides the 5 most similar items to an input item (item_id). 
Please use `Cosine Similarity` metric to calculate Item to item similarity.  
Note: while returning the input item is a great way to debug your code, make sure not to return the actual input `item_id` in your final results.
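
For reference, cosine similarity between two profile vectors $a$ and $b$ is $\frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$. The sketch below shows one way to compute it against all rows with NumPy and keep the top `n`, excluding the query item. It works on row positions rather than MovieLens item ids, so an id-to-row mapping is assumed to exist elsewhere; `top_n_cosine` and `toy_profiles` are illustrative names.

```python
import numpy as np

def top_n_cosine(profiles, item_idx, n=5):
    """Return (row indices, scores) of the n rows most similar to row item_idx."""
    norms = np.linalg.norm(profiles, axis=1)
    norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    unit = profiles / norms[:, None]          # unit-length rows
    sims = unit @ unit[item_idx]              # cosine to every row
    order = np.argsort(-sims, kind="stable")  # most similar first
    order = order[order != item_idx][:n]      # drop the query item itself
    return order, sims[order]

toy_profiles = np.array([[1.0, 0.0, 1.0],
                         [1.0, 0.0, 0.9],
                         [0.0, 1.0, 0.0],
                         [1.0, 1.0, 1.0]])
idx, scores = top_n_cosine(toy_profiles, item_idx=0, n=2)
print(idx, scores)
```

The `sklearn` and `scipy` cosine functions imported earlier can be used to validate the scores, in line with the instruction to use them only for checking your work.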


In [None]:
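def get_similar_items(movies_profile, item_id, n=5):
    '''
    movies_profile : item profile matrix (adjust or extend the arguments to fit your design)
    item_id : target item
    n : number of similar items to return

    This function returns a dataframe/array with the ids of the n most similar items to the target item, and their similarity scores
    '''
    # TODO: your implementation here

    return most_similar_items_id, most_similar_item_score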

### Use the above function to find the 5 most similar items for any 2 items from the dataset. 
Please discuss the results you got. Are there any issues with the quality of the results? 

Please add the movie's title and image to your explanation.

In [None]:
# first

In [None]:
# second

<br><br><br><br><br><br>
# Question 3:
# Content based recommendation



### Build a function which recommends n=5 most relevant items to a user. 

In [None]:
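def get_item_recommendations(users_profile, movies_profile, user_id, n=5):
    '''
    users_profile, movies_profile : user and item profiles (adjust or extend the arguments to fit your design)
    user_id : id of target user
    n : number of recommended items

    This function returns a dataframe/array with the ids of the n recommended items and their scores
    '''
    # TODO: your implementation here

    return recommended_items_ids, recommended_items_score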
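
A common way to approach this, sketched here under assumptions: represent the user as the rating-weighted sum of the profiles of the items they rated, then rank unseen items by cosine similarity to that user profile. The sketch takes a dense ratings array where 0 means "unrated" and uses row positions instead of ids; `recommend_content_based` and the toy arrays are hypothetical and differ from the stub's signature.

```python
import numpy as np

def recommend_content_based(ratings, item_profiles, user_idx, n=5):
    """ratings: (users x items) array, 0 = unrated. Returns (item rows, scores)."""
    user_ratings = ratings[user_idx]
    rated = user_ratings > 0
    if not rated.any():
        # Cold-start user: no history to build a profile from.
        return np.array([], dtype=int), np.array([])
    # User profile = rating-weighted sum of the rated items' profiles.
    user_profile = user_ratings[rated] @ item_profiles[rated]
    # Cosine similarity between the user profile and every item profile.
    denom = np.linalg.norm(item_profiles, axis=1) * np.linalg.norm(user_profile)
    scores = np.divide(item_profiles @ user_profile, denom,
                       out=np.zeros(len(item_profiles)), where=denom > 0)
    # Rank only items the user has not rated yet.
    candidates = np.flatnonzero(~rated)
    order = candidates[np.argsort(-scores[candidates], kind="stable")][:n]
    return order, scores[order]

# Toy data: 5 items with 2 features each, 2 users (the second has no ratings).
item_profiles = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
ratings = np.array([[5.0, 1.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.0, 0.0]])
items, item_scores = recommend_content_based(ratings, item_profiles, user_idx=0, n=2)
print(items, item_scores)
```

Weighting by raw ratings treats a 1-star rating as weak positive evidence. Centering each user's ratings on their mean before weighting is one alternative worth comparing.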
    

### Test your recommender system on 2 users. Explain your results.

In [None]:
# TIP: try using code similar to the following for plotting your results
# user_id = 9  
# items, score = get_item_recommendations(users_profile,movies_profile,user_id)  

# fig, axes = plt.subplots(ncols=2,figsize=(16,4))  
# users_profile.loc[user_id].plot(kind='bar',ax=axes[0]);  
# movies_profile.iloc[items].T.plot(kind='bar',ax=axes[1]);  
# plt.legend(df_items.loc[df_items.index.isin(items)].movie_title);

### Use the MRR metric to evaluate your recommender system on the test set. 
Use a cutoff value of 5.
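
Mean Reciprocal Rank at cutoff $k$ averages, over users, the reciprocal of the rank of the first relevant item within the top $k$ recommendations, counting 0 when no relevant item appears: $\mathrm{MRR@}k = \frac{1}{|U|} \sum_{u \in U} \frac{1}{\mathrm{rank}_u}$. A minimal sketch follows. What counts as "relevant" in the test set (for example, a rating threshold) is left as an assumption to be decided and justified; `mrr_at_k` and the toy dictionaries are illustrative.

```python
def mrr_at_k(recommendations, relevant_items, k=5):
    """
    recommendations: {user: ranked list of item ids, best first}
    relevant_items:  {user: set of item ids considered relevant in the test set}
    """
    reciprocal_ranks = []
    for user, relevant in relevant_items.items():
        ranked = recommendations.get(user, [])[:k]  # apply the cutoff
        rr = 0.0
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                rr = 1.0 / rank  # only the first hit counts
                break
        reciprocal_ranks.append(rr)
    if not reciprocal_ranks:
        return 0.0
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

toy_recs = {1: [10, 20, 30, 40, 50],
            2: [11, 21, 31, 41, 51],
            3: [12, 22, 32, 42, 52, 62]}
toy_relevant = {1: {20}, 2: {99}, 3: {62}}
print(mrr_at_k(toy_recs, toy_relevant, k=5))
```

In the toy data, user 1 has a hit at rank 2, user 2 has no hit, and user 3's only relevant item sits at rank 6, outside the cutoff.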

<br><br><br><br><br>
# Question 4
# Content based recommendations vs. Matrix Factorization

### Use MF's  item representation to find the most similar items
Use the matrix factorization item representation you built in exercise 1 to find the most similar items for **the same 2 items** you used above. 
(Use your optimal hyperparams and resulting model)

Compare the results you got using the different methods below and discuss your findings.

In [None]:
# we recommend creating - class MF

In [None]:
# find similar items to the same two items you chose
pass

### Use MF implementation for item recommendations 

Use the matrix factorization implementation from exercise 1 to recommend 5 items to **the same 2 users** you used above. 
(Use your optimal hyperparams and resulting model)

Compare the results you got using the different methods. Discuss your findings
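
How recommendations are produced depends on the model built in exercise 1, which is not shown here. The sketch below assumes a plain factorization with a user factor matrix and an item factor matrix (no bias terms), scores items by the dot product of the two, and skips items the user has already rated. `recommend_mf`, `user_factors`, `item_factors`, and `seen_mask` are hypothetical names.

```python
import numpy as np

def recommend_mf(user_factors, item_factors, seen_mask, user_idx, n=5):
    """Rank unseen items for one user by the dot product of latent factors."""
    scores = item_factors @ user_factors[user_idx]      # predicted affinity per item
    candidates = np.flatnonzero(~seen_mask[user_idx])   # items not yet rated
    order = candidates[np.argsort(-scores[candidates], kind="stable")][:n]
    return order, scores[order]

# Toy factors: 2 users, 5 items, 2 latent dimensions.
user_factors = np.array([[1.0, 0.0],
                         [0.0, 1.0]])
item_factors = np.array([[2.0, 0.0],
                         [0.0, 3.0],
                         [1.0, 1.0],
                         [3.0, 0.5],
                         [0.5, 0.5]])
# True where the user already rated the item in the training set.
seen_mask = np.array([[True, False, False, False, False],
                      [False, False, False, False, False]])

mf_items, mf_scores = recommend_mf(user_factors, item_factors, seen_mask, user_idx=0, n=2)
print(mf_items, mf_scores)
```

If your model includes global, user, or item biases, they would need to be added to `scores` before ranking.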

### Compare the results of the content based recommender system to the matrix factorization recommender system

- Please use the same train and test set. 
- Please use the MRR metric for the comparison (provide a comparison plot). 

### Advantages & Disadvantages 

Please use the following table to discuss the advantages and disadvantages of matrix factorization vs. content based recommender systems.
Please address the following aspects in your discussion, and feel free to add your own.

<table>
    <thead>
        <tr>
            <th>..</th>
            <th style="text-align:center">Content-Based</th>
            <th style="text-align:center">Matrix-Factorization</th>
            <th style="text-align:left">Notes:</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Dimensionality</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>
        <tr>
            <td>Similarity</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>
        <tr>
            <td>Accuracy</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>
        <tr>
            <td>Training Complexity</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>
        <tr>
            <td>Inference Complexity</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>
        <tr>
            <td>Explainability</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>
        <tr>
            <td>Scalability</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>        
        <tr>
            <td>New User</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>
        <tr>
            <td>New Item</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>
        <tr>
            <td>Train Time</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>
        <tr>
            <td>Predict Time</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>
        <tr>
            <td>Deterministic</td>
            <td style="text-align:center"></td>
            <td style="text-align:center"></td>
            <td style="text-align:left"></td>
        </tr>
    </tbody>
</table>

<br>

Good Luck :)