# Movie Recommendations using Collaborative Filtering

In this kernel we'll build a baseline Movie Recommendation System using The Movies Dataset on Kaggle. For novices like me, this kernel should serve as a foundation in recommendation systems and give you something to start with.

In this kernel we will use collaborative filtering to recommend movies.

Our content-based engine suffers from some **severe limitations**. It can only suggest movies that are close to a given movie, so it cannot capture a user's tastes or provide recommendations across genres.

Also, the engine we built is not really personal: it doesn't capture the tastes and biases of individual users. Anyone who queries it for recommendations based on a movie receives the same recommendations for that movie, regardless of who they are.

Therefore, in this section we will use a technique called Collaborative Filtering to make recommendations to movie watchers. It comes in **two types**:

1] **User-based filtering** - These systems recommend to a user the products that similar users have liked. The similarity between two users can be measured with either Pearson correlation or cosine similarity. The technique is easiest to illustrate with an example. In the following matrices, each row represents a user and each column a movie, except the last column, which records the similarity between that user and the target user. Each cell holds the rating that the user gave to that movie. Assume user E is the target.

![image](./images/user_similarity_1.png)

Since users A and F do not share any rated movies with user E, their Pearson correlation with user E is undefined. We therefore only need to consider users B, C, and D. Based on Pearson correlation, we can compute the following similarities.

![image](./images/user_similarity_2.png)

From the above table we can see that user D is very different from user E, since the Pearson correlation between them is negative. He rated Me Before You higher than his average rating, while user E did the opposite. Now we can fill in the blanks for the movies that user E has not rated, based on the other users' ratings.

![image](./images/user_similarity_3.png)
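Both steps of user-based filtering, computing Pearson similarities and then filling in a blank, can be sketched in plain Python. This is a minimal sketch under a few assumptions: each user's ratings are a dict from movie title to rating; the correlation uses only the movies both users rated (with each user's mean taken over those movies); and a missing rating is predicted with one common formula, the target user's mean plus a similarity-weighted average of each neighbor's deviation from their own mean over all their ratings. The ratings are made-up values, not the numbers in the tables above, and `pearson_similarity` and `predict_user_based` are hypothetical helper names.

```python
import math


def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation of two users over the movies both have rated.

    Returns None when it is undefined: no movies in common, or one user gave
    the same rating to every co-rated movie (zero variance).
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if not common:
        return None
    xs = [ratings_a[m] for m in common]
    ys = [ratings_b[m] for m in common]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx)) * math.sqrt(sum(d * d for d in dy))
    if denom == 0:
        return None
    return sum(a * b for a, b in zip(dx, dy)) / denom


def predict_user_based(all_ratings, target, movie):
    """Fill in the target's missing rating for `movie` from the other users."""
    def mean(user):
        values = list(all_ratings[user].values())
        return sum(values) / len(values)

    numerator = denominator = 0.0
    for user, their_ratings in all_ratings.items():
        if user == target or movie not in their_ratings:
            continue
        sim = pearson_similarity(all_ratings[target], their_ratings)
        if sim is None:
            continue  # e.g. users like A and F, who share no movies with the target
        numerator += sim * (their_ratings[movie] - mean(user))
        denominator += abs(sim)
    if denominator == 0:
        return None  # no usable neighbors
    return mean(target) + numerator / denominator


# Hypothetical ratings (1-5); E is the target and has not rated "Matrix"
all_ratings = {
    "E": {"Titanic": 5, "Me Before You": 2, "The Notebook": 4},
    "B": {"Titanic": 4, "Me Before You": 1, "The Notebook": 4, "Matrix": 2},
    "D": {"Titanic": 2, "Me Before You": 5, "The Notebook": 3, "Matrix": 4},
    "F": {"Matrix": 5},
}

print(pearson_similarity(all_ratings["E"], all_ratings["F"]))  # None
print(pearson_similarity(all_ratings["E"], all_ratings["D"]))
print(predict_user_based(all_ratings, "E", "Matrix"))
```

In this toy data, B (positively correlated with E) rated Matrix below B's own mean, and D (perfectly anti-correlated with E) rated it above D's mean, so both neighbors push E's estimate below E's average rating.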

Although user-based Collaborative Filtering is very simple to compute, it suffers from several problems. One main issue is that user preferences can change over time, so a matrix precomputed from a user's neighbors can go stale and lead to poor recommendations. To tackle this problem, we can apply item-based Collaborative Filtering.

2] **Item-based Collaborative Filtering** - Instead of measuring the similarity between users, item-based Collaborative Filtering recommends items based on their similarity to the items that the target user has rated. As before, the similarity can be computed with Pearson correlation or cosine similarity. The major difference is that item-based Collaborative Filtering fills in the blanks vertically, as opposed to the horizontal manner of user-based Collaborative Filtering. The following table shows how to do so for the movie Me Before You.

![image](./images/item_similarity.png)

Item-based Collaborative Filtering avoids the problem posed by changing user preferences, because item similarities are more static. However, several problems remain. The first and main issue is scalability: the computation grows with both the number of customers and the number of products. The worst-case complexity is O(mn) with m users and n items. Sparsity is another concern. Take a look at the above table again: although only one user rated both Matrix and Titanic, the similarity between them is 1. In extreme cases, with millions of users, the similarity between two fairly different movies could be very high simply because the only user who rated both gave them similar ratings.
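Item-based prediction and the sparsity problem can both be seen in a small sketch. It assumes ratings are stored per user as a dict from movie title to rating, computes item similarity as cosine similarity over only the users who rated both movies, and fills a blank with the similarity-weighted average of the target user's own ratings of other movies (one common formulation, not necessarily the one behind the table above). The ratings and the helper names `item_cosine` and `predict_item_based` are hypothetical. The last line shows the sparsity problem: with a single shared rater, cosine similarity is 1.0 for any two positive ratings, even a 1 and a 5, while Pearson correlation would be undefined for a single point.

```python
import math


def item_cosine(ratings, movie_a, movie_b):
    """Cosine similarity of two movie columns, over users who rated both."""
    both = [u for u, r in ratings.items() if movie_a in r and movie_b in r]
    if not both:
        return None
    dot = sum(ratings[u][movie_a] * ratings[u][movie_b] for u in both)
    norm_a = math.sqrt(sum(ratings[u][movie_a] ** 2 for u in both))
    norm_b = math.sqrt(sum(ratings[u][movie_b] ** 2 for u in both))
    return dot / (norm_a * norm_b)


def predict_item_based(ratings, user, movie):
    """Fill in `user`'s rating for `movie` down the column: weight the user's
    own ratings of other movies by how similar those movies are to `movie`."""
    numerator = denominator = 0.0
    for other, rating in ratings[user].items():
        if other == movie:
            continue
        sim = item_cosine(ratings, movie, other)
        if sim is None:
            continue  # no user rated both movies
        numerator += sim * rating
        denominator += abs(sim)
    return None if denominator == 0 else numerator / denominator


# Hypothetical ratings (1-5); E has not rated "Me Before You"
ratings = {
    "A": {"Titanic": 4, "Me Before You": 5},
    "B": {"Titanic": 2, "Me Before You": 1, "Matrix": 5},
    "C": {"Titanic": 5, "Matrix": 3},
    "E": {"Titanic": 4, "Matrix": 2},
}

print(predict_item_based(ratings, "E", "Me Before You"))

# Sparsity: only B rated both movies, giving them 1 and 5, yet the cosine
# over that single shared user is exactly 1.0
print(item_cosine(ratings, "Me Before You", "Matrix"))
```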

### Singular Value Decomposition

One way to handle the scalability and sparsity issues of Collaborative Filtering is to use a latent factor model to capture the similarity between users and items. Essentially, we want to turn the recommendation problem into an optimization problem, judged by how well we predict the rating a user gives to an item. One common metric is Root Mean Square Error (RMSE): the lower the RMSE, the better the performance.
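RMSE is the square root of the mean squared difference between true and predicted ratings. The sketch below computes it, together with Mean Absolute Error (MAE), the other metric reported by `cross_validate` later in this kernel, on a few made-up ratings. The helper names `rmse` and `mae` are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error: sqrt of the mean squared prediction error."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("expected two non-empty sequences of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean Absolute Error: mean of the absolute prediction errors."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up true ratings and model predictions
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 3.0]

print(rmse(actual, predicted))  # 0.75
print(mae(actual, predicted))   # 0.625
```

Because RMSE squares each error before averaging, the two one-star misses weigh more heavily in it than in MAE, which is why RMSE is the larger of the two here.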

A latent factor is a broad idea that describes a property or concept that a user or an item has. For music, for instance, a latent factor could correspond to the genre a song belongs to. SVD reduces the dimension of the utility matrix by extracting its latent factors. Essentially, we map each user and each item into a latent space of dimension r. This helps us better understand the relationship between users and items, since they become directly comparable. The figure below illustrates this idea.

![image](./images/single_value_decomposition.png)
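Classic SVD factors a fully observed matrix R as U Σ Vᵀ; keeping only the r largest singular values gives the best rank-r approximation of R and an r-dimensional vector for every user and every movie. The NumPy sketch below shows this on a small, hypothetical, fully filled 4 x 5 matrix, splitting Σ evenly between the user and movie factors (one convention among several). Real rating matrices are mostly empty, so plain SVD cannot be applied to them directly; the Surprise `SVD` algorithm used in this kernel instead learns user and item factors (plus bias terms) from the observed ratings only, using stochastic gradient descent in the style popularized by Simon Funk during the Netflix Prize.

```python
import numpy as np

# Hypothetical, fully observed utility matrix: 4 users (rows) x 5 movies (columns)
R = np.array([
    [5.0, 4.0, 1.0, 1.0, 2.0],
    [4.0, 5.0, 1.0, 2.0, 1.0],
    [1.0, 1.0, 5.0, 4.0, 5.0],
    [2.0, 1.0, 4.0, 5.0, 4.0],
])

# Thin SVD: R = U @ diag(s) @ Vt, with singular values in s sorted largest first
U, s, Vt = np.linalg.svd(R, full_matrices=False)

r = 2  # number of latent factors to keep
sqrt_s = np.sqrt(s[:r])
user_factors = U[:, :r] * sqrt_s     # row u: user u in the r-dimensional latent space
item_factors = Vt[:r, :].T * sqrt_s  # row i: movie i in the same latent space

# Rank-r approximation: each cell is the dot product of a user vector and a movie vector
R_approx = user_factors @ item_factors.T

print(user_factors.shape, item_factors.shape)  # (4, 2) (5, 2)
print(np.round(R_approx, 2))
```

Because users and movies now live in the same 2-dimensional space, a user's predicted affinity for a movie is simply the dot product of their two vectors.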

Since the dataset we used for content-based and demographic filtering did not have a userId column (which collaborative filtering requires), we will load another dataset. We'll use the Surprise library to implement SVD.

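```python
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate, KFold
import pandas as pd
import numpy as np

# Ratings in this file run from 0.5 to 5.0 in half-star steps; without this,
# Reader() assumes its default rating scale of (1, 5)
reader = Reader(rating_scale=(0.5, 5.0))
ratings = pd.read_csv('./Dataset/user_id_dataset/ratings_small.csv')
ratings.head()
```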

|   | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 31 | 2.5 | 1260759144 |
| 1 | 1 | 1029 | 3.0 | 1260759179 |
| 2 | 1 | 1061 | 3.0 | 1260759182 |
| 3 | 1 | 1129 | 2.0 | 1260759185 |
| 4 | 1 | 1172 | 4.0 | 1260759205 |


**Note: in this dataset movies are rated on a scale of 5, unlike in the earlier one.**

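```python
# Build a Surprise Dataset from the user, item and rating columns (in that order)
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

# 5-fold cross-validation splitter, handed to cross_validate through `cv`
fold = KFold(n_splits=5)
```

```python
svd = SVD()
# Report RMSE and MAE on the held-out part of each of the five folds
cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=fold)
```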

```
{'test_rmse': array([0.89714126, 0.89590676, 0.89985786, 0.90226862, 0.89262394]),
 'test_mae': array([0.68806288, 0.69051325, 0.6940697 , 0.6942201 , 0.68838801]),
 'fit_time': (4.88942813873291,
  4.96440577507019,
  5.006830453872681,
  4.883449077606201,
  4.9049577713012695),
 'test_time': (0.12968087196350098,
  0.11611533164978027,
  0.12536025047302246,
  0.11700153350830078,
  0.12296676635742188)}
```
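`cross_validate` returns a dict of per-fold arrays, so the headline figure is just their mean. In this sketch the five test scores are copied from the output above; in the notebook itself you could instead keep the returned dict (for example `results = cross_validate(...)`) and call `results['test_rmse'].mean()`.

```python
import numpy as np

# Per-fold test scores copied from the cross_validate output above
scores = {
    "test_rmse": np.array([0.89714126, 0.89590676, 0.89985786, 0.90226862, 0.89262394]),
    "test_mae": np.array([0.68806288, 0.69051325, 0.6940697, 0.6942201, 0.68838801]),
}

# Summarize each metric across the five folds
for name, values in scores.items():
    print(f"{name}: mean {values.mean():.4f}, min {values.min():.4f}, max {values.max():.4f}")
```

This gives a mean test RMSE of about 0.8976 and a mean test MAE of about 0.6911, with little spread between folds.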

We get a mean **Root Mean Square Error of about 0.90** (0.898 averaged over the five folds), which is good enough for a baseline. Let us now train on the full dataset and make predictions.

```python
# Train on every rating in the dataset (no held-out split) before predicting
trainset = data.build_full_trainset()
svd.fit(trainset)
```

Let us pick the user with userId 1 and check the ratings they have given.

```python
ratings[ratings['userId'] == 1]
```

|   | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 31 | 2.5 | 1260759144 |
| 1 | 1 | 1029 | 3.0 | 1260759179 |
| 2 | 1 | 1061 | 3.0 | 1260759182 |
| 3 | 1 | 1129 | 2.0 | 1260759185 |
| 4 | 1 | 1172 | 4.0 | 1260759205 |
| 5 | 1 | 1263 | 2.0 | 1260759151 |
| 6 | 1 | 1287 | 2.0 | 1260759187 |
| 7 | 1 | 1293 | 2.0 | 1260759148 |
| 8 | 1 | 1339 | 3.5 | 1260759125 |
| 9 | 1 | 1343 | 2.0 | 1260759131 |


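Next, let us check which users rated the movie with ID 302; user 1 is not among them.

```python
ratings[ratings['movieId'] == 302]
```

|   | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 13375 | 86 | 302 | 3.0 | 848161134 |
| 31298 | 224 | 302 | 4.0 | 828214012 |
| 54695 | 391 | 302 | 4.0 | 891533396 |
| 55143 | 396 | 302 | 4.0 | 834999154 |
| 73529 | 514 | 302 | 3.0 | 853893761 |
| 82993 | 564 | 302 | 4.0 | 974841654 |
| 91822 | 608 | 302 | 4.0 | 939461651 |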


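```python
# Estimate user 1's rating for movie 302. The third argument, r_ui, is only the
# "true" rating echoed back in the Prediction; user 1 has not rated movie 302,
# so 3 is a placeholder and does not change the estimate `est`.
svd.predict(1, 302, 3)
```

```
Prediction(uid=1, iid=302, r_ui=3, est=2.7538629114476723, details={'was_impossible': False})
```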

For the movie with ID 302, we get an estimated rating of **2.754** for user 1. One striking feature of this recommender system is that it doesn't care what the movie is (or what it contains). It works purely from the movie ID and predicts ratings from how users have rated this and other movies.
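A single prediction extends naturally to a ranked list: estimate a rating for every movie the user has not rated and keep the highest. The sketch below is an illustration under stated assumptions, not part of the original kernel: `top_n_recommendations` is a hypothetical helper that takes the prediction function as a parameter, so it runs here with a made-up lookup table instead of the trained model.

```python
from typing import Callable, Hashable, Iterable


def top_n_recommendations(
    predict: Callable[[Hashable, Hashable], float],
    user_id: Hashable,
    all_items: Iterable[Hashable],
    rated_items: set,
    n: int = 10,
) -> list:
    """Estimate a rating for every item the user has not rated and return the
    n highest as (item, estimated_rating) pairs, best first."""
    scored = [
        (item, predict(user_id, item))
        for item in all_items
        if item not in rated_items
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Toy stand-in for a trained model: a lookup table of made-up estimates
toy_estimates = {(1, 101): 3.1, (1, 102): 4.6, (1, 103): 2.2, (1, 104): 3.9}


def toy_predict(user_id, item):
    return toy_estimates[(user_id, item)]


print(top_n_recommendations(toy_predict, 1, [101, 102, 103, 104], rated_items={101}, n=2))
# [(102, 4.6), (104, 3.9)]
```

With the model fitted in this kernel, one could (as an untested assumption) pass `predict=lambda u, i: svd.predict(u, i).est`, `all_items=ratings['movieId'].unique()` and `rated_items=set(ratings.loc[ratings['userId'] == 1, 'movieId'])`.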

## Conclusion

We created recommenders using demographic, content-based and collaborative filtering. While demographic filtering is very elementary and not practical on its own, **Hybrid Systems** can take advantage of both content-based and collaborative filtering, as the two approaches are largely complementary. This model is only a baseline and provides a fundamental framework to start with.