## What are recommender systems?

A recommender system is an information filtering tool that predicts how a user would rate or prefer an item, and uses those predictions to suggest items the user is likely to want.

Example applications:
1. E-commerce websites that recommend products based on a user's previous purchases and on what "similar" buyers went on to purchase (Amazon)
2. Streaming services that suggest movies or music (Spotify, Netflix)
3. Social media platforms that propose content creators to follow (LinkedIn, YouTube)

## Types of recommender systems:
1. Content-Based Filtering: Recommends items whose attributes resemble items the user has liked or rated highly in the past.
2. Collaborative Filtering: Recommends items that users with similar preferences liked.
3. Hybrid Systems: Combine content-based and collaborative filtering.

### 1. Content-Based Filtering

Recommends item B to a user if the user liked item A and item B is similar to item A. In the example below, similarity is computed from item attributes (genre, director, actors).

In [1]:
import pandas as pd

In [2]:
movies_dict = {
    "movie_id":[200,201,202,203,204],
    "name": ["Movie A", "Movie B","Movie C", "Movie D", "Movie E"],
    "genre":["Action","Romance","Comedy","Action","Comedy"],
    "director": ["Director A", "Director B","Director A", "Director D", "Director B"],
    "actors":['Actor A|Actor B', 'Actor B|Actor C', 'Actor D|Actor E', 'Actor A|Actor F', "Actor A|Actor D"]
}

In [3]:
movies_df = pd.DataFrame(movies_dict)

In [None]:
movies_df

,movie_id,name,genre,director,actors
0,200,Movie A,Action,Director A,Actor A|Actor B
1,201,Movie B,Romance,Director B,Actor B|Actor C
2,202,Movie C,Comedy,Director A,Actor D|Actor E
3,203,Movie D,Action,Director D,Actor A|Actor F
4,204,Movie E,Comedy,Director B,Actor A|Actor D


In [5]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [6]:
movies_df["combined_features"] = movies_df["genre"] + " " + movies_df["director"] + " " + movies_df["actors"]

In [7]:
vectorizer = TfidfVectorizer()
tfidf_matrix = vectorizer.fit_transform(movies_df["combined_features"])

In [8]:
# Print the TF-IDF vector of the first movie only
print(tfidf_matrix[0])

<Compressed Sparse Row sparse matrix of dtype 'float64'
	with 3 stored elements and shape (1, 5)>
  Coords	Values
  (0, 0)	0.6036665474310993
  (0, 3)	0.3565351874675532
  (0, 1)	0.7130703749351064


In [9]:
from sklearn.metrics.pairwise import cosine_similarity

In [10]:
cosine_similarity?

Signature: cosine_similarity(X, Y=None, dense_output=True)
Docstring:
Compute cosine similarity between samples in X and Y.

Cosine similarity, or the cosine kernel, computes similarity as the
normalized dot product of X and Y:

    K(X, Y) = <X, Y> / (||X||*||Y||)

On L2-normalized data, this function is equivalent to linear_kernel.

Read more in the :ref:`User Guide <cosine_similarity>`.

Parameters
----------
X : {array-like, sparse matrix} of shape (n_samples_X, n_features)
    Input data.

Y : {array-like, sparse matrix} of shape (n_samples_Y, n_features),             default=None
    Input data. If ``None``, the output will be the pairwise
    similarities between all samples in ``X``.

dense_output : bool, default=True
    Whether to return dense output even when the input is sparse. If
    ``False``, the output i

In [11]:
cosine_sim = cosine_similarity(tfidf_matrix)

In [None]:
cosine_sim

array([[1.        , 0.58131574, 0.6355867 , 1.        , 0.6355867 ],
       [0.58131574, 1.        , 0.58131574, 0.58131574, 0.58131574],
       [0.6355867 , 0.58131574, 1.        , 0.6355867 , 1.        ],
       [1.        , 0.58131574, 0.6355867 , 1.        , 0.6355867 ],
       [0.6355867 , 0.58131574, 1.        , 0.6355867 , 1.        ]])
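
The formula from the docstring, `K(X, Y) = <X, Y> / (||X||*||Y||)`, can be checked by hand. The following is a minimal sketch on a small made-up matrix `X` (an assumption for illustration, not the notebook's TF-IDF data): it normalizes each row to unit length, takes dot products, and compares the result with `cosine_similarity` and with `linear_kernel` on the normalized rows.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical toy feature matrix: rows are items, columns are features
X = np.array([
    [1.0, 2.0, 0.0],
    [0.0, 1.0, 1.0],
    [2.0, 4.0, 0.0],  # same direction as row 0, twice the length
])

# Scale every row to unit L2 norm
norms = np.linalg.norm(X, axis=1, keepdims=True)
X_unit = X / norms

# On unit-length rows, cosine similarity is a plain dot product
manual = X_unit @ X_unit.T
from_sklearn = cosine_similarity(X)
from_linear_kernel = linear_kernel(X_unit)

print(np.allclose(manual, from_sklearn))
print(np.allclose(from_linear_kernel, from_sklearn))
print(manual.round(3))
```

Rows 0 and 2 point in the same direction, so their similarity is 1 even though their lengths differ. The same effect explains the 1.0 entries in the matrix above: those movies have identical TF-IDF vectors.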

In [13]:
cosine_sim_df = pd.DataFrame(data=cosine_sim, index=movies_df["name"], columns=movies_df["name"])

In [14]:
cosine_sim_df

name,Movie A,Movie B,Movie C,Movie D,Movie E
Movie A,1.0,0.581316,0.635587,1.0,0.635587
Movie B,0.581316,1.0,0.581316,0.581316,0.581316
Movie C,0.635587,0.581316,1.0,0.635587,1.0
Movie D,1.0,0.581316,0.635587,1.0,0.635587
Movie E,0.635587,0.581316,1.0,0.635587,1.0


In [16]:
cosine_sim_df > 0.6

name,Movie A,Movie B,Movie C,Movie D,Movie E
Movie A,True,False,True,True,True
Movie B,False,True,False,False,False
Movie C,True,False,True,True,True
Movie D,True,False,True,True,True
Movie E,True,False,True,True,True
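
The boolean mask above marks which pairs clear the 0.6 threshold. To turn the similarity table into actual recommendations, one option is to look up the row of a movie the user liked, drop the movie itself, filter by the threshold, and sort. The sketch below rebuilds the same similarity table so that it runs on its own. The `recommend` function and its `top_n` and `threshold` parameters are hypothetical names introduced here, not part of the original notebook.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

movies_df = pd.DataFrame({
    "name": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
    "genre": ["Action", "Romance", "Comedy", "Action", "Comedy"],
    "director": ["Director A", "Director B", "Director A", "Director D", "Director B"],
    "actors": ["Actor A|Actor B", "Actor B|Actor C", "Actor D|Actor E",
               "Actor A|Actor F", "Actor A|Actor D"],
})

# Rebuild the similarity table the same way the notebook does
combined = movies_df["genre"] + " " + movies_df["director"] + " " + movies_df["actors"]
cosine_sim_df = pd.DataFrame(
    cosine_similarity(TfidfVectorizer().fit_transform(combined)),
    index=movies_df["name"],
    columns=movies_df["name"],
)


def recommend(liked_movie, sim_df, top_n=3, threshold=0.6):
    """Hypothetical helper: movies most similar to `liked_movie`, best first."""
    scores = sim_df.loc[liked_movie].drop(liked_movie)  # exclude the movie itself
    scores = scores[scores > threshold]                 # keep sufficiently similar items
    return scores.sort_values(ascending=False).head(top_n)


recs_a = recommend("Movie A", cosine_sim_df)
recs_b = recommend("Movie B", cosine_sim_df)
print(recs_a)
print(recs_b)
```

For "Movie A" the top result is "Movie D", while "Movie B" gets no recommendations because none of its scores exceed 0.6. Raising or lowering `threshold` trades off recommendation quality against coverage.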
