# Collaborative filtering vs Content-based filtering

- Collaborative filtering:  
Recommends items based on the ratings of users who gave ratings similar to yours (requires ratings)
- Content-based filtering:  
Recommends items based on features of both the user and the item, looking for a good match (requires feature vectors for users and items)

Example of user features $x_{u}^{(j)}$ for user $j$
- Age
- Gender (one-hot encoding)
- Country (one-hot over $200$ countries)
- Movies watched (an indicator for each of, e.g., $1000$ movies)
- Average rating per genre 

Example of movie features $x_{m}^{(i)}$ for movie $i$
- Year
- Genre/Genres 
- Reviews (several features...)
- Average rating (per user/demographic...)

> The algorithm matches these two vectors

#### Consider two vectors:  
$v_{u}^{(j)}$ vector computed from $x_{u}^{(j)}$  
$v_{m}^{(i)}$ vector computed from $x_{m}^{(i)}$  
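Together they replace the collaborative-filtering prediction $w^{(j)}\cdot x^{(i)} + b^{(j)}$, with $b^{(j)}=0$ (shown not to affect results), by $v_{u}^{(j)}\cdot v_{m}^{(i)}$, which says _how much a given user $j$ likes the movie $i$_. Both vectors must have the same length for the dot product to be defined.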
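
As a minimal sketch of this prediction, the snippet below computes the dot product of two small vectors. The three-dimensional values are made up for illustration; in practice the vectors would be produced from $x_{u}^{(j)}$ and $x_{m}^{(i)}$.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 3-dimensional vectors, chosen only for illustration.
v_u = [0.5, 1.0, -0.5]  # user j
v_m = [1.0, 2.0, 0.0]   # movie i

# Predicted affinity of user j for movie i: 0.5*1.0 + 1.0*2.0 + (-0.5)*0.0
predicted_rating = dot(v_u, v_m)
print(predicted_rating)  # 2.5
```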

#### Computing the user and movie vectors 

> Deep learning approach: $x_{u}\rightarrow v_{u}$ and $x_{m}\rightarrow v_{m}$, with different networks for user and movie

Consider _user network_: $x_{u}\rightarrow 128 \rightarrow 64 \rightarrow 32 \rightarrow v_{u}$

Consider _movie network_: $x_{m}\rightarrow 256 \rightarrow 128 \rightarrow 32 \rightarrow v_{m}$

**Note** if the labels $y^{(i,j)}$ are binary, then we can apply the sigmoid function $g$ and, instead of $v_{u}^{(j)}\cdot v_{m}^{(i)}$, calculate $g(v_{u}^{(j)}\cdot v_{m}^{(i)})$ to predict the probability that $y^{(i,j)}=1$. 
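
The binary case can be sketched as follows. The helper names `sigmoid` and `predict_probability` are hypothetical, and the branch on the sign of `z` is only there to avoid overflow in `math.exp` for large negative inputs.

```python
import math


def sigmoid(z):
    """Logistic function g(z), written to stay stable for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_probability(v_u, v_m):
    """Estimated probability that y = 1, i.e. g(v_u . v_m)."""
    z = sum(a * b for a, b in zip(v_u, v_m))
    return sigmoid(z)


# Orthogonal vectors give a dot product of 0, so the estimate is 0.5.
print(predict_probability([1.0, 0.0], [0.0, 1.0]))  # 0.5
```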

The network is trained with a $\color{red}{\text{single}}$ **cost function** $J$ for users and movies: 

$$
J = \frac{1}{2}\sum_{(i,j):r(i,j) = 1}\left(v_{u}^{(j)}\cdot v_{m}^{(i)} - y^{(i,j)}\right)^2 + \text{(NN regularization term)}
$$

minimized with gradient descent. 
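
To make the sum over rated pairs concrete, here is a small sketch that evaluates the squared-error part of $J$ for given vectors. It omits the regularization term, and the dictionary layout (`ratings` keyed by `(i, j)` and containing only pairs with $r(i,j)=1$) is an assumption made for the example.

```python
def squared_error_cost(v_users, v_movies, ratings):
    """Squared-error part of J (no regularization).

    ratings maps (i, j) -> y for rated pairs only, so iterating over it
    is the same as summing over all (i, j) with r(i, j) = 1.
    """
    total = 0.0
    for (i, j), y in ratings.items():
        prediction = sum(a * b for a, b in zip(v_users[j], v_movies[i]))
        total += (prediction - y) ** 2
    return 0.5 * total


# Hypothetical 2-dimensional vectors and two observed ratings.
v_users = {0: [1.0, 0.0], 1: [0.0, 1.0]}
v_movies = {0: [4.0, 1.0], 1: [2.0, 5.0]}
ratings = {(0, 0): 5.0, (1, 1): 4.0}  # (movie i, user j) -> rating

cost = squared_error_cost(v_users, v_movies, ratings)
print(cost)  # 1.0, from errors of -1 and +1
```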

This network can also be used to find similar items.

After training we obtain 
- $v_{u}^{(j)}$, a vector of length $32$ that encodes user $j$ with features $x_{u}^{(j)}$
- $v_{m}^{(i)}$, a vector of length $32$ that encodes movie $i$ with features $x_{m}^{(i)}$

To find **similar movies**, consider the squared distance between vectors $||v_{m}^{(k)}-v_{m}^{(i)}||^2$ and find the movies $k$ for which it is small.  
**Note** This can be pre-computed! So once a new user selects a movie, they get _recommendations_ for other, similar movies.  
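
One possible way to pre-compute this is a table that maps each movie to its nearest neighbours by squared distance. The sketch below uses made-up two-dimensional vectors and a brute-force comparison of all pairs, which is only reasonable for small catalogues; the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance ||a - b||^2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_similarity_table(movie_vectors, top_n):
    """Map each movie id to the ids of its top_n closest movies."""
    table = {}
    for i, v_i in movie_vectors.items():
        distances = [
            (squared_distance(v_k, v_i), k)
            for k, v_k in movie_vectors.items()
            if k != i
        ]
        distances.sort()  # smallest distance first
        table[i] = [k for _, k in distances[:top_n]]
    return table


# Hypothetical movie vectors; real ones would have length 32.
movie_vectors = {
    "A": [1.0, 0.0],
    "B": [0.9, 0.1],
    "C": [0.0, 1.0],
    "D": [0.1, 0.8],
}
similar = build_similarity_table(movie_vectors, top_n=1)
print(similar["A"])  # ['B']
```

Once the table exists, serving similar movies is a dictionary look-up and needs no network evaluation.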

> For this algorithm to work well, features must be carefully engineered

**Limitations**: computationally expensive when the number of movies is very large.

### Recommendations from a large catalogue

To avoid running the network on every item each time a user shows up, which is __computationally very expensive__ for a large number of items, consider: 
> Retrieval & Ranking

**Retrieval step** (ensure broad coverage)  
- Generate a large list of plausible item candidates: 
1. For each of the last 10 movies watched by the user, find the 10 most similar movies via $||v_{m}^{(k)}-v_{m}^{(i)}||^2$, which can be pre-computed and served from a __look-up table__. 
2. For the 3 most viewed genres, find the top 10 movies 
3. Top 20 movies in the country... 

- Combine the retrieved items into a list, removing duplicates and items already watched / purchased
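
The retrieval step above can be sketched as a function that merges the candidate sources and filters the result. All argument names and the toy data are hypothetical; the look-up tables are assumed to have been built offline.

```python
def retrieve_candidates(recently_watched, already_watched, similar_table,
                        favourite_genres, top_by_genre, top_in_country):
    """Merge candidate sources, dropping duplicates and watched items."""
    candidates = []
    # 1. Movies similar to the ones watched most recently (look-up table).
    for movie in recently_watched:
        candidates.extend(similar_table.get(movie, []))
    # 2. Top movies in the user's most viewed genres.
    for genre in favourite_genres:
        candidates.extend(top_by_genre.get(genre, []))
    # 3. Top movies in the user's country.
    candidates.extend(top_in_country)

    seen = set(already_watched)  # watched items are excluded from the start
    unique = []
    for movie in candidates:
        if movie not in seen:
            seen.add(movie)
            unique.append(movie)
    return unique


result = retrieve_candidates(
    recently_watched=["A"],
    already_watched={"A", "B"},
    similar_table={"A": ["B", "C"]},
    favourite_genres=["drama"],
    top_by_genre={"drama": ["C", "D"]},
    top_in_country=["E", "A"],
)
print(result)  # ['C', 'D', 'E']
```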

**Note**: Retrieving more items leads to __better performance__ but __slower speed__.  
To analyze the trade-off, run __offline__ experiments to check whether retrieving additional items leads to more relevant recommendations (i.e., whether the estimated probability that $y^{(i,j)}=1$ is higher for the items displayed to the user). 

**Ranking step** (get the best option)
- Take the list of retrieved movies and rank them using the learned model (feed $x_u$ and $x_m$ to the neural network). **Note** $x_m\rightarrow v_m$ can be pre-computed in advance, so only $x_u\rightarrow v_u$ needs to be computed at inference time 
- Display ranked items to the user
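
A minimal sketch of the ranking step, assuming $v_u$ has already been computed for the current user and the movie vectors are stored in a pre-computed dictionary. The name `rank_candidates` and the two-dimensional values are illustrative only.

```python
def rank_candidates(v_u, candidates, movie_vectors):
    """Sort candidate movie ids by predicted score v_u . v_m, best first."""
    scored = []
    for movie in candidates:
        v_m = movie_vectors[movie]  # pre-computed, no network call needed
        score = sum(a * b for a, b in zip(v_u, v_m))
        scored.append((score, movie))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [movie for _, movie in scored]


# Hypothetical user vector and pre-computed movie vectors.
v_u = [1.0, 0.5]
movie_vectors = {"C": [0.0, 1.0], "D": [2.0, 0.0], "E": [1.0, 1.0]}

ranking = rank_candidates(v_u, ["C", "D", "E"], movie_vectors)
print(ranking)  # ['D', 'E', 'C'] with scores 2.0, 1.5, 0.5
```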

### TensorFlow implementation

Implement two networks with dense layers.  
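
A possible Keras sketch of the two networks is shown below, using the layer sizes given earlier in these notes. The feature counts, the ReLU activations, the Adam optimizer, and the learning rate are assumptions made for the example. Note that `MeanSquaredError` averages the squared errors rather than using the $\frac{1}{2}\sum$ form, which changes the scale of the cost but not the idea.

```python
import tensorflow as tf

NUM_USER_FEATURES = 14   # hypothetical size of x_u
NUM_MOVIE_FEATURES = 16  # hypothetical size of x_m

# User network: x_u -> 128 -> 64 -> 32 (= v_u)
user_nn = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32),
])

# Movie network: x_m -> 256 -> 128 -> 32 (= v_m)
movie_nn = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(32),
])

input_user = tf.keras.Input(shape=(NUM_USER_FEATURES,))
input_movie = tf.keras.Input(shape=(NUM_MOVIE_FEATURES,))
v_u = user_nn(input_user)
v_m = movie_nn(input_movie)

# Prediction is the dot product v_u . v_m, one value per (user, movie) pair.
output = tf.keras.layers.Dot(axes=1)([v_u, v_m])

model = tf.keras.Model(inputs=[input_user, input_movie], outputs=output)
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
    loss=tf.keras.losses.MeanSquaredError(),
)
```

Training would then call `model.fit` with a list of two feature arrays (user rows and movie rows aligned per rated pair) and the corresponding ratings as targets.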
