# Background
This is based on ideas from this [post](http://www.albertauyeung.com/post/python-matrix-factorization/), but takes a simpler, script-like approach.

# Setup

Data & Algebra libs

In [None]:
import pandas as pd
import numpy as np
from sklearn.metrics import pairwise_distances

Graphic libs

In [None]:
from matplotlib import pyplot as plt
%matplotlib inline
import seaborn as sns

# Data

## About the data
You can find everything you need to know about the MovieLens dataset [here](http://files.grouplens.org/datasets/movielens/ml-100k/README)

## Read data

In [None]:
ml = \
pd.read_csv(
    filepath_or_buffer = 'http://files.grouplens.org/datasets/movielens/ml-100k/u.data',
    header = None,
    sep = '\t'
)

In [None]:
ml.columns = 'user id | item id | rating | timestamp'.split(' | ')

For offline usage

In [None]:
# ml.to_feather('movie_rec.feather')
# ml = pd.read_feather('movie_rec.feather')

Our dataset at a glance

In [None]:
ml.shape

In [None]:
ml.head()

In [None]:
ml.describe()

## Matrix form

In [None]:
data_mat = \
ml.drop(columns=['timestamp']).\
    pivot(
        index = 'user id', 
        columns = 'item id', 
        values = 'rating'
    ).\
    fillna(0)

# User recommendations

## Product similarity
This approach looks at all the products rated by the users and tries to find "similar" products based on some simple rules.

In [None]:
product_similarity = pairwise_distances(X = data_mat.T, metric = 'cosine')
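To make the `cosine` metric concrete, here is a minimal sketch on a hypothetical toy ratings matrix (the names `toy_ratings` and `cosine_distance_matrix` are made up for illustration). It computes the distance by hand as 1 minus the cosine of the angle between two product columns, which is the definition of cosine distance; it assumes no product column is all zeros.

```python
import numpy as np

# Hypothetical toy ratings: 3 users (rows) x 3 products (columns); 0 = not rated
toy_ratings = np.array([
    [5.0, 5.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 4.0],
])

def cosine_distance_matrix(ratings):
    """Cosine distance between every pair of columns: 1 - cos(angle)."""
    items = ratings.T                                # one row per product
    norms = np.linalg.norm(items, axis=1, keepdims=True)
    unit = items / norms                             # assumes no all-zero column
    return 1.0 - unit.dot(unit.T)

toy_dist = cosine_distance_matrix(toy_ratings)
print(np.round(toy_dist, 3))
```

Products 0 and 1 were rated identically by the same users, so their distance is (numerically) 0. Products 0 and 2 share no raters, so their distance is 1.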

In [None]:
product_similarity_vec = \
pd.Series(
    product_similarity[
        # k = -1 skips the diagonal, so a product is not compared with itself
        np.tril_indices_from(product_similarity, k = -1)
    ]
)

In [None]:
print(
    'We have',
    (product_similarity_vec == 0).sum(),
    'identical products (with distance = 0), and',
    (product_similarity_vec == 1).sum(),
    'with no common ratings (distance = 1)'
)

In [None]:
print(product_similarity_vec.describe())

In [None]:
# histplot replaces the deprecated distplot; stat = 'count' matches norm_hist = False
sns.histplot(
    product_similarity_vec,
    bins = 30,
    kde = False,
    stat = 'count',
)

There are several strategies for picking the products we want to recommend based on product similarity, but essentially we need to make several choices:
1. Which products do we look at: all the products the user rated? Only products with $n$ stars or more? Or do we weight the distances by the number of stars? And even with weights, do 1-star reviews get a positive weight, or should a low number of stars get a negative weight?
2. How do we aggregate the distances? Say the user only rated 2 products, and both of them are similar to product $j$. Do we rank product $j$ against all the other products that have some similarity to the 2 rated products? Do we sum up distances for each product and rank? Do we take the $min$/$max$/... distance for each product?
3. What would be the threshold for a product to be recommended? This depends on practical limitations, e.g. what's the minimum/maximum number of products we have to/can recommend, on the distribution of distances, and on other considerations (e.g. do we want an absolute threshold of similarity and not just the top 5).

Looking at the second choice we have to make: there's one huge advantage to summing up distances over picking min/max or other aggregation methods - __we can use matrix multiplication!__
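A small sketch of that point, using a hypothetical 4-product distance matrix (`toy_P`) and one user's binary "rated" vector (`c`), both invented for illustration. Summing the distances from the rated products gives the same result as a vector-matrix product, while the minimum has no such shortcut.

```python
import numpy as np

# Hypothetical symmetric distance matrix between 4 products
toy_P = np.array([
    [0.0, 0.2, 0.9, 0.4],
    [0.2, 0.0, 0.5, 0.8],
    [0.9, 0.5, 0.0, 0.3],
    [0.4, 0.8, 0.3, 0.0],
])
# One user who rated products 0 and 1 (binary indicator vector)
c = np.array([1, 1, 0, 0])

rated_rows = toy_P[c == 1]          # distances from the rated products

sum_agg = rated_rows.sum(axis=0)    # aggregate by summing
min_agg = rated_rows.min(axis=0)    # aggregate by taking the minimum

# Only the sum can be written as a vector-matrix product
sum_via_dot = c.dot(toy_P)

print(sum_agg, sum_via_dot, min_agg)
```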

### Example 1
We decide to look at all the movies a user rated (no matter what the rating was), sum up their distances to every other product, and recommend the 5 products with the smallest total distance (regardless of the absolute level of similarity: in this example we _have_ to recommend exactly 5 products, no matter what).

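We define
1. The symmetric matrix $P$ containing product-to-product distances as we calculated above (our `product_similarity` matrix based on `cosine` distance)
2. A binary matrix $C$, built from the ratings matrix $R$ (our `data_mat`, with $I$ users and $J$ products), marking which products each user rated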

\begin{equation*}
    C_{I \times J} = \left( \left( c_{i,j} \right) \right)_{I \times J}
\end{equation*}
where
\begin{equation*}
    c_{i,j} = 
    \left\{
        \begin{array}{ll}
          1 & R_{i,j} > 0 \\
          0 & \text{otherwise}
        \end{array}
    \right.
\end{equation*}

For every user $i$ we want to look at all rated products, and find the distances from other products. This can be written as a sparse matrix, where only rows matching rated products will have non-zero values:

\begin{equation*}
   \forall i,  \left( \left( c_{i,j} p_{j,k} \right) \right)_{J \times J}
\end{equation*}

And if our way to rank products is to sum up distances, we can sum over the rated products $j$ (that is, down each column $k$) to get $\left( \sum_{j} c_{i,j} p_{j,1} , \ldots, \sum_{j}  c_{i,j} p_{j,J} \right)$, which is simply the $i$'th row of the matrix $CP$, which we can calculate as:

In [None]:
CP = (data_mat > 0).dot(product_similarity)
CP.columns = data_mat.columns

For practical reasons we need to work with the long format, sorted by similarity.

In [None]:
user_rec = pd.DataFrame({'dist': CP.T.unstack()})

And now we can pull out only the top 5 most similar products (or apply any other selection rule).

In [None]:
user_rec.\
groupby(user_rec.index.get_level_values(0)).\
apply(
    lambda x: x.sort_values(by = 'dist').head(5).reset_index()
).head(15)
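One thing the selection above does not do is exclude products the user has already rated. A minimal sketch of one way to handle that, on hypothetical toy scores (`toy_CP`, `toy_rated` and `top_n` are invented for illustration): set the distance of already-rated products to infinity before ranking.

```python
import numpy as np

# Hypothetical summed-distance scores: 2 users x 5 products (lower = more similar)
toy_CP = np.array([
    [0.3, 0.1, 1.2, 0.9, 0.7],
    [1.5, 0.8, 0.2, 0.4, 1.1],
])
# Which products each user has already rated
toy_rated = np.array([
    [True,  True,  False, False, False],
    [False, False, True,  False, False],
])

top_n = 2
# Push already-rated products to the end of the ranking
masked = np.where(toy_rated, np.inf, toy_CP)
# Indices of the top_n smallest remaining distances per user
toy_top = np.argsort(masked, axis=1)[:, :top_n]
print(toy_top)
```

Here user 0 is recommended products 4 and 3, and user 1 gets products 3 and 1. Whether rated products should be excluded at all is a product decision, so treat this as one option rather than the rule.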

## Similar users => Products

In [None]:
user_similarity = pairwise_distances(X = data_mat, metric = 'cosine')

In [None]:
print(pd.Series(user_similarity.flatten()).describe())

In [None]:
# histplot replaces the deprecated distplot
sns.histplot(
    user_similarity[
        np.triu_indices(
            user_similarity.shape[0], 
            k=1
        )
    ],
    bins = 30,
    kde = False
)

For each user we pick the top $n$ most similar users and select the products they liked most.
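A rough sketch of that idea on hypothetical toy data. The helper names (`cosine_distances`, `recommend_from_neighbours`) and the scoring rule are assumptions made for illustration: "liked most" is interpreted here as the highest average rating among the neighbours, with unrated products counted as 0, and products the user already rated are skipped.

```python
import numpy as np

# Hypothetical toy data: 4 users x 4 products; 0 = not rated
toy_R = np.array([
    [5, 4, 0, 0],
    [5, 5, 4, 1],
    [4, 5, 5, 0],
    [0, 1, 0, 5],
], dtype=float)

def cosine_distances(rows):
    """Cosine distance between every pair of rows (assumes no all-zero row)."""
    unit = rows / np.linalg.norm(rows, axis=1, keepdims=True)
    return 1.0 - unit.dot(unit.T)

toy_user_dist = cosine_distances(toy_R)

def recommend_from_neighbours(user, n_neighbours=2, n_products=1):
    dist = toy_user_dist[user].copy()
    dist[user] = np.inf                        # a user is not their own neighbour
    neighbours = np.argsort(dist)[:n_neighbours]
    # Average the neighbours' ratings (assumption: unrated counts as 0)
    scores = toy_R[neighbours].mean(axis=0)
    scores[toy_R[user] > 0] = -np.inf          # skip products already rated
    return np.argsort(-scores)[:n_products]

print(recommend_from_neighbours(0))
```

For user 0 the two nearest users are 1 and 2, and the best product they rated that user 0 has not seen is product 2.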

# Collaborative Filtering

## Parameters

In [None]:
K = 3
R = np.array(data_mat)
iterations = 1000
tolerance = 0.05
alpha = 0.0001

## Simple Gradient Descent

### Initialize

In [None]:
P = np.random.normal(scale=1./K, size=(R.shape[0], K))
Q = np.random.normal(scale=1./K, size=(K, R.shape[1]))

In [None]:
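training_process = []
t = 1
mse_prev = 0
# mse must exist before the loop condition is evaluated; inf guarantees a first iteration
mse = np.inf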

### Gradient Descent

In [None]:
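while t <= iterations and abs(mse - mse_prev) > tolerance:
    mse_prev = mse
    
    ## Error matrix (note: unrated entries, stored as 0, count as ratings here)
    error = R - P.dot(Q)

    ## Update down the gradient, computing both steps from the current P and Q
    P_step = 2 * alpha * error.dot(Q.T)
    Q_step = 2 * alpha * P.T.dot(error)
    P = P + P_step
    Q = Q + Q_step

    ## Sum of squared errors before this update (kept under the name mse)
    mse = np.power(error, 2).sum()
    training_process.append((t, mse))

    if t % 50 == 0:
        print("Iteration: %d ; error = %.4f" % (t, mse))
    t += 1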
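For reference, a short derivation of the update step, assuming the loss is the total squared error over all entries of $R$ (which is what the loop computes under the name `mse`). Here $E$ is shorthand for the `error` matrix, and $P$, $Q$ are the factor matrices of this section:

\begin{equation*}
    L(P, Q) = \sum_{i,j} \left( R_{i,j} - (PQ)_{i,j} \right)^2 , \qquad E = R - PQ
\end{equation*}
\begin{equation*}
    \frac{\partial L}{\partial P} = -2 E Q^{T} , \qquad \frac{\partial L}{\partial Q} = -2 P^{T} E
\end{equation*}
Moving against the gradient with step size $\alpha$ gives
\begin{equation*}
    P \leftarrow P + 2 \alpha E Q^{T} , \qquad Q \leftarrow Q + 2 \alpha P^{T} E
\end{equation*}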

In [None]:
sns.pointplot(
    data = pd.DataFrame(training_process, columns=['iteration', 'mse']), 
    x = 'iteration', 
    y = 'mse')
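The loop above treats unrated entries (stored as 0) as if they were real ratings of 0. A possible variant, sketched here on hypothetical toy data with made-up names (`toy_R`, `observed`, `toy_alpha`, `toy_steps`), is to compute the error only on observed ratings. This is an illustration of the idea, not the notebook's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy ratings: 4 users x 5 products; 0 = not rated
toy_R = np.array([
    [5, 3, 0, 1, 0],
    [4, 0, 0, 1, 2],
    [1, 1, 0, 5, 4],
    [0, 1, 5, 4, 0],
], dtype=float)
observed = toy_R > 0                      # mask of known ratings

toy_K, toy_alpha, toy_steps = 2, 0.005, 1000
toy_P = rng.normal(scale=1.0 / toy_K, size=(toy_R.shape[0], toy_K))
toy_Q = rng.normal(scale=1.0 / toy_K, size=(toy_K, toy_R.shape[1]))

sse_history = []
for _ in range(toy_steps):
    # Error on observed entries only; unobserved entries contribute nothing
    err = np.where(observed, toy_R - toy_P.dot(toy_Q), 0.0)
    sse_history.append((err ** 2).sum())
    # Simultaneous update: both gradients use the same P and Q
    grad_P = err.dot(toy_Q.T)
    grad_Q = toy_P.T.dot(err)
    toy_P = toy_P + 2 * toy_alpha * grad_P
    toy_Q = toy_Q + 2 * toy_alpha * grad_Q

print("first SSE: %.3f ; last SSE: %.3f" % (sse_history[0], sse_history[-1]))
```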

## Regularisation

Bias terms

In [None]:
b_u = np.zeros(data_mat.shape[0])
b_i = np.zeros(data_mat.shape[1])
b = np.mean(R[np.where(R != 0)])
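One common way to use such terms (an assumption about where this section is heading, since the notebook does not show it) is to add them to the predicted rating. Writing $\hat{r}_{i,j}$ for the prediction, an assumed name:

\begin{equation*}
    \hat{r}_{i,j} = b + b^{\text{user}}_{i} + b^{\text{item}}_{j} + \sum_{k=1}^{K} P_{i,k} Q_{k,j}
\end{equation*}
where $b$ is the global mean of the observed ratings (our `b`), $b^{\text{user}}_{i}$ is `b_u[i]` and $b^{\text{item}}_{j}$ is `b_i[j]`.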

## Stochastic GD

Training samples

In [None]:
samples = [
    (i, j, R[i, j])
    for i in range(R.shape[0])
    for j in range(R.shape[1])
    if R[i, j] > 0
]
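With a list of samples like this, stochastic gradient descent updates the parameters one rating at a time. Below is a minimal sketch on hypothetical toy data. All `toy_*` names are invented, the regularisation strength `toy_beta` is not defined anywhere in this notebook, and the update rule (squared error with a penalty on the size of the parameters, plus user and item biases) is one common choice rather than the definitive one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy ratings: 4 users x 4 products; 0 = not rated
toy_R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)
toy_K = 2
toy_alpha = 0.01   # learning rate (hypothetical value)
toy_beta = 0.01    # regularisation strength (hypothetical, not from the notebook)

toy_P = rng.normal(scale=1.0 / toy_K, size=(toy_R.shape[0], toy_K))
toy_Q = rng.normal(scale=1.0 / toy_K, size=(toy_K, toy_R.shape[1]))
toy_b_u = np.zeros(toy_R.shape[0])
toy_b_i = np.zeros(toy_R.shape[1])
toy_b = toy_R[toy_R != 0].mean()

toy_samples = [
    (i, j, toy_R[i, j])
    for i in range(toy_R.shape[0])
    for j in range(toy_R.shape[1])
    if toy_R[i, j] > 0
]

def predict(i, j):
    """Global mean + user bias + item bias + latent factor interaction."""
    return toy_b + toy_b_u[i] + toy_b_i[j] + toy_P[i].dot(toy_Q[:, j])

def sse():
    """Sum of squared errors over the observed ratings."""
    return sum((r - predict(i, j)) ** 2 for i, j, r in toy_samples)

sse_before = sse()
for epoch in range(200):
    # Visit the observed ratings in a fresh random order each epoch
    for idx in rng.permutation(len(toy_samples)):
        i, j, r = toy_samples[idx]
        e = r - predict(i, j)
        toy_b_u[i] += toy_alpha * (e - toy_beta * toy_b_u[i])
        toy_b_i[j] += toy_alpha * (e - toy_beta * toy_b_i[j])
        p_old = toy_P[i].copy()   # keep the old row so both updates use the same values
        toy_P[i] += toy_alpha * (e * toy_Q[:, j] - toy_beta * toy_P[i])
        toy_Q[:, j] += toy_alpha * (e * p_old - toy_beta * toy_Q[:, j])
sse_after = sse()

print("SSE before: %.3f ; after: %.3f" % (sse_before, sse_after))
```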