In [4]:
import numpy as np
import pandas as pd

Rating matrix (rows are users, columns are items; <code>NaN</code> marks an unrated item).

In [5]:
R = np.array([[5,3,4,4,np.nan], 
             [3,1,2,3,3],
             [4,3,4,3,5],
             [3,3,1,5,4],
             [1,5,5,2,1]])

Prediction target: predict the rating of user $u$ for item $i$.

In [159]:
u = 0
i = 4

Calculate the similarity between user $k$ and every user: <code>sim</code> $(R, k)= (Corr(u_k, u_i))_{i=0}^{\# \textrm{users}-1}$. The entry for $i = k$ is always 1 and is filtered out in a later step.

In [132]:
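def corr(R, u = -1):
    """Pearson correlation between users (rows of R), ignoring NaN pairwise.

    Returns the full user-user matrix if u < 0, else the similarities of user u.
    """
    # DataFrame.corr() correlates columns, so transpose to put users in columns
    S = pd.DataFrame(R.T).corr().values
    return S[:, u] if u >= 0 else S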

In [138]:
sim = corr(R, 0)
sim

array([ 1.        ,  0.85280287,  0.70710678,  0.        , -0.79211803])
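The similarity vector above can be cross-checked without pandas. The following is a minimal sketch, assuming that the correlation is computed pairwise over the items both users have rated; the names `pairwise_pearson`, `R_demo` and `sim_demo` are hypothetical and not part of the notebook.

```python
import numpy as np

def pairwise_pearson(a, b):
    """Pearson correlation of two rating vectors over co-rated items only."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mask = ~np.isnan(a) & ~np.isnan(b)      # items rated by both users
    a, b = a[mask], b[mask]
    da, db = a - a.mean(), b - b.mean()     # center on the co-rated means
    denom = np.sqrt(np.sum(da ** 2) * np.sum(db ** 2))
    return np.sum(da * db) / denom if denom > 0 else np.nan

# Same rating matrix as in the notebook (hypothetical name R_demo)
R_demo = np.array([[5, 3, 4, 4, np.nan],
                   [3, 1, 2, 3, 3],
                   [4, 3, 4, 3, 5],
                   [3, 3, 1, 5, 4],
                   [1, 5, 5, 2, 1]])

# Similarity of user 0 to every user, including itself
sim_demo = np.array([pairwise_pearson(R_demo[0], R_demo[v])
                     for v in range(R_demo.shape[0])])
print(sim_demo)
```

Because user 0 has not rated item 4, every correlation involving user 0 is computed from the first four items only.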

Consider only the other users whose correlation with user $u$ is positive (user $u$ itself is excluded).

In [144]:
index = (sim > 0) & (np.arange(len(sim)) != u)  # positive similarity, excluding user u
p_users = np.where(index)[0]  # np.where returns a 1-tuple for 1-D input
p_users

array([1, 2], dtype=int64)

Extract user similarity for prediction.

In [145]:
simp = sim[p_users]
simp

array([0.85280287, 0.70710678])

Calculate the predicted rating of user $u$ for item $i$ by

$$ r_{ui}^* \triangleq \overline{r_u} + \frac{\sum_{v \in \textrm{p_users}} Corr(u,v)(r_{vi} - \overline{r_v})}{\sum_{v \in \textrm{p_users}} Corr(u,v)} $$

where $\overline{r_v}$ is the mean of the observed ratings of user $v$.

In [149]:
m = np.nanmean(R, axis = 1)  # per-user mean rating, ignoring NaN
pred = m[u] + np.sum(simp * (R[p_users, i] - m[p_users])) / np.sum(simp)


In [150]:
pred

4.871979899370592
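As a check on the value above, the prediction can be worked out by hand. With $u = 0$ and $i = 4$, the selected users are 1 and 2, the mean ratings are $\overline{r_0} = 4$, $\overline{r_1} = 2.4$, $\overline{r_2} = 3.8$, and the ratings of item 4 are $r_{1,4} = 3$ and $r_{2,4} = 5$. Dividing by the sum of the similarities, as the code does, and rounding to four decimals:

$$ r_{0,4}^* \approx 4 + \frac{0.8528\,(3 - 2.4) + 0.7071\,(5 - 3.8)}{0.8528 + 0.7071} \approx 4 + \frac{1.3602}{1.5599} \approx 4.872 $$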

# Complete Pipeline

In [1]:
def predict(u, i, R, t = 0, corr = False):
    """Predict the rating of user u for item i from users whose correlation with u exceeds t."""
    assert np.isnan(R[u, i]), "User {} has already rated item {} with rating {}!".format(u, i, R[u,i])
    # calculate (NaN-tolerant) user-user correlation matrix
    SS = pd.DataFrame(R.T).corr().values
    # select the other users whose correlation with user u exceeds the threshold t
    p = (SS[u, :] > t) & (np.arange(R.shape[0]) != u)
    p_users = np.where(p)[0]
    # extract correlation between user u and the selected users
    S = SS[u, p_users]
    # per-user mean rating, ignoring NaN
    m = np.nanmean(R, axis = 1)
    pred = m[u] + np.sum(S*(R[p_users,i] - m[p_users]))/np.sum(S)
    # optionally return the correlation matrix alongside the prediction
    if corr:
        return pred, SS
    return pred

In [6]:
predict(0, 4, R)

4.871979899370592
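The pipeline divides by the sum of the selected similarities, so it yields <code>NaN</code> when no user passes the threshold, and a neighbor who has not rated item $i$ also propagates <code>NaN</code>. The following is a sketch of one possible guard, not part of the original notebook: the name `predict_with_fallback` and the choice to fall back to the user's mean rating are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def predict_with_fallback(u, i, R, t=0):
    """Hypothetical variant: return the user's mean rating when no usable neighbor exists."""
    S = pd.DataFrame(R.T).corr().values[u]    # similarities of user u to all users
    m = np.nanmean(R, axis=1)                 # per-user mean rating, ignoring NaN
    others = np.arange(R.shape[0]) != u       # exclude user u itself
    rated = ~np.isnan(R[:, i])                # keep only users who rated item i
    nbrs = np.where(others & rated & (S > t))[0]
    if nbrs.size == 0:                        # assumed fallback: the user's own mean
        return m[u]
    w = S[nbrs]
    return m[u] + np.sum(w * (R[nbrs, i] - m[nbrs])) / np.sum(w)

# Same rating matrix as in the notebook (hypothetical name R_demo)
R_demo = np.array([[5, 3, 4, 4, np.nan],
                   [3, 1, 2, 3, 3],
                   [4, 3, 4, 3, 5],
                   [3, 3, 1, 5, 4],
                   [1, 5, 5, 2, 1]])

print(predict_with_fallback(0, 4, R_demo))          # same neighbors as the pipeline
print(predict_with_fallback(0, 4, R_demo, t=0.9))   # no neighbor passes the threshold
```

With the default threshold this selects the same neighbors as the pipeline; with `t=0.9` no user qualifies, so the sketch returns the mean rating of user 0.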