# Weighted sum with Pandas

I ran into some trouble computing a basic weighted sum for item-item collaborative filtering in the Introduction to Recommender Systems course on Coursera, so let's work through it.

One task was predicting how a user $u$ would rate an item $i$, given a similarity function $sim(i, j)$, a neighbourhood $N$ of similar items, and the target user's ratings $r_{uj}$ for the items $j \in N$.

$p_{ui} = \frac{\sum_{j \in N}sim(i, j)r_{uj}}{\sum_{j \in N} \left| sim(i, j) \right|}$
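
As a minimal sketch of this formula, assuming the similarities and the user's ratings are stored as pandas Series indexed by item name (the function name `predict_rating` and the toy values are made up for illustration, not taken from the course):

```python
import pandas as pd

def predict_rating(sims: pd.Series, user_ratings: pd.Series) -> float:
    """Weighted average of the user's ratings over the neighbourhood N.

    sims         -- sim(i, j) for each candidate neighbour j (hypothetical input)
    user_ratings -- r_uj for the items the user has rated (may contain NaN)
    """
    # N = neighbours that have both a similarity and a known rating.
    rated = user_ratings.dropna()
    neighbours = sims.index.intersection(rated.index)
    numerator = (sims[neighbours] * rated[neighbours]).sum()
    denominator = sims[neighbours].abs().sum()
    return float(numerator / denominator)

# Toy values, made up for illustration.
sims = pd.Series({'a': 0.5, 'b': 1.0, 'c': 0.5})
user_ratings = pd.Series({'a': 4.0, 'b': 2.0})  # the user never rated 'c'
prediction = predict_rating(sims, user_ratings)
print(prediction)
```

The sketch does not guard against an empty neighbourhood, where the denominator would be zero.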

In [59]:
import pandas as pd
import scipy

Next, some data.

In [116]:
animals = ['kitten', 'puppy', 'giraffe', 'pokemon', 'snake']

ratings = pd.Series([5, 5, 10, 1, 5], index = animals)
weights = pd.Series([0.5, 0.9, 0.6, 0.1, 0.3], index = animals)

In [80]:
ratings * weights

kitten     2.5
puppy      4.5
giraffe    6.0
pokemon    0.1
snake      1.5
dtype: float64

In our scenario the ratings matrix is sparse: a similarity exists for each pair of items $i$ and $j$, but many ratings are missing. The task is to predict the missing ratings, so let's simulate that. For the neighbourhood $N$, let's just use all the other items.

In [119]:
# Simulate a missing rating by blanking out one known value.
sparseratings = ratings.astype(float)  # astype returns a copy; float dtype can hold NaN
sparseratings['giraffe'] = float('nan')
sparseratings

kitten      5
puppy       5
giraffe   NaN
pokemon     1
snake       5
dtype: float64

Right. Let's walk through this step by step to see how Pandas takes care of missing values.

In [120]:
sparseratings * weights

kitten     2.5
puppy      4.5
giraffe    NaN
pokemon    0.1
snake      1.5
dtype: float64
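
The NaN in that product matters once we start summing. A small sketch, retyping the products from the cell above by hand, of how `Series.sum()` treats it: by default `skipna=True`, so the missing entry is ignored rather than poisoning the total.

```python
import pandas as pd

# Products copied from the output above; giraffe's rating is unknown.
products = pd.Series([2.5, 4.5, float('nan'), 0.1, 1.5],
                     index=['kitten', 'puppy', 'giraffe', 'pokemon', 'snake'])

skipped = products.sum()              # skipna=True is the default: NaN is ignored
strict = products.sum(skipna=False)   # NaN propagates to the result

print(skipped)
print(strict)
```

So the numerator of the prediction takes care of itself, but nothing comparable happens for the sum of the weights, which has no NaN in it.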

In [122]:
weights.sum()

2.3999999999999999

In [121]:
weights.filter(sparseratings.index)

kitten     0.5
puppy      0.9
giraffe    0.6
pokemon    0.1
snake      0.3
dtype: float64
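
Note that giraffe is still in the filtered weights: `filter` selects by label, and giraffe's label is still in `sparseratings.index` even though its value is NaN. A sketch of the difference between filtering by label and masking on the known ratings (the names `by_label` and `rated_only` are just for illustration):

```python
import pandas as pd

animals = ['kitten', 'puppy', 'giraffe', 'pokemon', 'snake']
weights = pd.Series([0.5, 0.9, 0.6, 0.1, 0.3], index=animals)
sparseratings = pd.Series([5, 5, float('nan'), 1, 5], index=animals)

# filter() selects by label, so 'giraffe' survives even though its rating is NaN.
by_label = weights.filter(items=list(sparseratings.index))

# A boolean mask built from the ratings keeps only the rated items.
rated_only = weights[sparseratings.notna()]

print(list(by_label.index))
print(list(rated_only.index))
```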

OK, let's predict the removed values. To remind ourselves, they are:

In [123]:
for missing in ratings.loc[sparseratings.isnull()].index:
    print("{a} is missing, with rating {r}, weight {w}".format(a=missing, r=ratings.loc[missing], w=weights.loc[missing]))

giraffe is missing, with rating 10, weight 0.6


In [124]:
known = sparseratings.notnull()  # filter() keeps giraffe's weight, so mask on the known ratings instead
round(float((sparseratings[known] * weights[known]).sum() / weights[known].abs().sum()), 4)

4.7778
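
Working the formula by hand is a useful check. Here $N$ is kitten, puppy, pokemon and snake (the items with a known rating), and the weights stand in for the similarities between giraffe and each $j$:

$p = \frac{0.5 \cdot 5 + 0.9 \cdot 5 + 0.1 \cdot 1 + 0.3 \cdot 5}{0.5 + 0.9 + 0.1 + 0.3} = \frac{8.6}{1.8} \approx 4.78$

Dividing by the sum of all five weights, 2.4, would instead give about 3.58, because giraffe's own weight would be counted in the denominator even though its rating is missing.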