# Baseline Recommender Algorithms

The following simple methods can serve as baselines when comparing recommendation techniques. Any recommendation technique should, in general, perform better (indeed, much better) than these baselines.

In [None]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle
from scipy.sparse import csr_matrix
import copy

### Movielens ml-latest-small dataset

In [7]:
ratings = pd.read_csv("../../datasets/ml-latest-small/ratings.csv", sep=",")
ratings.head(10)

Unnamed: 0,userId,movieId,rating,timestamp
0,1,1,4.0,964982703
1,1,3,4.0,964981247
2,1,6,4.0,964982224
3,1,47,5.0,964983815
4,1,50,5.0,964982931
5,1,70,3.0,964982400
6,1,101,5.0,964980868
7,1,110,4.0,964982176
8,1,151,5.0,964984041
9,1,157,5.0,964984100


In [9]:
n_users = ratings.iloc[:,0].unique().size
n_items = ratings.iloc[:,1].unique().size
n_ratings = ratings.iloc[:,1].size
users = ratings.iloc[:,0].unique()
items = ratings.iloc[:,1].unique()

print("Number of users:",n_users)
print("Number of items:",n_items)
print("Number of ratings:",n_ratings)
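# Note: this ratio is the fraction of user-item pairs that have a rating (the density); sparsity in the strict sense is 1 minus this value
print("Sparsity:",n_ratings/(n_users*n_items))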

Number of users: 610
Number of items: 9724
Number of ratings: 100836
Sparsity: 0.016999683055613623
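
The cell above selects columns by position, which depends on the column order of the CSV. A minimal sketch of the same statistics using column names is shown here on a tiny made-up frame. The helper name `rating_matrix_stats` is hypothetical, and the sketch assumes each (user, item) pair appears at most once.

```python
import pandas as pd

def rating_matrix_stats(df, user_col="userId", item_col="movieId"):
    """Basic size statistics of the user-item rating matrix (hypothetical helper)."""
    n_users = df[user_col].nunique()
    n_items = df[item_col].nunique()
    n_ratings = len(df)
    # Fraction of the n_users x n_items matrix that is filled in
    density = n_ratings / (n_users * n_items)
    return {
        "n_users": n_users,
        "n_items": n_items,
        "n_ratings": n_ratings,
        "density": density,
        "sparsity": 1.0 - density,
    }

# Made-up toy data (not MovieLens): 3 users, 3 items, 5 ratings
toy = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [5.0, 3.0, 4.0, 2.0, 1.0],
})
stats = rating_matrix_stats(toy)
print(stats)
```

On the real data, `rating_matrix_stats(ratings)` should reproduce the counts printed above, with the printed ratio appearing under `density`.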


# Rating Prediction

### Random Prediction

Predicts a uniformly random integer rating from 1 to 5 for every observed rating and reports the mean absolute error (MAE).

In [10]:
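# MAE of uniformly random integer predictions in {1, ..., 5}
random_preds = np.random.randint(1, 6, size=n_ratings)  # upper bound is exclusive
error = np.abs(ratings.iloc[:,2].to_numpy() - random_preds).sum()
print(error/n_ratings)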

1.4867309294299655
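
The value printed above depends on the random draw, so it shifts slightly between runs. As a sanity check, the expected MAE of a uniform integer predictor can be computed exactly by averaging the absolute error over every candidate prediction. The following is a minimal sketch; the helper name `expected_random_mae` and the toy ratings are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def expected_random_mae(true_ratings, low=1, high=5):
    """Expected MAE when each prediction is drawn uniformly from the
    integers low..high (inclusive), independently of the true rating."""
    candidates = np.arange(low, high + 1)
    true_ratings = np.asarray(true_ratings, dtype=float)
    # Absolute error of every candidate against every true rating,
    # averaged over both axes (all candidates are equally likely)
    abs_errors = np.abs(true_ratings[:, None] - candidates[None, :])
    return float(abs_errors.mean())

# Made-up toy ratings (not MovieLens data)
toy_expected = expected_random_mae([1.0, 3.0, 5.0])
half_star_expected = expected_random_mae([3.5])
print(toy_expected, half_star_expected)
```

Calling `expected_random_mae(ratings.iloc[:,2])` on the full dataset should give a value close to the simulated MAE, since the simulation averages over roughly 100,000 independent draws.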


### Average Prediction

Always predicts the same value: the mean of all ratings in the training set. A possible improvement: for a given $(u,i)$ pair, predict the average rating of user $u$ or the average rating of item $i$.

In [11]:
# Hold out 10% of the ratings for testing
X_train, X_test = train_test_split(ratings, test_size=0.1)
train_size = X_train.shape[0]
test_size = X_test.shape[0]

# Global mean rating, computed on the training split only
avg_rating = X_train.iloc[:,2].mean()
print("Avg. rating:",avg_rating)
# MAE of always predicting the global mean
error = np.abs(X_test.iloc[:,2] - avg_rating).sum()
print(error/test_size)

Avg. rating: 3.5008870327926656
0.8225716219300959
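
The per-user and per-item refinement described above can be sketched as follows. This is a minimal illustration on made-up data, not a result from the notebook: the helper name `mean_baseline_mae` is hypothetical, and falling back to the global training mean for users or items that do not appear in the training split is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

def mean_baseline_mae(train, test, key, rating_col="rating"):
    """MAE on `test` when each rating is predicted as the training mean of
    its `key` group (e.g. "userId" or "movieId"). Keys unseen in training
    fall back to the global training mean (an assumption of this sketch)."""
    global_mean = train[rating_col].mean()
    group_means = train.groupby(key)[rating_col].mean()
    # Look up each test row's group mean; unseen keys map to NaN
    preds = test[key].map(group_means).fillna(global_mean)
    return float(np.abs(test[rating_col] - preds).mean())

# Made-up toy data (not MovieLens)
toy_train = pd.DataFrame({
    "userId":  [1, 1, 2, 2, 3],
    "movieId": [10, 20, 10, 30, 20],
    "rating":  [5.0, 3.0, 4.0, 2.0, 1.0],
})
toy_test = pd.DataFrame({
    "userId":  [1, 2, 4],      # user 4 never appears in toy_train
    "movieId": [30, 20, 10],
    "rating":  [4.0, 3.0, 5.0],
})

user_mae = mean_baseline_mae(toy_train, toy_test, "userId")
item_mae = mean_baseline_mae(toy_train, toy_test, "movieId")
print("User-mean MAE:", user_mae)
print("Item-mean MAE:", item_mae)
```

With the split from the cell above, the same helper could be called as `mean_baseline_mae(X_train, X_test, "userId")` or with `"movieId"`, and the result compared against the global-average MAE.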

