## __Collaborative Filtering and Memory-Based Modeling__
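Collaborative filtering is a recommendation technique that predicts which items a user might like from the ratings given by users with similar tastes. Memory-based modeling computes these predictions directly from the stored user-item ratings, using similarities between users or between items, rather than fitting a separate model.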


## Step 1: Import Required Libraries and Load the Dataset

- Import the pandas and NumPy libraries
- Load the dataset using pandas


In [None]:
import pandas as pd
import numpy as np

In [None]:
header = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('u.data', sep='\t', names=header)
df.head()

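|   | user_id | item_id | rating | timestamp |
|---|---------|---------|--------|-----------|
| 0 | 196 | 242 | 3 | 881250949 |
| 1 | 186 | 302 | 3 | 891717742 |
| 2 | 22 | 377 | 1 | 878887116 |
| 3 | 244 | 51 | 2 | 880606923 |
| 4 | 166 | 346 | 1 | 886397596 |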


__Observations:__
- The file has no header row, so the column names are supplied through `names=header`.
- Each row records one rating and contains `user_id`, `item_id`, `rating`, and `timestamp`.
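
The later steps rely on a few properties of this table: the IDs are used as 1-based matrix indices, and each (user, item) pair is expected to appear once. The sketch below is one way to check such properties. `check_ratings_frame` is a hypothetical helper, not part of the notebook, and the toy frame simply reuses the first three rows shown above.

```python
import pandas as pd

def check_ratings_frame(frame):
    """Summarize a ratings table with user_id, item_id, and rating columns."""
    return {
        "n_rows": len(frame),
        "missing_values": int(frame.isna().sum().sum()),
        # A repeated (user, item) pair would be silently overwritten in a matrix
        "duplicate_pairs": int(frame.duplicated(subset=["user_id", "item_id"]).sum()),
        "min_user_id": int(frame["user_id"].min()),
        "min_item_id": int(frame["item_id"].min()),
        "rating_range": (int(frame["rating"].min()), int(frame["rating"].max())),
    }

# Toy frame built from the first three rows of the output above
toy = pd.DataFrame({
    "user_id": [196, 186, 22],
    "item_id": [242, 302, 377],
    "rating": [3, 3, 1],
    "timestamp": [881250949, 891717742, 878887116],
})
summary = check_ratings_frame(toy)
print(summary)
```

On the full dataset, the same call would be `check_ratings_frame(df)`.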

## Step 2: Count the Unique Users and Items

- Count the distinct `user_id` values to get the number of users, and do the same with `item_id` to get the number of items




In [None]:
n_users = df.user_id.unique().shape[0]
n_items = df.item_id.unique().shape[0]
print(f'number of users = {n_users} | number of items = {n_items}')

number of users = 943 | number of items = 1682


__Observation:__
- There are 943 users and 1682 items.
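
These two counts fix the shape of the user-item matrix built later. A related quantity is how full that matrix is, since most users rate only a small share of the items. The sketch below computes that density. `matrix_density` is a hypothetical helper, and the numbers in the example are toy values, not figures from this dataset.

```python
def matrix_density(n_ratings, n_users, n_items):
    """Fraction of user-item cells that hold a rating."""
    return n_ratings / (n_users * n_items)

# Toy example: 6 ratings spread over a 3 x 4 matrix
example_density = matrix_density(n_ratings=6, n_users=3, n_items=4)
print(f"{example_density:.1%} of cells are filled")  # 50.0% of cells are filled
```

With the notebook's variables, the call would be `matrix_density(len(df), n_users, n_items)`.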

## Step 3: Split the Data into Train and Test Sets

- Import train_test_split from sklearn.model_selection
- Split the data into train (75%) and test (25%) sets


In [None]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(df, test_size=0.25)
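
`train_test_split` shuffles the rows before splitting, so each run of the cell above produces a different split and therefore slightly different scores later on. If repeatable results are wanted, the `random_state` parameter fixes the shuffle. The sketch below shows this on a small stand-in array; the seed value 42 is arbitrary and is not used anywhere in the original notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the ratings table: 8 rows, 2 columns
rows = np.arange(16).reshape(8, 2)

# Same seed -> same shuffle -> identical splits
train_a, test_a = train_test_split(rows, test_size=0.25, random_state=42)
train_b, test_b = train_test_split(rows, test_size=0.25, random_state=42)

print(np.array_equal(train_a, train_b), np.array_equal(test_a, test_b))
print(len(train_a), len(test_a))
```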

## Step 4: Create a Matrix for Train and Test Sets

- Create user-item matrices

In [None]:
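# Rows are users and columns are items; a 0 marks an unrated cell
train_data_mat = np.zeros((n_users, n_items))
for line in train_data.itertuples():
    # line[0] is the DataFrame index; IDs start at 1, so subtract 1 for 0-based indexing
    train_data_mat[line[1]-1, line[2]-1] = line[3]

test_data_mat = np.zeros((n_users, n_items))
for line in test_data.itertuples():
    test_data_mat[line[1]-1, line[2]-1] = line[3]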

__Observation:__
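- Here, we have created user-item matrices for the train and test sets by iterating over the rows and writing each rating to the cell at row `user_id - 1` and column `item_id - 1`. Cells without a rating remain 0.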
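
The same matrix can also be filled in one step with NumPy integer-array indexing instead of a Python loop. The sketch below is one possible version, assuming IDs are 1-based and no larger than the matrix dimensions. `build_rating_matrix` is a hypothetical helper and the input values are toy data.

```python
import numpy as np

def build_rating_matrix(user_ids, item_ids, ratings, n_users, n_items):
    """Place each rating at [user_id - 1, item_id - 1]; unrated cells stay 0."""
    mat = np.zeros((n_users, n_items))
    mat[np.asarray(user_ids) - 1, np.asarray(item_ids) - 1] = ratings
    return mat

# Toy data: user 1 rates items 2 and 4, user 3 rates item 1
toy_mat = build_rating_matrix([1, 1, 3], [2, 4, 1], [5, 3, 4], n_users=3, n_items=4)
print(toy_mat)
```

With the notebook's data, the arguments would be `train_data.user_id.to_numpy()`, `train_data.item_id.to_numpy()`, and `train_data.rating.to_numpy()`.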


## Step 5: Calculate Similarity Matrices for Users and Items

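- Import pairwise_distances from sklearn.metrics.pairwise
- Compute pairwise matrices between users (rows) and between items (columns); these are passed as the similarity matrices in the next step
- Note that `pairwise_distances` returns distances (Euclidean by default), so a larger value means two users or items are less alike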


In [None]:
from sklearn.metrics.pairwise import pairwise_distances
user_sim = pairwise_distances(train_data_mat)
item_sim = pairwise_distances(train_data_mat.T)
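
The matrices above hold distances. For comparison, the sketch below computes a cosine similarity, where a larger value means two rows are more alike. It is an alternative shown for illustration, not what the notebook computes, and the scores reported later in this notebook come from the distance matrices above. `cosine_similarity_rows` and its `eps` parameter are hypothetical names.

```python
import numpy as np

def cosine_similarity_rows(mat, eps=1e-9):
    """Cosine similarity between the rows of mat; eps guards against all-zero rows."""
    norms = np.linalg.norm(mat, axis=1, keepdims=True)
    unit = mat / np.maximum(norms, eps)
    return unit @ unit.T

toy = np.array([[5.0, 3.0, 0.0],
                [4.0, 2.0, 0.0],
                [0.0, 0.0, 5.0]])
sim = cosine_similarity_rows(toy)
print(np.round(sim, 3))
```

Rows 0 and 1 rate the same items in a similar way and score close to 1, while row 2 shares no rated items with them and scores 0. Passing `mat.T` gives the item-item version.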

## Step 6: Define the Prediction Function

- Define a `predict` function that takes the following parameters:
  - ratings: the user-item matrix
  - similarity: the similarity matrix
  - type (default = user): the type of collaborative filtering (user or item)

In [None]:
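def predict(ratings, similarity, type='user'):
    if type == 'user':
        # Per-user mean over all items (unrated cells count as 0)
        mean_user_rating = ratings.mean(axis=1)
        # Center each user's ratings on that user's mean
        ratings_diff = (ratings - mean_user_rating[:, np.newaxis])
        # Weighted average of the centered ratings, shifted back by the user's mean
        pred = mean_user_rating[:, np.newaxis] + similarity.dot(ratings_diff) / np.array([np.abs(similarity).sum(axis=1)]).T
    elif type == 'item':
        # Weighted average of each user's own ratings across items
        pred = ratings.dot(similarity) / np.array([np.abs(similarity).sum(axis=1)])
    else:
        raise ValueError("type must be 'user' or 'item'")
    return pred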

In [None]:
item_prediction = predict(train_data_mat, item_sim, type='item')
user_prediction = predict(train_data_mat, user_sim, type='user')
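
Reading `predict` as written in this notebook, the user-based branch computes the following for user $u$ and item $i$:

$$
\hat{r}_{u,i} = \bar{r}_u + \frac{\sum_{v} s_{u,v}\,(r_{v,i} - \bar{r}_v)}{\sum_{v} |s_{u,v}|}
$$

The item-based branch computes:

$$
\hat{r}_{u,i} = \frac{\sum_{j} r_{u,j}\, s_{j,i}}{\sum_{j} |s_{i,j}|}
$$

The symbols are notation introduced here for illustration: $r_{u,i}$ is an entry of `ratings`, $s$ is the matrix passed as `similarity`, and $\bar{r}_u$ is the mean of row $u$ of `ratings` taken over all items, with unrated cells counted as 0.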

__Observations:__
- The item-based and user-based predictions are stored in `item_prediction` and `user_prediction`.
- Although the memory-based algorithm is easy to implement, it has drawbacks: it does not scale well to real-world data sizes, and it does not address the well-known cold start problem.
- The cold start problem arises when a new user or a new item enters the system: with no ratings for them yet, the system cannot compute meaningful similarities and so cannot produce a recommendation.
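
In a user-item matrix, a cold start case shows up as a row or column with no ratings at all. The sketch below finds such rows and columns, assuming as in this notebook that 0 marks an unrated cell. `cold_start_indices` is a hypothetical helper and the matrix is toy data.

```python
import numpy as np

def cold_start_indices(rating_mat):
    """Return 0-based indices of users (rows) and items (columns) with no ratings."""
    rated = rating_mat != 0
    cold_users = np.flatnonzero(~rated.any(axis=1))
    cold_items = np.flatnonzero(~rated.any(axis=0))
    return cold_users, cold_items

toy = np.array([[5, 0, 3],
                [0, 0, 0],   # a user with no ratings
                [4, 0, 1]])  # the middle item has no ratings either
cold_users, cold_items = cold_start_indices(toy)
print(cold_users, cold_items)  # [1] [1]
```

Applied to `train_data_mat`, this would also reveal any users or items whose ratings all landed in the test set after the random split.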

## Step 7: Create a Function for RMSE

- Import mean_squared_error from sklearn.metrics
- Define the RMSE function
- Calculate RMSE for user-based and item-based predictions


In [None]:
from sklearn.metrics import mean_squared_error
from math import sqrt

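def rmse(prediction, ground_truth):
    # Score only the cells that hold a rating in the ground truth
    prediction = prediction[ground_truth.nonzero()].flatten()
    ground_truth = ground_truth[ground_truth.nonzero()].flatten()
    # mean_squared_error expects (y_true, y_pred)
    return sqrt(mean_squared_error(ground_truth, prediction))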

print('User-based CF RMSE: ' + str(rmse(user_prediction, test_data_mat)))
print('Item-based CF RMSE: ' + str(rmse(item_prediction, test_data_mat)))

User-based CF RMSE: 3.081909822553596
Item-based CF RMSE: 3.3624664562271245


__Observation:__
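- The RMSE is about 3.08 for the user-based predictions and about 3.36 for the item-based predictions, computed only over the ratings present in the test set. Lower values are better.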
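
An RMSE value is easier to interpret next to a trivial baseline. The sketch below scores a predictor that outputs the mean of all observed training ratings for every cell, using the same masking idea as `rmse`. Both function names are hypothetical, the matrices are toy data, and no baseline figure for this dataset is claimed here.

```python
import numpy as np

def masked_rmse(prediction, ground_truth):
    """RMSE over the cells where ground_truth holds a rating (non-zero)."""
    mask = ground_truth != 0
    errors = prediction[mask] - ground_truth[mask]
    return float(np.sqrt(np.mean(errors ** 2)))

def global_mean_baseline(train_mat):
    """Predict the mean of all observed training ratings for every cell."""
    mean_rating = train_mat[train_mat != 0].mean()
    return np.full(train_mat.shape, mean_rating)

train = np.array([[5.0, 0.0],
                  [0.0, 3.0]])
test = np.array([[0.0, 4.0],
                 [2.0, 0.0]])
baseline = global_mean_baseline(train)  # every cell is 4.0
baseline_rmse = masked_rmse(baseline, test)
print(baseline_rmse)
```

With the notebook's matrices, the comparison would be `masked_rmse(global_mean_baseline(train_data_mat), test_data_mat)` against the two scores above.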


This is how a memory-based collaborative filtering recommender is built and evaluated.