## __Model-Based Collaborative Filtering__ ##
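This notebook walks through model-based collaborative filtering: it builds a user-item rating matrix, factorizes it with truncated SVD, and evaluates the reconstructed ratings with RMSE.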

## Step 1: Import Required Libraries

- Import pandas and numpy

In [None]:
import pandas as pd
import numpy as np

## Step 2: Load and Inspect the Data

- Load the dataset with the given header
- Print the head of the DataFrame
- Calculate the number of unique users and items

In [None]:
header = ['user_id', 'item_id', 'rating', 'timestamp']
df = pd.read_csv('u.data', sep='\t', names=header)
df.head()

__Observations:__
- The first five rows of the dataset are shown.
- Each row records one rating: user_id, item_id, rating, and timestamp.

In [None]:
n_users = df['user_id'].nunique()
n_items = df['item_id'].nunique()

__Observation:__
- Here, we have computed the number of unique users (n_users) and unique items (n_items).
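
Step 4 indexes the rating matrix with `user_id - 1` and `item_id - 1`, which only works if the IDs run contiguously from 1 up to the number of unique values. A minimal sketch of that check on toy data follows; the helper `ids_are_contiguous` and both arrays are hypothetical and not part of the notebook.

```python
import numpy as np

def ids_are_contiguous(ids):
    """Return True if the IDs cover 1..n with no gaps (n = number of unique IDs)."""
    unique_ids = np.unique(ids)  # sorted unique values
    return bool(unique_ids[0] == 1 and unique_ids[-1] == unique_ids.size)

contiguous_ids = np.array([1, 2, 2, 3, 1])  # toy IDs covering 1..3
gapped_ids = np.array([1, 2, 5])            # toy IDs with a gap: 3 and 4 are missing

print(ids_are_contiguous(contiguous_ids))
print(ids_are_contiguous(gapped_ids))
```

On the real data, the same check could be run as `ids_are_contiguous(df['user_id'].to_numpy())`.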

In [None]:
n_users,n_items

## Step 3: Split the Data into Train and Test Sets

- Import **train_test_split**
- Split the data into **train_data** and **test_data**


In [None]:
from sklearn.model_selection import train_test_split
train_data, test_data = train_test_split(df, test_size=0.25)

__Observation:__
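- Here, we have split the data into a training set (75% of the ratings) and a test set (25%). The split is random, so later results vary between runs unless a seed is fixed.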
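
To make the split repeatable, `train_test_split` accepts a `random_state` argument. The sketch below illustrates the same idea with plain NumPy rather than scikit-learn: a seeded permutation of row positions. The function `seeded_split` and the seed value are assumptions made for illustration only.

```python
import numpy as np

def seeded_split(n_rows, test_size=0.25, seed=42):
    """Return (train_idx, test_idx) row positions from a seeded shuffle."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)       # every row position exactly once
    n_test = int(round(n_rows * test_size))  # number of rows held out
    return shuffled[n_test:], shuffled[:n_test]

# 100,000 rows, the number of ratings in MovieLens 100K
train_idx, test_idx = seeded_split(100_000)
print(len(train_idx), len(test_idx))
```

The resulting positions could then be applied with `df.iloc[train_idx]` and `df.iloc[test_idx]`.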

## Step 4: Create a Matrix for Train and Test Data

- Initialize **train_data_mat** and **test_data_mat** with zeros
- Fill the matrices with the corresponding ratings


In [None]:
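# Rows are users, columns are items; 0 means "no rating"
train_data_mat = np.zeros((n_users, n_items))
for line in train_data.itertuples():
    # line = (index, user_id, item_id, rating, timestamp); IDs are 1-based
    train_data_mat[line[1]-1, line[2]-1] = line[3]

test_data_mat = np.zeros((n_users, n_items))
for line in test_data.itertuples():
    test_data_mat[line[1]-1, line[2]-1] = line[3]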
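
The same fill can be written without a Python loop by using NumPy integer-array indexing. This is a minimal sketch on toy triples: the array `toy_ratings` and the `_toy` names are illustrative, and it assumes each (user, item) pair appears at most once.

```python
import numpy as np

# Toy (user_id, item_id, rating) triples with 1-based IDs, as in u.data
toy_ratings = np.array([
    [1, 1, 5],
    [1, 3, 3],
    [2, 2, 4],
    [3, 1, 1],
])
n_users_toy, n_items_toy = 3, 3

# Shift IDs to 0-based positions and assign all ratings in one step
toy_mat = np.zeros((n_users_toy, n_items_toy))
toy_mat[toy_ratings[:, 0] - 1, toy_ratings[:, 1] - 1] = toy_ratings[:, 2]
print(toy_mat)
```

For the notebook's data, the three columns could be taken from `train_data[['user_id', 'item_id', 'rating']].to_numpy()`.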

## Step 5: Define the RMSE Function

- Import mean_squared_error
- Define the rmse function to calculate the root mean squared error


In [None]:
train_data_mat

In [None]:
from sklearn.metrics import mean_squared_error
from math import sqrt

def rmse(prediction, ground_truth):
    # Score only the entries that have a rating in ground_truth (0 means "unrated")
    rated = ground_truth.nonzero()
    prediction = prediction[rated].flatten()
    ground_truth = ground_truth[rated].flatten()
    return sqrt(mean_squared_error(ground_truth, prediction))
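
To see what the masking does, here is a small worked example. It is a sketch that reimplements the same calculation with NumPy only; the name `rmse_numpy` and the toy matrices are hypothetical.

```python
import numpy as np

def rmse_numpy(prediction, ground_truth):
    """RMSE over the entries where ground_truth holds a rating (non-zero)."""
    rated = ground_truth.nonzero()
    errors = prediction[rated] - ground_truth[rated]
    return float(np.sqrt(np.mean(errors ** 2)))

toy_truth = np.array([[5.0, 0.0],
                      [0.0, 3.0]])  # two observed ratings: 5 and 3
toy_pred = np.array([[4.0, 2.0],
                     [1.0, 3.0]])   # predictions for every cell

# Only the cells (0, 0) and (1, 1) are scored: errors -1 and 0, so MSE = 0.5
toy_rmse = rmse_numpy(toy_pred, toy_truth)
print(toy_rmse)  # sqrt(0.5), about 0.707
```

The predictions 2 and 1 in the unrated cells do not affect the score, which is why the metric can be computed on a sparse test matrix.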

## Step 6: Check the Sparsity for the Dataset

- Calculate the sparsity of the **MovieLens100K** dataset


In [None]:
# Sparsity = share of user-item pairs that have no rating
sparsity = round(1.0 - len(df) / float(n_users * n_items), 3)
print(f'The sparsity level of MovieLens100K is {sparsity:.1%}')

__Observation:__
- As shown, the sparsity level of **MovieLens100K** is 93.7%.
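
The figure can be reproduced by hand from the dataset's published size: 100,000 ratings from 943 users on 1,682 items. The sketch below hard-codes those counts instead of reading them from `df`; the `_ml` variable names are illustrative.

```python
# MovieLens 100K counts, hard-coded here instead of computed from df
n_ratings_ml, n_users_ml, n_items_ml = 100_000, 943, 1_682

density_ml = n_ratings_ml / (n_users_ml * n_items_ml)  # share of cells with a rating
sparsity_ml = 1.0 - density_ml                         # share of empty cells
print(f'{sparsity_ml:.1%}')
```

Only about 6.3% of the possible user-item pairs carry a rating, so most cells of the matrices built in Step 4 are zero.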

## Step 7: Apply SVD and Calculate RMSE

- Import svds
- Apply SVD to the **train_data_mat** and choose k
- Calculate the prediction matrix **X_pred**
- Calculate the RMSE between **X_pred** and **test_data_mat**


In [None]:
import scipy.sparse as sp
from scipy.sparse.linalg import svds

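# Factorize the training matrix, keeping the k=20 largest singular values
u, s, vt = svds(train_data_mat, k=20)
s_diag_matrix = np.diag(s)
# Rebuild the rank-20 approximation; its entries serve as predicted ratings
X_pred = np.dot(np.dot(u, s_diag_matrix), vt)
print('SVD-based CF RMSE: ' + str(rmse(X_pred, test_data_mat)))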

__Observation:__
- Here, we have calculated the RMSE between X_pred and test_data_mat, which is about 2.71. The exact value varies from run to run because the train/test split is random.
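
The effect of k can be illustrated on a small dense matrix. This sketch uses `np.linalg.svd` rather than `svds` so that it runs with NumPy alone; the helper `truncated_svd_predict` and the random toy matrix are assumptions made for illustration.

```python
import numpy as np

def truncated_svd_predict(matrix, k):
    """Rank-k reconstruction from the k largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)  # s is sorted descending
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]

rng = np.random.default_rng(0)
toy = rng.integers(0, 6, size=(6, 5)).astype(float)  # toy ratings in 0..5

approx_k2 = truncated_svd_predict(toy, k=2)    # keeps 2 latent factors
approx_full = truncated_svd_predict(toy, k=5)  # keeps all 5, so nothing is lost

# Reconstruction error (Frobenius norm) shrinks as k grows
err_k2 = np.linalg.norm(toy - approx_k2)
err_full = np.linalg.norm(toy - approx_full)
print(err_k2, err_full)
```

A smaller k compresses the matrix into fewer latent factors, while a larger k reproduces the training matrix more closely, including its zeros. Trying several values of k against the test RMSE is one way to choose it.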