# MovieLens Recommendation Algorithms Comparison

This notebook compares several recommendation algorithms from the Surprise library on the MovieLens 100K dataset. The goal is to identify the algorithm with the lowest root mean squared error (RMSE), estimated with 5-fold cross-validation.
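For each test fold, the RMSE is $\sqrt{\tfrac{1}{n}\sum (r_{ui} - \hat r_{ui})^2}$, the square root of the mean squared error (MSE) between true and predicted ratings. The square root is monotonic, so ranking algorithms by RMSE or by MSE gives the same order. RMSE has the advantage of being expressed in rating units. The minimal sketch below computes both on made-up ratings. The function names and data are hypothetical and only illustrate the formula, not Surprise's internals.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE, in rating units."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical true ratings and predictions on the 1-5 scale.
actual = [4, 3, 5, 2]
predicted = [3.5, 3.0, 4.0, 3.0]
print(mse(actual, predicted))   # 0.5625
print(rmse(actual, predicted))  # 0.75
```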

The [MovieLens 100K](https://grouplens.org/datasets/movielens/100k/) dataset contains 100,000 ratings (integers from 1 to 5) from 943 users on 1,682 movies. Each user has rated at least 20 movies.
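In the GroupLens distribution, the ratings are stored in `u.data`, a tab-separated file with one `user id, item id, rating, timestamp` record per line. The sketch below is a hypothetical helper, not part of Surprise. It parses lines in that layout and computes the counts quoted above, demonstrated on a made-up three-line excerpt.

```python
from collections import Counter


def summarize_ratings(lines):
    """Summarize tab-separated 'user item rating timestamp' lines (the u.data layout)."""
    ratings_per_user = Counter()
    items = set()
    n_ratings = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        user_id, item_id, rating, _timestamp = line.rstrip("\n").split("\t")
        if not 1 <= int(rating) <= 5:
            raise ValueError(f"rating out of range: {rating}")
        ratings_per_user[user_id] += 1
        items.add(item_id)
        n_ratings += 1
    return {
        "ratings": n_ratings,
        "users": len(ratings_per_user),
        "items": len(items),
        "min_ratings_per_user": min(ratings_per_user.values(), default=0),
    }


# Hypothetical excerpt in the u.data layout (values are made up).
sample = "1\t10\t4\t881250949\n1\t20\t3\t881250950\n2\t10\t5\t881250951\n"
print(summarize_ratings(sample.splitlines()))
# {'ratings': 3, 'users': 2, 'items': 2, 'min_ratings_per_user': 1}
```

Pointed at the full file (for example `summarize_ratings(open(path))`, where `path` is wherever `u.data` lives locally), it should report 100,000 ratings, 943 users, 1,682 items, and a per-user minimum of at least 20, per the dataset description.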


In [6]:
import pandas as pd
import plotly.express as px
from surprise import Dataset, KNNBasic, SVD, NMF, SlopeOne, CoClustering
from surprise.model_selection import cross_validate

# Load the built-in MovieLens 100K dataset (Surprise offers to download it on first use).
data = Dataset.load_builtin('ml-100k')

In [3]:
# Candidate algorithms, each instantiated with Surprise's default hyperparameters.
algorithms = {
    'KNNBasic': KNNBasic(),
    'SVD': SVD(),
    'NMF': NMF(),
    'SlopeOne': SlopeOne(),
    'CoClustering': CoClustering()
}
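With no `sim_options`, `KNNBasic()` is user-based and uses the mean squared difference (MSD) similarity, which is why the log below prints "Computing the msd similarity matrix...". The Surprise documentation defines $\text{msd}(u, v)$ as the mean of $(r_{ui} - r_{vi})^2$ over the items both users rated, and the similarity as $1 / (\text{msd}(u, v) + 1)$. A prediction is the similarity-weighted average of the ratings given to the item by the $k$ (default 40) most similar users. The pure-Python sketch below illustrates that idea on hypothetical toy data. The function names are made up, and it skips Surprise's `min_k` and fallback handling, so it is not the library's implementation.

```python
import heapq


def msd_similarity(ratings_u, ratings_v):
    """Similarity 1 / (MSD + 1), where MSD is taken over co-rated items."""
    common = ratings_u.keys() & ratings_v.keys()
    if not common:
        return 0.0  # no co-rated items: treat the users as unrelated
    msd = sum((ratings_u[i] - ratings_v[i]) ** 2 for i in common) / len(common)
    return 1.0 / (msd + 1.0)


def knn_basic_predict(ratings, user, item, k=40):
    """Similarity-weighted mean of the item's ratings by the k most similar users."""
    candidates = [
        (msd_similarity(ratings[user], ratings[other]), ratings[other][item])
        for other in ratings
        if other != user and item in ratings[other]
    ]
    neighbors = heapq.nlargest(k, candidates, key=lambda pair: pair[0])
    total_sim = sum(sim for sim, _ in neighbors if sim > 0)
    if total_sim == 0:
        return None  # no usable neighbors (Surprise would fall back to a default prediction)
    return sum(sim * r for sim, r in neighbors if sim > 0) / total_sim


# Hypothetical toy ratings: user -> {movie: rating}.
ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 5, "m2": 3, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}
print(round(knn_basic_predict(ratings, "alice", "m4"), 4))  # 3.8333
```

Bob agrees exactly with Alice on their shared movies (similarity 1.0) while Carol disagrees (similarity 1/11), so the prediction for Alice lands close to Bob's rating of 4.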

In [5]:
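from surprise.model_selection import KFold

# Evaluate each algorithm with 5-fold cross-validation. One KFold with a fixed seed is
# shared so that "fold k" is the same train/test partition for every algorithm
# (random_state=42 is an arbitrary choice).
kf = KFold(n_splits=5, random_state=42)

results = {}

for name, algorithm in algorithms.items():
    cv_results = cross_validate(algorithm, data, measures=['rmse'], cv=kf, verbose=True)
    results[name] = cv_results

# One column per algorithm, one row per fold, plus a final 'mean' row averaged across folds.
summary = pd.DataFrame({name: cv_results['test_rmse'] for name, cv_results in results.items()})
summary.loc['mean'] = summary.mean()
summary.reset_index(inplace=True)
summary = summary.rename(columns={'index': 'fold'})
summary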

Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Computing the msd similarity matrix...
Done computing similarity matrix.
Evaluating RMSE of algorithm KNNBasic on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9754  0.9723  0.9852  0.9793  0.9812  0.9787  0.0045  
Fit time          0.06    0.06    0.06    0.06    0.06    0.06    0.00    
Test time         0.70    0.73    0.68    0.69    0.67    0.69    0.02    
Evaluating RMSE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9314  0.9390  0.9359  0.9364  0.9376  0.9361  0.0026  
Fit time          0.28    0.32    0.33    0.31    0.30    0.31    0.02    

   fold  KNNBasic       SVD       NMF  SlopeOne  CoClustering
0     0  0.975373  0.931399  0.967115  0.945785      0.967326
1     1  0.972277  0.938999  0.960477  0.951723      0.962688
2     2  0.985215  0.935938  0.964774  0.945533      0.952728
3     3  0.979297  0.936424  0.962653  0.944714      0.978392
4     4  0.981179  0.937568  0.962425  0.932577      0.966609
5  mean  0.978668  0.936066  0.963489  0.944066      0.965549
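Each per-fold RMSE above comes from training on four folds and testing on the held-out fifth. Surprise's `KFold` shuffles the ratings once and slices them into `n_splits` contiguous folds, giving one extra rating to the first folds when the count does not divide evenly. The standard-library sketch below mimics that idea. The function name and `seed` parameter are hypothetical, and it returns index lists rather than Surprise trainsets. A fixed seed reproduces the same split on every call, which is what makes fold-by-fold comparisons across algorithms meaningful.

```python
import random


def k_fold_indices(n_ratings, n_splits=5, seed=None):
    """Yield (train, test) index lists; every rating is tested exactly once."""
    if not 2 <= n_splits <= n_ratings:
        raise ValueError("n_splits must be between 2 and the number of ratings")
    indices = list(range(n_ratings))
    random.Random(seed).shuffle(indices)  # shuffle once, then slice into folds
    stop = 0
    for fold in range(n_splits):
        start = stop
        # Spread the remainder: the first n_ratings % n_splits folds get one extra rating.
        stop = start + n_ratings // n_splits + (1 if fold < n_ratings % n_splits else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test


# With 100,000 ratings and 5 folds, each test fold holds 20,000 ratings.
sizes = [len(test) for _, test in k_fold_indices(100_000, n_splits=5, seed=0)]
print(sizes)  # [20000, 20000, 20000, 20000, 20000]
```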


In [8]:
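# Reshape to long format (one row per fold/algorithm pair) for a grouped bar chart.
# The 'mean' row is kept and appears as an extra group after the five folds.
summary_long = summary.melt(id_vars=['fold'], var_name='Algorithm', value_name='RMSE')
# Fold labels mix integers and 'mean'; convert them to strings for a categorical axis.
summary_long['fold'] = summary_long['fold'].astype(str)

fig = px.bar(summary_long, x='fold', y='RMSE', color='Algorithm', barmode='group',
             title='RMSE of Different Algorithms across Folds',
             labels={'fold': 'Fold', 'RMSE': 'Root Mean Squared Error'},
             height=600)
# Force a categorical x-axis; otherwise Plotly may infer a numeric axis from the
# fold numbers and drop the non-numeric 'mean' group.
fig.update_layout(xaxis={'type': 'category', 'categoryorder': 'category ascending'})
fig.show()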
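To make the comparison explicit, the sketch below copies the per-fold values from the summary table and ranks the algorithms by mean RMSE. It uses `statistics.pstdev` (population standard deviation), which reproduces the Std values printed in the verbose log (0.0045 for KNNBasic, 0.0026 for SVD). The variable names are illustrative.

```python
from statistics import mean, pstdev

# Per-fold test RMSE copied from the cross-validation summary table.
fold_rmse = {
    "KNNBasic": [0.975373, 0.972277, 0.985215, 0.979297, 0.981179],
    "SVD": [0.931399, 0.938999, 0.935938, 0.936424, 0.937568],
    "NMF": [0.967115, 0.960477, 0.964774, 0.962653, 0.962425],
    "SlopeOne": [0.945785, 0.951723, 0.945533, 0.944714, 0.932577],
    "CoClustering": [0.967326, 0.962688, 0.952728, 0.978392, 0.966609],
}

# Rank by mean RMSE (lower is better); pstdev matches the log's Std column.
ranking = sorted(
    ((name, mean(values), pstdev(values)) for name, values in fold_rmse.items()),
    key=lambda row: row[1],
)
for name, avg, std in ranking:
    print(f"{name:<13} mean={avg:.4f} std={std:.4f}")

best = ranking[0][0]
print("Lowest mean RMSE:", best)  # prints "Lowest mean RMSE: SVD"
```

With these values, SVD has the lowest mean RMSE (about 0.936), followed by SlopeOne (about 0.944), NMF, CoClustering, and KNNBasic.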