### Building a Recommender System with Surprise

This try-it focuses on exploring additional algorithms with the `Surprise` library to generate recommendations. Your goal is to identify the best-performing algorithm by minimizing the root mean squared error (RMSE) under cross-validation. You will also select a dataset from the [grouplens](https://grouplens.org/datasets/movielens/) example datasets.

To begin, head over to grouplens and examine the different datasets available. Choose one that is easy to convert into the format `Surprise` expects: user, item, and rating information. Then, compare the performance of at least the `KNNBasic`, `SVD`, `NMF`, `SlopeOne`, and `CoClustering` algorithms to build your recommendations. For more information on the algorithms, see the documentation for the algorithm package [here](https://surprise.readthedocs.io/en/stable/prediction_algorithms_package.html).

Share the results of your investigation with your peers, including your cross-validation results and a basic description of your dataset.
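As a minimal sketch of getting a grouplens file into the user, item, rating layout, the snippet below parses CSV text with `userId`, `movieId`, `rating`, and `timestamp` columns (the header used by the `ml-latest-small` ratings file) and hands the result to `Dataset.load_from_df`. The helper names `read_ratings` and `to_surprise_dataset`, the sample rows, and the `(0.5, 5.0)` rating scale are assumptions for illustration; adjust them to the dataset you choose.

```python
import csv
import io


def read_ratings(csv_text):
    """Parse MovieLens-style CSV text into (user, item, rating) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["userId"], row["movieId"], float(row["rating"])) for row in reader]


def to_surprise_dataset(ratings, rating_scale=(0.5, 5.0)):
    """Wrap the tuples in a Surprise Dataset (needs pandas and surprise installed)."""
    import pandas as pd
    from surprise import Dataset, Reader

    # Surprise expects exactly three columns, in user, item, rating order.
    df = pd.DataFrame(ratings, columns=["user", "item", "rating"])
    return Dataset.load_from_df(df, Reader(rating_scale=rating_scale))


# Invented sample rows, for illustration only.
sample = (
    "userId,movieId,rating,timestamp\n"
    "1,31,2.5,1260759144\n"
    "1,1029,3.0,1260759179\n"
)
ratings = read_ratings(sample)
```

With a downloaded file, something like `to_surprise_dataset(read_ratings(open("ratings.csv").read()))` could then be passed to `cross_validate` in place of the built-in dataset.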



In [10]:
from surprise import Dataset, SVD, SVDpp, NMF, KNNBasic, KNNWithMeans, KNNWithZScore, SlopeOne, CoClustering, BaselineOnly, NormalPredictor
from surprise.model_selection import cross_validate
import pandas as pd

In [2]:
# Load the movielens-100k dataset (download it if needed).
data = Dataset.load_builtin("ml-100k")

# Algorithms to compare (NormalPredictor is evaluated separately as a random baseline).
algos = [SVD(), SVDpp(), NMF(), KNNBasic(), KNNWithMeans(), KNNWithZScore(), SlopeOne(), CoClustering(), BaselineOnly()]

# Run 5-fold cross-validation and print results
for algo in algos:
    cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)


Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9346  0.9384  0.9362  0.9410  0.9366  0.9373  0.0022  
MAE (testset)     0.7358  0.7398  0.7357  0.7419  0.7375  0.7381  0.0024  
Fit time          10.14   4.73    3.32    3.78    4.07    5.21    2.51    
Test time         0.90    0.67    0.87    0.75    0.75    0.79    0.08    
Evaluating RMSE, MAE of algorithm SVDpp on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9227  0.9142  0.9143  0.9224  0.9113  0.9170  0.0047  
MAE (testset)     0.7222  0.7176  0.7184  0.7217  0.7153  0.7190  0.0026  
Fit time          97.41   46.83   46.75   47.95   47.65   57.32   20.05   
Test time         10.38   8.71    10.10   10.15   8.64    9.59    0.76    
Evaluating RMSE, MAE of algorithm NMF on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (tests

In [12]:
cross_validate(NormalPredictor(), data, measures=['RMSE'], cv=5, verbose=True)

Evaluating RMSE of algorithm NormalPredictor on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.5230  1.5179  1.5239  1.5150  1.5240  1.5208  0.0037  
Fit time          0.12    0.18    0.16    0.17    0.16    0.16    0.02    
Test time         0.16    0.67    0.15    0.32    0.14    0.29    0.20    


{'test_rmse': array([1.52298734, 1.51791009, 1.52392043, 1.51500795, 1.52403469]),
 'fit_time': (0.12103509902954102,
  0.1755509376525879,
  0.16237306594848633,
  0.16876888275146484,
  0.15611791610717773),
 'test_time': (0.15979790687561035,
  0.666395902633667,
  0.1454458236694336,
  0.31672120094299316,
  0.14028406143188477)}
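For reference, the RMSE and MAE figures reported by `cross_validate` are both averages of prediction errors on the held-out fold. The sketch below shows the two formulas in plain Python; the function names and the three ratings are made up for illustration and are not taken from the dataset.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Made-up ratings and predictions, for illustration only.
actual = [4.0, 3.0, 5.0]
predicted = [3.5, 3.0, 4.0]
example_rmse = rmse(actual, predicted)
example_mae = mae(actual, predicted)
```

Because RMSE squares each error before averaging, it penalizes large misses more heavily than MAE does.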

In [13]:
pd.DataFrame(
    {'Algorithms':['SVD', 'SVDpp', 'NMF', 'KNNBasic', 'KNNWithMeans', 'KNNWithZScore', 'SlopeOne', 'CoClustering', 'BaselineOnly', 'NormalPredictor'],
    'Mean RMSE':[.9373, .9170, .9627, .9800, .9512, .9496, .9447, .9648, .9439, 1.5208],
    'Avg. Fit Time': [5.21, 57.32, 5.42, .96, 1.07, 1.35, 1.87, 4.09, .61, .16],
    'Avg. Test Time': [.79, 9.59, .46, 8.39, 8.83, 9.38, 5.8, .36, .31, .29]
     }
).set_index('Algorithms').sort_values('Mean RMSE')

| Algorithms | Mean RMSE | Avg. Fit Time | Avg. Test Time |
|---|---|---|---|
| SVDpp | 0.9170 | 57.32 | 9.59 |
| SVD | 0.9373 | 5.21 | 0.79 |
| BaselineOnly | 0.9439 | 0.61 | 0.31 |
| SlopeOne | 0.9447 | 1.87 | 5.80 |
| KNNWithZScore | 0.9496 | 1.35 | 9.38 |
| KNNWithMeans | 0.9512 | 1.07 | 8.83 |
| NMF | 0.9627 | 5.42 | 0.46 |
| CoClustering | 0.9648 | 4.09 | 0.36 |
| KNNBasic | 0.9800 | 0.96 | 8.39 |
| NormalPredictor | 1.5208 | 0.16 | 0.29 |
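The summary table was typed in by hand from the printed output. One possible alternative, sketched here, is to keep the dictionaries that `cross_validate` returns and average their `test_rmse`, `fit_time`, and `test_time` entries. The helper names `mean` and `summarize` are hypothetical, and the sample values are rounded copies of the NormalPredictor and SVDpp outputs shown elsewhere in this notebook.

```python
def mean(values):
    """Average a sequence of numbers (works for lists, tuples, and NumPy arrays)."""
    values = [float(v) for v in values]
    return sum(values) / len(values)


def summarize(results_by_name):
    """Turn {name: cross_validate output} into rows sorted by mean RMSE."""
    rows = [
        {
            "Algorithms": name,
            "Mean RMSE": mean(res["test_rmse"]),
            "Avg. Fit Time": mean(res["fit_time"]),
            "Avg. Test Time": mean(res["test_time"]),
        }
        for name, res in results_by_name.items()
    ]
    return sorted(rows, key=lambda row: row["Mean RMSE"])


# Rounded per-fold values copied from two outputs in this notebook.
results = {
    "NormalPredictor": {
        "test_rmse": [1.5230, 1.5179, 1.5239, 1.5150, 1.5240],
        "fit_time": (0.12, 0.18, 0.16, 0.17, 0.16),
        "test_time": (0.16, 0.67, 0.15, 0.32, 0.14),
    },
    "SVDpp": {
        "test_rmse": [0.9224, 0.9262, 0.9260, 0.9121, 0.9079],
        "fit_time": (30.44, 26.62, 29.05, 29.97, 25.39),
        "test_time": (5.01, 4.90, 5.17, 5.31, 6.11),
    },
}
summary = summarize(results)
```

In the notebook itself, `results` could be built with `{type(algo).__name__: cross_validate(algo, data, measures=["RMSE"], cv=5) for algo in algos}`, and `pd.DataFrame(summarize(results)).set_index("Algorithms")` would then give a table like the one above without retyping numbers.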


In [14]:
model = SVDpp()
cross_validate(model, data, measures=['RMSE'], cv=5, verbose=True)

Evaluating RMSE of algorithm SVDpp on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9224  0.9262  0.9260  0.9121  0.9079  0.9189  0.0075  
Fit time          30.44   26.62   29.05   29.97   25.39   28.29   1.96    
Test time         5.01    4.90    5.17    5.31    6.11    5.30    0.43    


{'test_rmse': array([0.92239569, 0.92622098, 0.92597743, 0.91207983, 0.90791844]),
 'fit_time': (30.444926977157593,
  26.62021827697754,
  29.052200078964233,
  29.97010612487793,
  25.385390281677246),
 'test_time': (5.010303020477295,
  4.904877662658691,
  5.167024850845337,
  5.3132219314575195,
  6.1090171337127686)}

In [18]:
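# Dot products of the user and item latent factors from the last fold fitted by cross_validate.
# These are not full predictions: SVDpp also adds the global mean, the user and item
# biases, and the implicit-feedback factors (model.yj).
model.pu @ model.qi.T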

array([[ 0.09293868,  0.16079092,  0.13617895, ...,  0.11561578,
        -0.06189626,  0.04754225],
       [-0.0048637 , -0.16170747, -0.2044498 , ..., -0.16417532,
        -0.04711465,  0.0431397 ],
       [ 0.01202486, -0.21278275,  0.14290815, ..., -0.02915404,
         0.05851256, -0.02504775],
       ...,
       [-0.18503981, -0.0806711 , -0.13905852, ...,  0.00092375,
         0.00471877, -0.02718251],
       [-0.09941689, -0.10737412,  0.09639437, ...,  0.10065759,
        -0.01216662,  0.03236063],
       [ 0.45638677,  0.29556228,  0.11635725, ..., -0.04010518,
        -0.06973813,  0.0671992 ]])
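The matrix above holds only the dot products of the latent factors, which is why its values sit near zero rather than on the 1 to 5 rating scale. As a rough sketch of how an SVD-style estimate is assembled, the function below adds the global mean and the user and item biases to that dot product and clips the result to the rating scale. The function name, the numbers, and the clipping range are assumptions for illustration; SVDpp additionally folds implicit-feedback factors into the user vector, which this sketch omits.

```python
def svd_style_estimate(global_mean, user_bias, item_bias,
                       user_factors, item_factors, rating_scale=(1.0, 5.0)):
    """Combine baseline terms with the latent-factor dot product, then clip."""
    dot = sum(p * q for p, q in zip(user_factors, item_factors))
    raw = global_mean + user_bias + item_bias + dot
    low, high = rating_scale
    return min(high, max(low, raw))


# Made-up parameters for a single user and item, for illustration only.
estimate = svd_style_estimate(3.5, 0.2, -0.1, [0.1, 0.2], [0.3, -0.4])

# A raw value above the scale is clipped to the upper bound.
clipped = svd_style_estimate(4.9, 0.5, 0.3, [0.0], [0.0])
```

In practice, fitting on `data.build_full_trainset()` and calling `model.predict(uid, iid).est` returns the library's own estimate for a single user and item.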