*Take the MovieLens dataset and build a matrix factorization model. In this library it is called SVD. Find the best parameters using cross-validation, experiment with the other algorithms (SVD++, NMF), and choose the one that performs best.*

In [2]:
import pandas as pd
from surprise import accuracy, Dataset, SVD, SVDpp, NMF
from surprise.model_selection import train_test_split
from surprise.model_selection import cross_validate

In [3]:
# Load the built-in MovieLens 100k dataset; prompt=True asks before downloading it.
data = Dataset.load_builtin(name='ml-100k', prompt=True)

In [4]:
algorithms = [SVD(), SVDpp(), NMF()]
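For orientation, the model family behind `SVD` predicts a rating as a global mean plus user and item biases plus a dot product of latent factor vectors, and learns these by stochastic gradient descent. The block below is a minimal pure-Python sketch of that general technique, not Surprise's implementation. The names `fit_mf`, `predict`, `rmse`, the hyperparameter values, and the toy ratings are all hypothetical.

```python
import math
import random


def fit_mf(ratings, n_factors=2, n_epochs=200, lr=0.01, reg=0.02, seed=0):
    """Fit r_hat(u, i) = mu + b_u + b_i + p_u . q_i by SGD on (user, item, rating) triples."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_user = {u: 0.0 for u in users}
    b_item = {i: 0.0 for i in items}
    # Small random latent factors for every user and item.
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}

    for _ in range(n_epochs):
        for u, i, r in ratings:
            dot = sum(pf * qf for pf, qf in zip(p[u], q[i]))
            err = r - (mu + b_user[u] + b_item[i] + dot)
            # Gradient step on the regularized squared error.
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            for f in range(n_factors):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    return mu, b_user, b_item, p, q


def predict(model, u, i):
    """Predicted rating of item i by user u under the fitted model."""
    mu, b_user, b_item, p, q = model
    return mu + b_user[u] + b_item[i] + sum(pf * qf for pf, qf in zip(p[u], q[i]))


def rmse(pairs):
    """Root mean squared error over (true, predicted) pairs."""
    return math.sqrt(sum((t - e) ** 2 for t, e in pairs) / len(pairs))


# Hypothetical toy ratings, not taken from MovieLens.
toy = [
    ("u1", "i1", 5), ("u1", "i2", 3),
    ("u2", "i1", 4), ("u2", "i3", 1),
    ("u3", "i1", 5), ("u3", "i2", 2), ("u3", "i3", 1),
]

model = fit_mf(toy)
mean_rating = model[0]
# Baseline: always predict the global mean rating.
baseline_rmse = rmse([(r, mean_rating) for _, _, r in toy])
trained_rmse = rmse([(r, predict(model, u, i)) for u, i, r in toy])
print(f"baseline RMSE: {baseline_rmse:.3f}, trained RMSE: {trained_rmse:.3f}")
```

Both scores here are measured on the training ratings, so they only show that the factors fit the data. The cross-validation below is what estimates performance on unseen ratings.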

#### SVD Algorithm

In [5]:
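# 5-fold cross-validation; the result is a dict of per-fold metrics and timings.
SVD_result = cross_validate(algorithms[0], data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
# Average each metric over the folds.
SVD_result = pd.DataFrame.from_dict(SVD_result).mean(axis=0)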

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9340  0.9407  0.9293  0.9446  0.9320  0.9361  0.0057  
MAE (testset)     0.7335  0.7407  0.7364  0.7428  0.7355  0.7378  0.0034  
Fit time          0.91    0.97    0.94    0.93    0.98    0.95    0.03    
Test time         0.15    0.22    0.20    0.14    0.20    0.18    0.03    
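The Mean and Std columns can be checked by hand from the per-fold values. The short sketch below recomputes them for the SVD test RMSE row using only the standard library. The variable names are illustrative. For these numbers the reported Std matches the population standard deviation (dividing by the number of folds).

```python
from statistics import mean, pstdev

# Per-fold test RMSE values copied from the SVD table above.
svd_rmse_folds = [0.9340, 0.9407, 0.9293, 0.9446, 0.9320]

fold_mean = mean(svd_rmse_folds)
fold_std = pstdev(svd_rmse_folds)  # population std: divides by n, not n - 1

print(f"mean={fold_mean:.4f} std={fold_std:.4f}")  # mean=0.9361 std=0.0057
```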


#### SVDpp Algorithm

In [6]:
SVDpp_result = cross_validate(algorithms[1], data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
SVDpp_result = pd.DataFrame.from_dict(SVDpp_result).mean(axis=0)

Evaluating RMSE, MAE of algorithm SVDpp on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9213  0.9165  0.9206  0.9194  0.9152  0.9186  0.0023  
MAE (testset)     0.7219  0.7200  0.7232  0.7206  0.7192  0.7210  0.0014  
Fit time          25.03   24.84   25.21   24.90   25.36   25.07   0.19    
Test time         3.27    3.36    3.27    3.17    3.24    3.26    0.06    


#### NMF Algorithm

In [7]:
NMF_result = cross_validate(algorithms[2], data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
NMF_result = pd.DataFrame.from_dict(NMF_result).mean(axis=0)

Evaluating RMSE, MAE of algorithm NMF on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.9643  0.9698  0.9621  0.9586  0.9661  0.9642  0.0038  
MAE (testset)     0.7570  0.7628  0.7569  0.7523  0.7583  0.7575  0.0033  
Fit time          1.73    1.82    1.76    1.75    1.74    1.76    0.03    
Test time         0.13    0.19    0.12    0.12    0.20    0.15    0.04    


#### Surprise results

In [8]:
surprise_results = pd.DataFrame(columns=['SVD', 'SVDpp', 'NMF'])

In [9]:
surprise_results['SVD'] = SVD_result
surprise_results['SVDpp'] = SVDpp_result
surprise_results['NMF'] = NMF_result

In [10]:
surprise_results

| | SVD | SVDpp | NMF |
|---|---|---|---|
| test_rmse | 0.936102 | 0.918619 | 0.964172 |
| test_mae | 0.73777 | 0.720976 | 0.75748 |
| fit_time | 0.949197 | 25.066613 | 1.759419 |
| test_time | 0.183576 | 3.259477 | 0.152807 |
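One way to read this table when choosing an algorithm is to pick the lowest mean RMSE and then weigh that gain against the extra training time. The sketch below does this with the values copied from the table. Treating plain SVD as the reference point is an assumption made for illustration, and which trade-off counts as optimal depends on how much training cost matters.

```python
# Mean cross-validation metrics copied from the summary table above.
results = {
    "SVD":   {"test_rmse": 0.936102, "test_mae": 0.73777,  "fit_time": 0.949197},
    "SVDpp": {"test_rmse": 0.918619, "test_mae": 0.720976, "fit_time": 25.066613},
    "NMF":   {"test_rmse": 0.964172, "test_mae": 0.75748,  "fit_time": 1.759419},
}

# Lowest mean RMSE wins on accuracy.
best = min(results, key=lambda name: results[name]["test_rmse"])

# Compare the winner against plain SVD (chosen here as the reference point).
reference = "SVD"
rmse_gain = results[reference]["test_rmse"] - results[best]["test_rmse"]
fit_ratio = results[best]["fit_time"] / results[reference]["fit_time"]

print(f"best by RMSE: {best}")
print(f"RMSE improvement over {reference}: {rmse_gain:.4f}")
print(f"fit time relative to {reference}: {fit_ratio:.1f}x")
```

On these numbers SVDpp is the most accurate of the three, at roughly 26 times the fit time of SVD.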


#### Get train and test data

In [11]:
train_data, test_data = train_test_split(data, test_size=0.25)

In [12]:
# Refit each algorithm on the training split, then predict the held-out ratings.
predictions_svd = algorithms[0].fit(train_data).test(test_data)
predictions_svdpp = algorithms[1].fit(train_data).test(test_data)
predictions_nmf = algorithms[2].fit(train_data).test(test_data)

In [13]:
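# accuracy.rmse prints the score and also returns it; the notebook echoes
# the return value of the last call as the cell output.
print('SVD:', end=' ')
accuracy.rmse(predictions_svd)
print('SVDpp:', end=' ')
accuracy.rmse(predictions_svdpp)
print('NMF:', end=' ')
accuracy.rmse(predictions_nmf)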

SVD: RMSE: 0.9330
SVDpp: RMSE: 0.9160
NMF: RMSE: 0.9609


0.9608906791962171
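All runs above use each algorithm's default parameters. The task also asks for parameter tuning by cross-validation, which Surprise supports through `GridSearchCV`. The block below is a sketch of how that could look for `SVD`. The grid values are hypothetical and untuned, and the helper names `count_fits` and `tune_svd` are illustrative. Because every parameter combination is fit once per fold, the sketch first counts the fits a grid implies, which matters for a slow algorithm such as SVDpp.

```python
from itertools import product

# Hypothetical search space: illustrative values, not tuned recommendations.
param_grid = {
    "n_factors": [50, 100],
    "n_epochs": [20, 30],
    "lr_all": [0.005, 0.01],
    "reg_all": [0.02, 0.1],
}


def count_fits(grid, cv=5):
    """Number of model fits a full grid search performs with cv folds."""
    n_combinations = len(list(product(*grid.values())))
    return n_combinations * cv


def tune_svd(data, grid=param_grid, cv=5):
    """Grid-search SVD with Surprise; return the best mean RMSE and its parameters."""
    # Imported here so the helper above can be used without Surprise installed.
    from surprise import SVD
    from surprise.model_selection import GridSearchCV

    search = GridSearchCV(SVD, grid, measures=["rmse", "mae"], cv=cv)
    search.fit(data)
    return search.best_score["rmse"], search.best_params["rmse"]


print(count_fits(param_grid))  # 16 combinations x 5 folds = 80 fits
```

With the `data` object loaded earlier, `best_rmse, best_params = tune_svd(data)` would run the search. The same pattern applies to `SVDpp` and `NMF` with grids built from their own parameters.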