## Collaborative Filtering

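### User-Based Collaborative Filtering

> - Recommends to a user the products that similar users liked  
> - Measures similarity between users with the Pearson correlation coefficient or cosine similarity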
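
A minimal sketch of the user-based idea, assuming a tiny dense NumPy rating matrix with invented values in which 0 marks an unrated item. `cosine_sim`, `pearson_sim`, and `predict_user_based` are hypothetical helpers written for illustration. The prediction is a similarity-weighted average of the item's ratings from the k most similar users who rated it.

```python
import numpy as np

# Toy user-item rating matrix (rows = users, columns = items); 0 means "not rated".
# The numbers are invented for illustration only.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity between two rating vectors (unrated entries count as 0)."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def pearson_sim(a, b):
    """Pearson correlation computed only over the items both users have rated."""
    mask = (a > 0) & (b > 0)
    if mask.sum() < 2 or a[mask].std() == 0 or b[mask].std() == 0:
        return 0.0
    return float(np.corrcoef(a[mask], b[mask])[0, 1])

def predict_user_based(R, user, item, k=2, sim=cosine_sim):
    """Similarity-weighted average of `item`'s ratings from the k most similar users who rated it."""
    neighbours = [(sim(R[user], R[v]), v) for v in range(R.shape[0])
                  if v != user and R[v, item] > 0]
    top = sorted(neighbours, reverse=True)[:k]
    den = sum(abs(s) for s, _ in top)
    return sum(s * R[v, item] for s, v in top) / den if den > 0 else 0.0

print(predict_user_based(R, user=1, item=1))                   # cosine neighbours
print(predict_user_based(R, user=1, item=1, sim=pearson_sim))  # Pearson neighbours
```

Swapping the `sim` argument is the only change needed to move between the two similarity measures named above.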

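### Item-Based Collaborative Filtering
> - Recommends based on the similarity between the items a user has rated  
> - Fills in the blanks of the user-item matrix vertically (item by item) rather than horizontally (user by user)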
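
For comparison, a minimal item-based sketch on the same kind of invented toy matrix. It compares columns (items) instead of rows (users), then predicts a user's rating for an item from that user's own ratings of the most similar items. `predict_item_based` is a hypothetical helper, and cosine similarity over whole columns (unrated counted as 0) is an assumed choice.

```python
import numpy as np

# Toy matrix: rows = users, columns = items, 0 = not rated (invented data).
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

# Item-item cosine similarity: compare columns, i.e. work "vertically" down the matrix.
norms = np.linalg.norm(R, axis=0)
norms[norms == 0] = 1.0  # avoid division by zero for items nobody rated
S = (R.T @ R) / np.outer(norms, norms)

def predict_item_based(R, S, user, item, k=2):
    """Weighted average of the user's own ratings on the k rated items most similar to `item`."""
    rated = [j for j in range(R.shape[1]) if j != item and R[user, j] > 0]
    top = sorted(rated, key=lambda j: S[item, j], reverse=True)[:k]
    den = sum(abs(S[item, j]) for j in top)
    return sum(S[item, j] * R[user, j] for j in top) / den if den > 0 else 0.0

print(predict_item_based(R, S, user=1, item=1))
```

Because item-item similarities change less often than user neighbourhoods, `S` can be precomputed once and reused for many predictions.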

In [1]:
import pandas as pd
import numpy as np

In [2]:
df1 = pd.read_csv('../input/tmdb-movie-metadata/tmdb_5000_credits.csv')
df2 = pd.read_csv('../input/tmdb-movie-metadata/tmdb_5000_movies.csv')

In [3]:
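# Rename the credits columns so they can be joined on 'id'; drop the duplicate
# title column first so the merge does not produce title_x / title_y
df1.columns = ['id', 'title', 'cast', 'crew']
df2 = df2.merge(df1.drop(columns='title'), on='id')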

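### Singular Value Decomposition
> - One way to address the scalability and sparsity problems that collaborative filtering runs into  
> - Uses a latent factor model to capture the similarity between users and items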
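
Surprise's `SVD` is not a literal singular value decomposition of the rating matrix; it is the biased matrix-factorization model popularized by Simon Funk, which predicts `mu + b_u + b_i + q_i · p_u` and learns the biases and factor vectors by stochastic gradient descent over the observed ratings only. The sketch below illustrates that idea in plain NumPy. `funk_svd` is a hypothetical helper, and its hyperparameters are arbitrary illustration values, not Surprise's defaults.

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, n_factors=2, n_epochs=300, lr=0.02, reg=0.02, seed=0):
    """Fit mu + b_u + b_i + q_i . p_u to observed (user_index, item_index, rating) triples with SGD."""
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in ratings]))  # global mean rating
    bu, bi = np.zeros(n_users), np.zeros(n_items)    # user and item biases
    P = rng.normal(0, 0.1, (n_users, n_factors))     # user latent factors p_u
    Q = rng.normal(0, 0.1, (n_items, n_factors))     # item latent factors q_i
    for _ in range(n_epochs):
        for u, i, r in ratings:
            err = r - (mu + bu[u] + bi[i] + Q[i] @ P[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            pu = P[u].copy()  # keep the old p_u so q_i is updated from pre-step values
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * pu - reg * Q[i])
    return lambda u, i: mu + bu[u] + bi[i] + Q[i] @ P[u]

# Invented (user, item, rating) triples; only observed entries are used for training.
toy = [(0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0), (1, 0, 4.0), (1, 3, 1.0),
       (2, 0, 1.0), (2, 1, 1.0), (2, 3, 5.0), (3, 1, 1.0), (3, 2, 5.0), (3, 3, 4.0)]
predict = funk_svd(toy, n_users=4, n_items=4)
print(float(predict(1, 1)))  # estimate for a (user, item) pair that was never rated
```

Training only on observed entries is what lets this approach cope with a sparse matrix: missing ratings are never treated as zeros.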

In [7]:
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

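# ratings_small.csv uses half-star ratings from 0.5 to 5.0, so override Reader's default (1, 5) scale
reader = Reader(rating_scale=(0.5, 5.0))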
ratings = pd.read_csv('../input/the-movies-dataset/ratings_small.csv')
ratings.head()

| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 31 | 2.5 | 1260759144 |
| 1 | 1 | 1029 | 3.0 | 1260759179 |
| 2 | 1 | 1061 | 3.0 | 1260759182 |
| 3 | 1 | 1129 | 2.0 | 1260759185 |
| 4 | 1 | 1172 | 4.0 | 1260759205 |


In [13]:
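# Load the userId / movieId / rating columns into a Surprise Dataset
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader)

# Surprise's SVD: the biased matrix factorization popularized by Simon Funk (Netflix Prize)
svd = SVD()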

# Run 5-fold cross-validation and then print results
cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8950  0.8960  0.8996  0.8960  0.9039  0.8981  0.0033  
MAE (testset)     0.6907  0.6899  0.6926  0.6875  0.6986  0.6919  0.0037  
Fit time          5.37    5.40    5.77    4.46    4.84    5.17    0.46    
Test time         0.21    0.23    0.16    0.16    0.15    0.18    0.03    


{'test_rmse': array([0.89495068, 0.89600937, 0.89964834, 0.89597215, 0.90387562]),
 'test_mae': array([0.69073812, 0.68992912, 0.69259032, 0.6875435 , 0.69855656]),
 'fit_time': (5.370400667190552,
  5.404547452926636,
  5.765511512756348,
  4.45536208152771,
  4.842025279998779),
 'test_time': (0.21173810958862305,
  0.23118329048156738,
  0.16089558601379395,
  0.16045761108398438,
  0.15306663513183594)}

In [14]:
# Retrain on every rating (no held-out fold) before making predictions
trainset = data.build_full_trainset()
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x2ca4f5da160>

In [15]:
ratings[ratings['userId'] == 1]

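| | userId | movieId | rating | timestamp |
|---|---|---|---|---|
| 0 | 1 | 31 | 2.5 | 1260759144 |
| 1 | 1 | 1029 | 3.0 | 1260759179 |
| 2 | 1 | 1061 | 3.0 | 1260759182 |
| 3 | 1 | 1129 | 2.0 | 1260759185 |
| 4 | 1 | 1172 | 4.0 | 1260759205 |
| 5 | 1 | 1263 | 2.0 | 1260759151 |
| 6 | 1 | 1287 | 2.0 | 1260759187 |
| 7 | 1 | 1293 | 2.0 | 1260759148 |
| 8 | 1 | 1339 | 3.5 | 1260759125 |
| 9 | 1 | 1343 | 2.0 | 1260759131 |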


In [16]:
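# est is user 1's predicted rating for movie 302; r_ui=3 is only stored as the "true" rating and does not change est
svd.predict(1, 302, 3)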

Prediction(uid=1, iid=302, r_ui=3, est=2.5747397940089227, details={'was_impossible': False})
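
`svd.predict` scores one (user, movie) pair at a time. One way to turn it into recommendations, sketched here under assumptions, is to score every movie the user has not rated yet and keep the highest estimates. `top_n` and the stand-in predictor with invented scores are hypothetical, so the sketch runs without a trained model.

```python
from typing import Callable, Iterable, List, Set, Tuple

def top_n(predict: Callable[[int, int], float], user: int, candidates: Iterable[int],
          already_rated: Set[int], n: int = 10) -> List[Tuple[int, float]]:
    """Score each candidate item the user has not rated and return the n best (item, estimate) pairs."""
    scored = [(item, predict(user, item)) for item in candidates if item not in already_rated]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Stand-in predictor with invented scores (hypothetical), so the example runs on its own.
fake_scores = {10: 2.5, 20: 3.1, 30: 4.2, 40: 3.9}

def stand_in_predict(user: int, item: int) -> float:
    return fake_scores[item]

print(top_n(stand_in_predict, user=1, candidates=fake_scores, already_rated={10, 30}, n=2))
# → [(40, 3.9), (20, 3.1)]
```

With the model fitted above, one could pass `lambda u, i: svd.predict(u, i).est` as `predict`, `ratings['movieId'].unique()` as `candidates`, and `set(ratings.loc[ratings['userId'] == 1, 'movieId'])` as `already_rated`. This makes one prediction call per candidate movie, which is manageable for `ratings_small.csv` but may be slow on a much larger catalog.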