> Data source: https://www.kaggle.com/code/ibtesama/getting-started-with-a-movie-recommendation-system/input?select=ratings_small.csv

## 3. Collaborative Filtering (based on user ratings)

In [1]:
!pip install scikit-surprise

Defaulting to user installation because normal site-packages is not writeable
Collecting scikit-surprise
  Downloading scikit-surprise-1.1.3.tar.gz (771 kB)
     -------------------------------------- 772.0/772.0 kB 8.1 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: scikit-surprise
  Building wheel for scikit-surprise (setup.py): started
  Building wheel for scikit-surprise (setup.py): finished with status 'done'
  Created wheel for scikit-surprise: filename=scikit_surprise-1.1.3-cp39-cp39-win_amd64.whl size=1086279 sha256=b5da7577dc97b63d4f0eea3c89599747be3c8369941d5442c887840795726920
  Stored in directory: c:\users\jktak\appdata\local\pip\cache\wheels\c6\3a\46\9b17b3512bdf283c6cb84f59929cdd5199d4e754d596d22784
Successfully built scikit-surprise
Installing collected packages: scikit-surprise
Successfully installed scikit-surprise-1.1.3




In [2]:
import surprise
surprise.__version__

'1.1.3'

In [3]:
import pandas as pd
from surprise import Reader, Dataset, SVD
from surprise.model_selection import cross_validate

In [4]:
ratings = pd.read_csv('ratings_small.csv')

In [5]:
ratings.head()

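|   | userId | movieId | rating | timestamp  |
|---|--------|---------|--------|------------|
| 0 | 1      | 31      | 2.5    | 1260759144 |
| 1 | 1      | 1029    | 3.0    | 1260759179 |
| 2 | 1      | 1061    | 3.0    | 1260759182 |
| 3 | 1      | 1129    | 2.0    | 1260759185 |
| 4 | 1      | 1172    | 4.0    | 1260759205 |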


In [6]:
ratings['rating'].min()

0.5

In [7]:
ratings['rating'].max()

5.0

In [8]:
reader = Reader(rating_scale=(0.5, 5))
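`rating_scale=(0.5, 5)` matches the minimum and maximum ratings found above. Surprise also uses this scale at prediction time: `predict()` clips its estimate to the range by default (`clip=True`). The clipping step amounts to the following minimal sketch (`clip_estimate` is a hypothetical helper, not part of Surprise):

```python
def clip_estimate(est, rating_scale=(0.5, 5.0)):
    """Force a raw model estimate into the [lower, upper] rating range."""
    lower, upper = rating_scale
    return max(lower, min(upper, est))


print(clip_estimate(5.7), clip_estimate(0.1), clip_estimate(3.2))
```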

In [9]:
data = Dataset.load_from_df(ratings[['userId', 'movieId', 'rating']], reader=reader)
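`load_from_df` expects exactly three columns, in the order user id, item id, rating. That is why `timestamp` is left out of the column selection. The following stdlib-only sketch shows the kind of (user, item, rating) triples this produces. It reads a few rows copied from the `ratings.head()` output above; `to_triples` and `SAMPLE_CSV` are illustrative names, not Surprise API.

```python
import csv
import io

# A few rows of ratings_small.csv, copied from the ratings.head() output above
SAMPLE_CSV = """userId,movieId,rating,timestamp
1,31,2.5,1260759144
1,1029,3.0,1260759179
1,1061,3.0,1260759182
"""


def to_triples(csv_text):
    """Keep only (user, item, rating), in that order; timestamp is dropped."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(int(r["userId"]), int(r["movieId"]), float(r["rating"])) for r in rows]


print(to_triples(SAMPLE_CSV))
```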

In [10]:
data

<surprise.dataset.DatasetAutoFolds at 0x14e8081dee0>
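The returned `DatasetAutoFolds` object can be split into folds. `cross_validate(..., cv=5)` in the cells that follow uses this to hold out a different fifth of the ratings in each round. The following stdlib-only sketch illustrates the idea. It assumes one shuffle followed by contiguous folds, so Surprise's exact fold boundaries may differ, and `kfold_indices` is a hypothetical helper.

```python
import random


def kfold_indices(n_ratings, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every index is tested exactly once."""
    idx = list(range(n_ratings))
    random.Random(seed).shuffle(idx)  # shuffle once, then cut into k contiguous folds
    bounds = [f * n_ratings // k for f in range(k + 1)]  # fold sizes differ by at most one
    for f in range(k):
        test = idx[bounds[f]:bounds[f + 1]]
        test_set = set(test)
        train = [j for j in idx if j not in test_set]
        yield train, test


for train, test in kfold_indices(12, k=5):
    print(len(train), len(test))
```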

In [11]:
svd = SVD(random_state=0)
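Surprise documents `SVD` as a biased matrix factorization. Its estimate is $\hat r_{ui} = \mu + b_u + b_i + q_i^\top p_u$, and the parameters are learned with stochastic gradient descent. The following pure-Python sketch mirrors that rule under simplifying assumptions:

- The names `train_biased_mf` and `predict_biased_mf` are hypothetical.
- The sketch defaults to 10 factors, where Surprise defaults to 100. The other defaults (`n_epochs=20`, `lr=0.005`, `reg=0.02`) mirror Surprise's `n_epochs`, `lr_all` and `reg_all`.
- No performance tuning is attempted.

```python
import random


def train_biased_mf(ratings, n_factors=10, n_epochs=20, lr=0.005, reg=0.02, seed=0):
    """Fit mu, b_u, b_i, p_u, q_i on (user, item, rating) triples with plain SGD."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    bu, bi, pu, qi = {}, {}, {}, {}
    for u, i, _ in ratings:
        if u not in pu:  # biases start at 0, factors at small random values
            bu[u] = 0.0
            pu[u] = [rng.gauss(0, 0.1) for _ in range(n_factors)]
        if i not in qi:
            bi[i] = 0.0
            qi[i] = [rng.gauss(0, 0.1) for _ in range(n_factors)]

    for _ in range(n_epochs):
        for u, i, r in ratings:  # fixed iteration order, no reshuffling
            dot = sum(p * q for p, q in zip(pu[u], qi[i]))
            err = r - (mu + bu[u] + bi[i] + dot)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(n_factors):
                puf, qif = pu[u][f], qi[i][f]  # use pre-update values for both factors
                pu[u][f] += lr * (err * qif - reg * puf)
                qi[i][f] += lr * (err * puf - reg * qif)
    return {"mu": mu, "bu": bu, "bi": bi, "pu": pu, "qi": qi}


def predict_biased_mf(model, u, i, rating_scale=(0.5, 5.0)):
    """Add only the terms that are known; unknown user and item fall back to mu."""
    est = model["mu"]
    known_u, known_i = u in model["pu"], i in model["qi"]
    if known_u:
        est += model["bu"][u]
    if known_i:
        est += model["bi"][i]
    if known_u and known_i:
        est += sum(p * q for p, q in zip(model["pu"][u], model["qi"][i]))
    return min(rating_scale[1], max(rating_scale[0], est))  # clip like predict(clip=True)
```

For example, `train_biased_mf([(1, 31, 2.5), (1, 1029, 3.0), (100, 1, 4.0)])` can be passed to `predict_biased_mf(model, 1, 31)`. The fallback for unknown users or items follows the biased case of Surprise's `SVD.estimate`.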

In [12]:
# K-fold cross-validation (k=5)
cross_validate(svd, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    0.8949  0.9004  0.8964  0.8978  0.8962  0.8971  0.0019  
MAE (testset)     0.6878  0.6946  0.6908  0.6935  0.6864  0.6906  0.0032  
Fit time          0.81    0.78    0.75    0.76    0.76    0.77    0.02    
Test time         0.09    0.10    0.14    0.09    0.09    0.10    0.02    


{'test_rmse': array([0.89485598, 0.90037197, 0.89643346, 0.89776579, 0.8961604 ]),
 'test_mae': array([0.68776585, 0.69460047, 0.69082614, 0.69349923, 0.68641682]),
 'fit_time': (0.8109123706817627,
  0.7811427116394043,
  0.7507283687591553,
  0.7570011615753174,
  0.7610712051391602),
 'test_time': (0.09384870529174805,
  0.09600043296813965,
  0.14299798011779785,
  0.09299921989440918,
  0.09099841117858887)}
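The two measures are defined as follows:

- RMSE is the square root of the mean squared error.
- MAE is the mean absolute error.

Surprise's own versions, `surprise.accuracy.rmse` and `surprise.accuracy.mae`, take a list of predictions. The following sketch defines both on plain lists (`rmse` and `mae` here are illustrative). It then recomputes the Mean and Std columns from the per-fold arrays returned above. The printed Std values match the population standard deviation (dividing by n), not the sample one.

```python
import math
import statistics


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Per-fold scores copied from the cross_validate output above
test_rmse = [0.89485598, 0.90037197, 0.89643346, 0.89776579, 0.8961604]
test_mae = [0.68776585, 0.69460047, 0.69082614, 0.69349923, 0.68641682]

for name, scores in (("RMSE", test_rmse), ("MAE", test_mae)):
    # pstdev divides by n; the sample stdev (n - 1) would print 0.0021 and 0.0035
    print(f"{name}: mean={statistics.fmean(scores):.4f} std={statistics.pstdev(scores):.4f}")
# RMSE: mean=0.8971 std=0.0019
# MAE: mean=0.6906 std=0.0032
```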

In [13]:
trainset = data.build_full_trainset()
svd.fit(trainset)

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x14e80820d60>
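`build_full_trainset()` uses every rating for training and holds nothing out. The model fitted here is therefore meant for making predictions, while the cross-validation scores above serve as the evaluation. While building the trainset, Surprise maps raw ids such as `userId` 1 to contiguous inner ids. The real object exposes this mapping through `trainset.to_inner_uid(1)`, `trainset.n_users` and `trainset.n_items`. The following minimal sketch shows such a mapping. It assumes ids are numbered in order of first appearance, and `build_id_maps` is hypothetical.

```python
def build_id_maps(raw_ratings):
    """Map raw user/item ids to 0, 1, 2, ... in order of first appearance."""
    raw2inner_u, raw2inner_i = {}, {}
    for u, i, _ in raw_ratings:
        # len() is evaluated before insertion, so a new id gets the next free number
        raw2inner_u.setdefault(u, len(raw2inner_u))
        raw2inner_i.setdefault(i, len(raw2inner_i))
    return raw2inner_u, raw2inner_i


users, items = build_id_maps([(1, 31, 2.5), (1, 1029, 3.0), (100, 1, 4.0)])
print(users, items)
```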

In [14]:
ratings[ratings['userId'] == 1]

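|   | userId | movieId | rating | timestamp  |
|---|--------|---------|--------|------------|
| 0 | 1      | 31      | 2.5    | 1260759144 |
| 1 | 1      | 1029    | 3.0    | 1260759179 |
| 2 | 1      | 1061    | 3.0    | 1260759182 |
| 3 | 1      | 1129    | 2.0    | 1260759185 |
| 4 | 1      | 1172    | 4.0    | 1260759205 |
| 5 | 1      | 1263    | 2.0    | 1260759151 |
| 6 | 1      | 1287    | 2.0    | 1260759187 |
| 7 | 1      | 1293    | 2.0    | 1260759148 |
| 8 | 1      | 1339    | 3.5    | 1260759125 |
| 9 | 1      | 1343    | 2.0    | 1260759131 |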


In [15]:
svd.predict(1, 302)

Prediction(uid=1, iid=302, r_ui=None, est=2.7142061734434044, details={'was_impossible': False})

In [16]:
svd.predict(1, 1029, 3)  # user 1, movie 1029; r_ui=3 is the actual rating, passed only so it is reported next to the estimate

Prediction(uid=1, iid=1029, r_ui=3, est=2.8814455446761933, details={'was_impossible': False})
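`r_ui` is only stored alongside the estimate, so the error for this rating is `r_ui - est`. Note that `svd` was fitted on the full trainset, which contains this very rating. This is therefore a training-set error and says little about accuracy on unseen data. The following sketch computes it using a stand-in namedtuple with the same field names as Surprise's `Prediction`:

```python
from collections import namedtuple

# Stand-in with the same fields as surprise.Prediction (also a namedtuple in Surprise)
Prediction = namedtuple("Prediction", ["uid", "iid", "r_ui", "est", "details"])

pred = Prediction(uid=1, iid=1029, r_ui=3, est=2.8814455446761933,
                  details={"was_impossible": False})
error = pred.r_ui - pred.est  # positive: the model under-predicts this rating
print(f"error = {error:.4f}")  # error = 0.1186
```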

In [17]:
ratings[ratings['userId'] == 100]

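|       | userId | movieId | rating | timestamp |
|-------|--------|---------|--------|-----------|
| 15273 | 100    | 1       | 4.0    | 854193977 |
| 15274 | 100    | 3       | 4.0    | 854194024 |
| 15275 | 100    | 6       | 3.0    | 854194023 |
| 15276 | 100    | 7       | 3.0    | 854194024 |
| 15277 | 100    | 25      | 4.0    | 854193977 |
| 15278 | 100    | 32      | 5.0    | 854193977 |
| 15279 | 100    | 52      | 3.0    | 854194056 |
| 15280 | 100    | 62      | 3.0    | 854193977 |
| 15281 | 100    | 86      | 3.0    | 854194208 |
| 15282 | 100    | 88      | 2.0    | 854194208 |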


In [18]:
svd.predict(100, 1029)

Prediction(uid=100, iid=1029, r_ui=None, est=3.7705476478414846, details={'was_impossible': False})
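`svd.predict` returns an estimate for any (user, movie) pair. To turn such estimates into recommendations, you typically score the movies a user has not rated yet and keep the highest ones. The following is a minimal sketch. `top_n_unseen` is a hypothetical helper, and `predict_est` is any callable that returns an estimated rating.

```python
import heapq


def top_n_unseen(predict_est, user_id, all_items, seen_items, n=10):
    """Return the n (score, item) pairs with the highest predicted rating, best first."""
    candidates = (i for i in all_items if i not in seen_items)
    scored = ((predict_est(user_id, i), i) for i in candidates)
    return heapq.nlargest(n, scored)


# Toy scorer for illustration only: the "rating" is just item_id / 10
print(top_n_unseen(lambda u, i: i / 10, 100, range(10), {8, 9}, n=3))
```

With the fitted model, a call could look like `top_n_unseen(lambda u, i: svd.predict(u, i).est, 100, ratings['movieId'].unique(), set(ratings.loc[ratings['userId'] == 100, 'movieId']), n=10)`. This costs one `predict` call per candidate movie.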