# Item-based Collaborative Filtering

- Recommends items by finding other items that are similar to a given item.
- Computes item-to-item similarity from users' past behavior data and generates recommendations from those similarities.

**Process**
1. Compute item-to-item similarity
2. Identify the user's preferences
3. Predict weighted ratings
4. Provide recommendations
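
The four steps can be sketched end to end on a toy rating matrix. This is a minimal illustration, not the notebook's pipeline: the matrix `R`, its values, and the helper name `cosine_sim_columns` are invented for the example, and 0 is assumed to mean "not rated".

```python
import numpy as np

# Hypothetical toy data: rows = users, columns = items, 0 = not rated.
R = np.array([
    [5.0, 4.0, 0.0, 1.0, 0.0],
    [4.0, 5.0, 1.0, 0.0, 5.0],
    [1.0, 0.0, 5.0, 4.0, 1.0],
    [0.0, 1.0, 4.0, 5.0, 0.0],
])

def cosine_sim_columns(m):
    """Cosine similarity between the columns (items) of m."""
    norms = np.linalg.norm(m, axis=0)
    norms[norms == 0] = 1.0      # an item nobody rated keeps similarity 0
    unit = m / norms
    return unit.T @ unit

# 1. Item-to-item similarity
item_sim = cosine_sim_columns(R)

# 2. The target user's known preferences (their row of ratings)
user = 0
user_ratings = R[user]

# 3. Weighted rating prediction: similarity-weighted sum of the user's ratings,
#    normalized by the total absolute similarity of each item
pred = item_sim @ user_ratings / np.abs(item_sim).sum(axis=1)

# 4. Recommend the unrated item with the highest predicted rating
unrated = np.where(user_ratings == 0)[0]
best_item = unrated[np.argmax(pred[unrated])]
print(best_item, pred.round(3))
```

With these toy values the sketch picks item 4, whose rating pattern resembles items 0 and 1, which user 0 rated highly.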

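**Advantages**
- Similarity is computed between items, so the similarity matrix does not grow with the number of users and computation time stays relatively low as users increase
- Item attributes are not used, so the method works even when attribute data is scarce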

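**Disadvantages**
- Only item-to-item similarity is considered, so changes in a user's preferences and individual tastes can be hard to reflect
- Without enough underlying data, similarity cannot be computed accurately (cold start)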

In [14]:
import numpy as np
import pandas as pd

In [15]:
movies_df = pd.read_csv('./data/ml-latest-small/movies.csv')
ratings_df = pd.read_csv('./data/ml-latest-small/ratings.csv')

In [16]:
movies_df.head()

| | movieId | title | genres |
|---|---|---|---|
| 0 | 1 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 2 | Jumanji (1995) | Adventure\|Children\|Fantasy |
| 2 | 3 | Grumpier Old Men (1995) | Comedy\|Romance |
| 3 | 4 | Waiting to Exhale (1995) | Comedy\|Drama\|Romance |
| 4 | 5 | Father of the Bride Part II (1995) | Comedy |


In [17]:
movies_ratings_df = pd.merge(ratings_df, movies_df, on='movieId', how='inner')
print(movies_ratings_df.shape)
movies_ratings_df.head()

(100836, 6)


| | userId | movieId | rating | timestamp | title | genres |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 4.0 | 964982703 | Toy Story (1995) | Adventure\|Animation\|Children\|Comedy\|Fantasy |
| 1 | 1 | 3 | 4.0 | 964981247 | Grumpier Old Men (1995) | Comedy\|Romance |
| 2 | 1 | 6 | 4.0 | 964982224 | Heat (1995) | Action\|Crime\|Thriller |
| 3 | 1 | 47 | 5.0 | 964983815 | Seven (a.k.a. Se7en) (1995) | Mystery\|Thriller |
| 4 | 1 | 50 | 5.0 | 964982931 | Usual Suspects, The (1995) | Crime\|Mystery\|Thriller |


##### Computing item (movie) similarity from user ratings

In [18]:
users_movies_df = movies_ratings_df.pivot_table('rating', index='userId', columns='title', fill_value=0)
print(users_movies_df.shape)
users_movies_df.head()

(610, 9719)


| userId | '71 (2014) | 'Hellboy': The Seeds of Creation (2004) | 'Round Midnight (1986) | 'Salem's Lot (2004) | 'Til There Was You (1997) | 'Tis the Season for Love (2015) | 'burbs, The (1989) | 'night Mother (1986) | (500) Days of Summer (2009) | \*batteries not included (1987) | ... | Zulu (2013) | [REC] (2007) | [REC]² (2009) | [REC]³ 3 Génesis (2012) | anohana: The Flower We Saw That Day - The Movie (2013) | eXistenZ (1999) | xXx (2002) | xXx: State of the Union (2005) | ¡Three Amigos! (1986) | À nous la liberté (Freedom for Us) (1931) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 4.0 | 0.0 |
| 2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
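
A small example can make the pivot step concrete. The sketch below uses a made-up ratings table (the user IDs, titles, and ratings are hypothetical) to show how `pivot_table` builds the user-by-movie matrix and how `fill_value=0` turns every missing rating into 0, so an unrated movie is indistinguishable from a rating of 0 in later calculations.

```python
import pandas as pd

# Hypothetical long-format ratings, one row per (user, movie) rating.
toy_ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "title": ["A", "B", "A", "B", "C"],
    "rating": [4.0, 5.0, 3.0, 2.0, 4.5],
})

# Rows become users, columns become titles; missing combinations are filled with 0.
toy_matrix = toy_ratings.pivot_table(
    values="rating", index="userId", columns="title", fill_value=0
)
print(toy_matrix)

# Counting non-zero cells recovers how many movies each user actually rated.
rated_counts = (toy_matrix != 0).sum(axis=1)
print(rated_counts)
```

Because 0 is not a valid rating in this toy data, counting non-zero cells is a safe way to recover the number of real ratings; the same assumption underlies the per-user count computed in the notebook.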


In [19]:
# Look up a specific user's movie ratings
users_movies_df.iloc[555].sort_values(ascending=False)[:30]

title
Aladdin (1992)                                                                                    5.0
How to Train Your Dragon (2010)                                                                   5.0
Harry Potter and the Sorcerer's Stone (a.k.a. Harry Potter and the Philosopher's Stone) (2001)    5.0
Guardians of the Galaxy (2014)                                                                    5.0
Lord of the Rings: The Fellowship of the Ring, The (2001)                                         4.5
Eragon (2006)                                                                                     4.5
Harry Potter and the Deathly Hallows: Part 2 (2011)                                               4.5
Harry Potter and the Chamber of Secrets (2002)                                                    4.5
Underworld (2003)                                                                                 4.0
Into the Woods (2014)                                                       

In [20]:
# Number of ratings per user
(users_movies_df != 0).sum(axis=1).describe()

count     610.000000
mean      165.298361
std       269.466692
min        20.000000
25%        35.000000
50%        70.500000
75%       168.000000
max      2698.000000
dtype: float64

In [23]:
from sklearn.metrics.pairwise import cosine_similarity

movies_sim = cosine_similarity(users_movies_df.T, users_movies_df.T)
movies_sim_df = pd.DataFrame(movies_sim, index=users_movies_df.columns, columns=users_movies_df.columns)

In [25]:
movies_sim_df["'Hellboy': The Seeds of Creation (2004)"].sort_values(ascending=False)[:10]

title
'Hellboy': The Seeds of Creation (2004)                       1.000000
Monsters (2010)                                               1.000000
Space Battleship Yamato (2010)                                1.000000
All the Right Moves (1983)                                    0.780869
Hidden Fortress, The (Kakushi-toride no san-akunin) (1958)    0.747409
...And Justice for All (1979)                                 0.715542
'Round Midnight (1986)                                        0.707107
Kagemusha (1980)                                              0.542720
Sanjuro (Tsubaki Sanjûrô) (1962)                              0.526685
Ghost Rider: Spirit of Vengeance (2012)                       0.525226
Name: 'Hellboy': The Seeds of Creation (2004), dtype: float64
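
A similarity of exactly 1.0 between different movies does not necessarily mean they appeal to the same tastes. One plausible cause, assumed here rather than verified against the dataset, is that each of those movies was rated by only one and the same user: with zero-filled vectors, any two such columns point in the same direction regardless of the scores. The toy vectors below are invented to show this property of the cosine formula.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity of two 1-D rating vectors (0 = not rated)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical columns of a user-by-movie matrix with 4 users.
movie_x = np.array([0.0, 4.0, 0.0, 0.0])   # rated only by the second user
movie_y = np.array([0.0, 1.5, 0.0, 0.0])   # rated only by that same user, with a very different score
movie_z = np.array([5.0, 4.0, 0.0, 3.0])   # rated by several users

sim_xy = cosine(movie_x, movie_y)   # identical direction despite different scores
sim_xz = cosine(movie_x, movie_z)   # partial overlap only
print(sim_xy, sim_xz)
```

Filtering out movies with very few ratings before computing similarity is one common way to reduce this effect.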

# Weighted rating prediction

- Predicting weighted ratings for the whole user-by-movie matrix
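
In formula form, the weighted rating can be written as follows. The symbols are notation introduced here, not taken from the notebook: $r_{u,j}$ is user $u$'s rating of movie $j$ (0 if unrated), $s_{i,j}$ is the cosine similarity between movies $i$ and $j$, and $\hat{r}_{u,i}$ is the predicted rating.

```latex
\hat{r}_{u,i} = \frac{\sum_{j} s_{i,j}\, r_{u,j}}{\sum_{j} \lvert s_{i,j} \rvert}
```

The sums run over all movies $j$. Unrated movies add nothing to the numerator but still add their similarity to the denominator, which pulls predictions toward 0.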

In [28]:
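def predict_ratings(users_movies_df, movies_sim_df):
    # Similarity-weighted sum of each user's ratings, divided per movie by that movie's total absolute similarity
    return users_movies_df.dot(movies_sim_df) / np.abs(movies_sim_df).sum(axis=1)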

ratings_pred_df = predict_ratings(users_movies_df, movies_sim_df)
print(ratings_pred_df.shape)
ratings_pred_df.head(1).T

(610, 9719)


| title | userId 1 |
|---|---|
| '71 (2014) | 0.070345 |
| 'Hellboy': The Seeds of Creation (2004) | 0.577855 |
| 'Round Midnight (1986) | 0.321696 |
| 'Salem's Lot (2004) | 0.227055 |
| 'Til There Was You (1997) | 0.206958 |
| ... | ... |
| eXistenZ (1999) | 0.212070 |
| xXx (2002) | 0.192921 |
| xXx: State of the Union (2005) | 0.136024 |
| ¡Three Amigos! (1986) | 0.292955 |


In [None]:
from sklearn.metrics import mean_squared_error

# Compare actual and predicted ratings, using only the entries that have an actual (non-zero) rating
def get_mse(actual, pred):
    non_zero_idx = actual.nonzero()
    # print(non_zero_idx)   # ([row_idx, row_idx, ...], [col_idx, col_idx, ...])
    actual = actual[non_zero_idx]
    pred = pred[non_zero_idx]
    return mean_squared_error(actual, pred)

get_mse(users_movies_df.values, ratings_pred_df.values)


9.895354759094706
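
Why the non-zero mask matters can be seen on a tiny example. The arrays below are invented, and `masked_mse` is a hypothetical NumPy-only stand-in for the notebook's `get_mse`; it assumes 0 marks an unrated entry.

```python
import numpy as np

def masked_mse(actual, pred):
    """Mean squared error over entries where an actual rating exists (non-zero)."""
    mask = actual != 0
    diff = actual[mask] - pred[mask]
    return float(np.mean(diff ** 2))

# Hypothetical actual ratings (0 = not rated) and predictions.
actual = np.array([[5.0, 0.0, 3.0],
                   [0.0, 4.0, 0.0]])
pred = np.array([[4.0, 2.5, 2.0],
                 [1.0, 3.0, 0.5]])

mse_rated = masked_mse(actual, pred)             # only the 3 rated entries
mse_all = float(np.mean((actual - pred) ** 2))   # also penalizes predictions for unrated entries
print(mse_rated, mse_all)
```

On the MovieLens 0.5 to 5 rating scale, an MSE near 9.9 means predictions are off by about 3 points on average, which is consistent with the normalization over all movies pulling predicted values well below the actual ratings.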

- Predicting one user's rating for a single movie

In [32]:
users_movies_df.iloc[176, 35]   # rating at row position 176 (user) and column position 35 (movie), both 0-based

np.float64(5.0)

In [34]:
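# Series.argsort() returns integer positions but keeps the original index labels,
# so the titles printed next to each value are not the similar movies; only the integers are meaningful.
topn_sim_idx = movies_sim_df.iloc[35].argsort()[::-1]
topn_sim_idx = topn_sim_idx[:20]
topn_sim_idx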

title
À nous la liberté (Freedom for Us) (1931)                 4412
¡Three Amigos! (1986)                                     1098
xXx: State of the Union (2005)                            6814
xXx (2002)                                                9491
eXistenZ (1999)                                           9169
anohana: The Flower We Saw That Day - The Movie (2013)    8416
[REC]³ 3 Génesis (2012)                                     35
[REC]² (2009)                                             5855
[REC] (2007)                                              4426
Zulu (2013)                                               3522
Zulu (1964)                                               3523
Zootopia (2016)                                           5658
Zoom (2015)                                               4129
Zoom (2006)                                               8656
Zoolander 2 (2016)                                        8495
Zoolander (2001)                                 
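
The titles in the output above are the original index labels in reverse order, not the most similar movies, because `Series.argsort()` returns positions while keeping the original index. A minimal sketch with an invented similarity Series shows the difference and one label-safe alternative, `Series.nlargest`.

```python
import pandas as pd

# Hypothetical similarity scores of four items to some target item.
sim = pd.Series([0.2, 1.0, 0.9, 0.1], index=["A", "B", "C", "D"])

# Positions of the two largest values; the labels attached to this Series
# come from the original index and do not identify the top items.
positions = sim.argsort()[::-1][:2]
misleading_labels = list(positions.index)

# Map the positions back to labels explicitly...
top_by_position = sim.index[positions.to_numpy()].tolist()

# ...or let pandas return the labels of the largest values directly.
top_by_label = sim.nlargest(2).index.tolist()
print(misleading_labels, top_by_position, top_by_label)
```

Using the integer values for positional indexing, as the notebook does with `iloc`, gives the right movies; only the displayed labels are misleading.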

In [35]:
users_movies_df.iloc[176, topn_sim_idx]

title
Intolerance: Love's Struggle Throughout the Ages (1916)    3.5
Birth of a Nation, The (1915)                              2.0
Princess and the Pirate, The (1944)                        4.0
Winchester '73 (1950)                                      4.0
Very Potter Sequel, A (2010)                               5.0
The Blue Lagoon (1949)                                     3.0
12 Angry Men (1997)                                        5.0
Mr. Skeffington (1944)                                     5.0
Invisible Man Returns, The (1940)                          2.5
Gold Diggers of 1933 (1933)                                3.5
Gold Diggers of 1935 (1935)                                3.5
Mildred Pierce (2011)                                      4.5
Hunchback of Notre Dame, The (1923)                        3.0
Thief of Bagdad, The (1924)                                3.5
The Great Train Robbery (1903)                             4.0
Snake Pit, The (1948)                            
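
From the top-N similar movies and the user's ratings for them, a single predicted rating can be formed as a similarity-weighted average. The following is a sketch on invented numbers, not the notebook's continuation: `predict_one`, its parameter `n`, and the toy arrays are hypothetical, 0 is assumed to mean "not rated", and restricting the neighbors to movies the user has rated is a design choice of this sketch.

```python
import numpy as np

def predict_one(user_ratings, item_sims, target, n=2):
    """Predict a rating for `target` from the n most similar items the user rated."""
    sims = item_sims.astype(float).copy()
    sims[target] = -np.inf               # exclude the target item itself
    sims[user_ratings == 0] = -np.inf    # keep only items the user actually rated
    top = np.argsort(sims)[::-1][:n]     # positions of the n largest similarities
    top = top[np.isfinite(sims[top])]    # drop excluded items if fewer than n remain
    weights = np.abs(item_sims[top])
    if weights.sum() == 0:
        return 0.0                       # no usable neighbors
    return float(item_sims[top] @ user_ratings[top] / weights.sum())

# Hypothetical data: one user's ratings and each item's similarity to target item 0.
user_ratings = np.array([0.0, 5.0, 3.0, 0.0, 1.0])
item_sims = np.array([1.0, 0.8, 0.5, 0.9, 0.1])

pred = predict_one(user_ratings, item_sims, target=0, n=2)
print(round(pred, 3))
```

Here the two neighbors are items 1 and 2, so the prediction is (0.8 × 5 + 0.5 × 3) / (0.8 + 0.5). Item 3 is more similar than item 2 but is skipped because the user has not rated it.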