# Recommendation System

1. Content-based Filtering
    - Recommends items that suit a user based on the attributes of the items

2. Collaborative Filtering
    - Recommends items based on similarities among users' preferences
    - Can be done in either a user-based or an item-based way

3. Hybrid Recommendation Systems
    - Combines collaborative filtering and content-based filtering

### Content-based Filtering

In [1]:
import numpy as np
import pandas as pd

In [2]:
movie_df = pd.read_csv('./data/tmdb_5000_movies.csv')
movie_df.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124


In [3]:
movie_df = movie_df[['id', 'title', 'genres', 'vote_average', 'vote_count', 'popularity', 'keywords', 'overview']]

In [4]:
movie_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            4803 non-null   int64  
 1   title         4803 non-null   object 
 2   genres        4803 non-null   object 
 3   vote_average  4803 non-null   float64
 4   vote_count    4803 non-null   int64  
 5   popularity    4803 non-null   float64
 6   keywords      4803 non-null   object 
 7   overview      4800 non-null   object 
dtypes: float64(2), int64(2), object(4)
memory usage: 300.3+ KB


In [5]:
from ast import literal_eval

movie_df['genres'] = movie_df['genres'].apply(literal_eval)

In [6]:
movie_df['genres'] = movie_df['genres'].apply(lambda genres: [genre['name'] for genre in genres])

In [7]:
movie_df

Unnamed: 0,id,title,genres,vote_average,vote_count,popularity,keywords,overview
0,19995,Avatar,"[Action, Adventure, Fantasy, Science Fiction]",7.2,11800,150.437577,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"[Adventure, Fantasy, Action]",6.9,4500,139.082615,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,"[Action, Adventure, Crime]",6.3,4466,107.376788,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,"[Action, Crime, Drama, Thriller]",7.6,9106,112.312950,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",Following the death of District Attorney Harve...
4,49529,John Carter,"[Action, Adventure, Science Fiction]",6.1,2124,43.926995,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","John Carter is a war-weary, former military ca..."
...,...,...,...,...,...,...,...,...
4798,9367,El Mariachi,"[Action, Crime, Thriller]",6.6,238,14.269792,"[{""id"": 5616, ""name"": ""united states\u2013mexi...",El Mariachi just wants to play his guitar and ...
4799,72766,Newlyweds,"[Comedy, Romance]",5.9,5,0.642552,[],A newlywed couple's honeymoon is upended by th...
4800,231617,"Signed, Sealed, Delivered","[Comedy, Drama, Romance, TV Movie]",7.0,6,1.444476,"[{""id"": 248, ""name"": ""date""}, {""id"": 699, ""nam...","""Signed, Sealed, Delivered"" introduces a dedic..."
4801,126186,Shanghai Calling,[],5.7,7,0.857008,[],When ambitious New York attorney Sam is sent t...


In [8]:
movie_df['genres'] = movie_df['genres'].apply(lambda x: ' '.join(x))
movie_df['genres']

0       Action Adventure Fantasy Science Fiction
1                       Adventure Fantasy Action
2                         Action Adventure Crime
3                    Action Crime Drama Thriller
4               Action Adventure Science Fiction
                          ...                   
4798                       Action Crime Thriller
4799                              Comedy Romance
4800               Comedy Drama Romance TV Movie
4801                                            
4802                                 Documentary
Name: genres, Length: 4803, dtype: object
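
The three preprocessing steps above (parse the string, keep only the genre names, join them with spaces) can be traced on a single value. The sketch below uses a shortened sample string shaped like the raw `genres` field; it is illustrative and not a full row copied from the dataset.

```python
from ast import literal_eval

# Shortened sample shaped like a raw `genres` field (illustrative, not a full row)
raw_genres = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

parsed = literal_eval(raw_genres)            # str -> list of dicts
names = [genre['name'] for genre in parsed]  # keep only the genre names
joined = ' '.join(names)                     # one space-separated string

print(type(parsed).__name__, names, joined)
```

`literal_eval` works here because the field holds only lists, dicts, strings, and integers, which are valid Python literals as well as JSON. A field containing JSON `true`, `false`, or `null` would need `json.loads` instead.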

In [9]:
str_list = '[1, 2, 3]'
lst = literal_eval(str_list)
print(type(lst))

<class 'list'>


In [10]:
from sklearn.feature_extraction.text import CountVectorizer

count_vectorizer = CountVectorizer(ngram_range=(1, 2))
genres_vec = count_vectorizer.fit_transform(movie_df['genres'])
print(genres_vec.shape)
print(genres_vec.toarray()[:5])
genres_vec_vocab = pd.DataFrame(count_vectorizer.get_feature_names_out())
genres_vec_vocab

(4803, 276)
[[1 1 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 [1 1 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 [1 1 0 ... 0 0 0]]


Unnamed: 0,0
0,action
1,action adventure
2,action animation
3,action comedy
4,action crime
...,...
271,western drama
272,western history
273,western music
274,western romance
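
To see what kind of features make up the vocabulary, it helps to spell out what `ngram_range=(1, 2)` produces for a single genre string. The sketch below is a plain-Python approximation of the default `CountVectorizer` analyzer (lowercasing, tokens of two or more word characters); `genre_ngrams` is a hypothetical helper and not part of scikit-learn.

```python
import re

def genre_ngrams(text):
    """Approximate the CountVectorizer(ngram_range=(1, 2)) features of one string."""
    # Default-style tokenization: lowercase, keep tokens of 2+ word characters
    tokens = re.findall(r'\b\w\w+\b', text.lower())
    # Bigrams are built from adjacent tokens
    bigrams = [' '.join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

features = genre_ngrams('Action Adventure Fantasy Science Fiction')
print(features)
```

Because the genre names were joined with spaces, the two-word genre `Science Fiction` is split into two tokens, and bigrams such as `fantasy science` span two different genres. The order in which genres are listed therefore affects the bigram features.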


### Cosine Similarity

In [11]:
from sklearn.metrics.pairwise import cosine_similarity

genres_sim = cosine_similarity(genres_vec, genres_vec)
genres_sim.shape
genres_sim[:2]

array([[1.        , 0.59628479, 0.4472136 , ..., 0.        , 0.        ,
        0.        ],
       [0.59628479, 1.        , 0.4       , ..., 0.        , 0.        ,
        0.        ]], shape=(2, 4803))
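
The similarity values above can be checked by hand. The sketch below rebuilds the unigram and bigram counts in plain Python (an approximation of the vectorizer's default behavior, with hypothetical helper names) and computes the cosine similarity directly. The genre strings are taken from the tables in this notebook.

```python
import math
import re
from collections import Counter

def genre_counts(text):
    """Unigram and bigram counts, approximating CountVectorizer(ngram_range=(1, 2))."""
    tokens = re.findall(r'\b\w\w+\b', text.lower())
    bigrams = [' '.join(pair) for pair in zip(tokens, tokens[1:])]
    return Counter(tokens + bigrams)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value ** 2 for value in a.values()))
    norm_b = math.sqrt(sum(value ** 2 for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

avatar = genre_counts('Action Adventure Fantasy Science Fiction')
pirates = genre_counts('Adventure Fantasy Action')
wolverine = genre_counts('Action Science Fiction Adventure Fantasy')

print(round(cosine(avatar, pirates), 6))    # 0.596285
print(round(cosine(avatar, wolverine), 6))  # 0.777778
```

Avatar and Pirates of the Caribbean share 4 of their 9 and 5 features, giving 4 / (3 · √5) ≈ 0.596, which matches `genres_sim[0, 1]`. The Wolverine lists the same four genres as Avatar in a different order, yet scores only 7 / 9 ≈ 0.778 because two of its bigrams differ.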

In [12]:
movie_idx_by_genres_sim = genres_sim.argsort(axis=1)[:, ::-1]
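
The `argsort(axis=1)[:, ::-1]` idiom is easier to follow on a small matrix. The sketch below uses a made-up 3×3 similarity matrix without ties; the values are illustrative only.

```python
import numpy as np

# Toy 3x3 similarity matrix (illustrative values, no ties)
toy_sim = np.array([
    [1.0, 0.2, 0.7],
    [0.2, 1.0, 0.5],
    [0.7, 0.5, 1.0],
])

# argsort sorts ascending, so reversing each row gives most-similar-first indices
order = toy_sim.argsort(axis=1)[:, ::-1]
print(order)
```

Each row now lists column indices from most to least similar. When several entries tie (for example, many movies with similarity 1.0), the order among them is not meaningful, so the query movie is not guaranteed to come first in its own row.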

In [13]:
def recommend_movie_by_genres(movie_title, top_n=10):

    movie = movie_df[movie_df['title'] == movie_title]
    if movie.empty:
        return 'Movie not found.'
    
    movie_idx = movie.index    # pandas Index holding the matching row label(s)

    topn_movie_idx = movie_idx_by_genres_sim[movie_idx, :top_n]
    topn_movie_idx = topn_movie_idx.reshape(-1)
    return movie_df.iloc[topn_movie_idx]

In [14]:
recommend_movie_by_genres('Avatar', top_n=30)

Unnamed: 0,id,title,genres,vote_average,vote_count,popularity,keywords,overview
0,19995,Avatar,Action Adventure Fantasy Science Fiction,7.2,11800,150.437577,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di..."
46,127585,X-Men: Days of Future Past,Action Adventure Fantasy Science Fiction,7.5,6032,118.078691,"[{""id"": 1228, ""name"": ""1970s""}, {""id"": 1852, ""...",The ultimate X-Men ensemble fights a war for t...
3494,27549,Beastmaster 2: Through the Portal of Time,Action Adventure Fantasy Science Fiction,4.6,17,1.478505,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","Mark Singer returns as Dar, the warrior who ca..."
870,8536,Superman II,Action Adventure Fantasy Science Fiction,6.5,629,30.515175,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",Three escaped criminals from the planet Krypto...
14,49521,Man of Steel,Action Adventure Fantasy Science Fiction,6.5,6359,99.398009,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",A young boy learns that he has extraordinary p...
813,1924,Superman,Action Adventure Fantasy Science Fiction,6.9,1022,48.507081,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",Mild-mannered Clark Kent works as a reporter a...
1652,14164,Dragonball Evolution,Action Adventure Fantasy Science Fiction Thriller,2.9,462,21.677732,"[{""id"": 3436, ""name"": ""karate""}, {""id"": 9715, ...",The young warrior Son Goku sets out on a quest...
1296,9531,Superman III,Comedy Action Adventure Fantasy Science Fiction,5.3,490,22.164202,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...","Aiming to defeat the Man of Steel, wealthy exe..."
420,11253,Hellboy II: The Golden Army,Adventure Fantasy Science Fiction,6.5,1527,58.57976,"[{""id"": 2096, ""name"": ""auction""}, {""id"": 7005,...",In this continuation to the adventure of the d...
419,8247,Jumper,Adventure Fantasy Science Fiction,5.9,1799,21.218,"[{""id"": 704, ""name"": ""adolescence""}, {""id"": 81...","David Rice is a man who knows no boundaries, a..."


### Recommendations That Account for Ratings

In [15]:
movie_df[['title', 'vote_average', 'vote_count']].sort_values('vote_average', ascending=False).head()

Unnamed: 0,title,vote_average,vote_count
4662,Little Big Top,10.0,1
3519,Stiff Upper Lips,10.0,1
4045,"Dancer, Texas Pop. 81",10.0,1
4247,Me You and Five Bucks,10.0,2
3992,Sardaarji,9.5,2


In [16]:
movie_df['vote_count'].quantile(0.6)

np.float64(370.1999999999998)
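
The next cell computes a weighted rating that pulls the score of a rarely rated movie toward the overall mean. Written out with the symbols used in the code:

$$
WR = \frac{v}{v + m} \cdot R + \frac{m}{v + m} \cdot C
$$

Here $v$ is the movie's `vote_count`, $m$ is the minimum vote threshold (the 60th percentile, about 370.2), $R$ is the movie's `vote_average`, and $C$ is the mean `vote_average` over all movies. As $v$ grows well beyond $m$, $WR$ approaches $R$; when $v$ is much smaller than $m$, $WR$ stays close to $C$.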

In [17]:
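# Variables for the weighted rating:
# v = a movie's vote count, m = minimum vote count, R = a movie's average rating, C = mean rating over all movies
m = movie_df['vote_count'].quantile(0.6)    # minimum vote count (60th percentile)
C = movie_df['vote_average'].mean()         # mean rating over all movies

def weighted_rating(movie):
    v = movie['vote_count']
    R = movie['vote_average']
    return ((v / (v + m)) * R) + ((m / (v + m)) * C)

movie_df['weighted_vote'] = movie_df.apply(weighted_rating, axis=1)

# 'weighted_vote' must be among the selected columns, otherwise sort_values raises a KeyError
movie_df[['title', 'vote_average', 'vote_count', 'weighted_vote']].sort_values('weighted_vote', ascending=False).head()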
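
A quick numeric check shows why the weighting matters. The sketch below is a standalone version of the formula that takes plain numbers instead of a DataFrame row. `m` is the quantile shown earlier; `C = 6.09` is an assumed, approximate value for the global mean, which the notebook does not print.

```python
def weighted_rating(v, R, m, C):
    """Blend a movie's own rating R with the global mean C, weighted by vote count v."""
    return (v / (v + m)) * R + (m / (v + m)) * C

m = 370.2   # 60th-percentile vote count shown in the output above
C = 6.09    # ASSUMED global mean rating (approximate; not shown in the source)

few_votes = weighted_rating(v=1, R=10.0, m=m, C=C)      # a 10.0 rating backed by 1 vote
many_votes = weighted_rating(v=6032, R=7.5, m=m, C=C)   # a 7.5 rating backed by 6032 votes

print(round(few_votes, 2), round(many_votes, 2))
```

Under these assumptions, the movie rated 10.0 by a single voter lands near the global mean (about 6.1), while the movie rated 7.5 by 6032 voters keeps most of its score (about 7.42). This removes the one-vote movies from the top of the ranking.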

In [19]:
def recommend_movie_by_genres(movie_title, top_n=10):

    movie = movie_df[movie_df['title'] == movie_title]
    if movie.empty:
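        return 'Movie not found.'
    
    movie_idx = movie.index    # pandas Index holding the matching row label(s)

    # Store every movie's genre similarity to the query (assumes the title matches exactly one row)
    movie_df['genres_sim'] = genres_sim[movie_idx].reshape(-1)

    # Take twice as many candidates as requested so that re-ranking has room to work
    topn_movie_idx = movie_idx_by_genres_sim[movie_idx, :(top_n * 2)]
    topn_movie_idx = topn_movie_idx.reshape(-1)

    # Exclude the query movie itself
    topn_movie_idx = topn_movie_idx[topn_movie_idx != movie_idx[0]]

    # Re-rank the candidates by weighted rating and keep the top_n
    return movie_df.iloc[topn_movie_idx].sort_values('weighted_vote', ascending=False)[:top_n]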

In [21]:
recommend_movie_by_genres('Avatar')

Unnamed: 0,id,title,genres,vote_average,vote_count,popularity,keywords,overview,weighted_vote,genres_sim
46,127585,X-Men: Days of Future Past,Action Adventure Fantasy Science Fiction,7.5,6032,118.078691,"[{""id"": 1228, ""name"": ""1970s""}, {""id"": 1852, ""...",The ultimate X-Men ensemble fights a war for t...,7.418594,1.0
813,1924,Superman,Action Adventure Fantasy Science Fiction,6.9,1022,48.507081,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",Mild-mannered Clark Kent works as a reporter a...,6.68519,1.0
14,49521,Man of Steel,Action Adventure Fantasy Science Fiction,6.5,6359,99.398009,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",A young boy learns that he has extraordinary p...,6.477564,1.0
420,11253,Hellboy II: The Golden Army,Adventure Fantasy Science Fiction,6.5,1527,58.57976,"[{""id"": 2096, ""name"": ""auction""}, {""id"": 7005,...",In this continuation to the adventure of the d...,6.420421,0.881917
870,8536,Superman II,Action Adventure Fantasy Science Fiction,6.5,629,30.515175,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",Three escaped criminals from the planet Krypto...,6.348901,1.0
232,76170,The Wolverine,Action Science Fiction Adventure Fantasy,6.3,4053,15.953444,"[{""id"": 233, ""name"": ""japan""}, {""id"": 1462, ""n...",Wolverine faces his ultimate nemesis - and tes...,6.282606,0.777778
3208,333355,Star Wars: Clone Wars: Volume 1,Action Adventure Animation Fantasy Science Fic...,8.0,27,1.881466,"[{""id"": 6091, ""name"": ""war""}, {""id"": 161176, ""...","The Saga continues with the Emmy-winning ""Star...",6.221858,0.80403
1191,11551,Small Soldiers,Comedy Adventure Fantasy Science Fiction Action,6.2,511,23.088571,"[{""id"": 3599, ""name"": ""defense industry""}, {""i...",When missile technology is used to enhance toy...,6.1547,0.80403
1932,24264,Sheena,Action Adventure Comedy Fantasy Science Fiction,5.0,22,4.020194,"[{""id"": 409, ""name"": ""africa""}, {""id"": 3070, ""...",Sheena's white parents are killed while on Saf...,6.030907,0.80403
3494,27549,Beastmaster 2: Through the Portal of Time,Action Adventure Fantasy Science Fiction,4.6,17,1.478505,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","Mark Singer returns as Dar, the warrior who ca...",6.026658,1.0
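
The revised function works in two stages: select a candidate pool by genre similarity, then re-rank that pool by weighted rating. The sketch below shows the same idea in plain Python on a handful of made-up records; `rerank` and all titles and scores are hypothetical.

```python
def rerank(candidates, query_title, top_n):
    """Pick the 2*top_n most similar candidates, drop the query, re-rank by weighted score."""
    by_similarity = sorted(candidates, key=lambda c: c['sim'], reverse=True)
    pool = [c for c in by_similarity[:top_n * 2] if c['title'] != query_title]
    return sorted(pool, key=lambda c: c['weighted'], reverse=True)[:top_n]

# Toy records (hypothetical titles and scores)
candidates = [
    {'title': 'Query',   'sim': 1.0, 'weighted': 7.0},
    {'title': 'Movie A', 'sim': 1.0, 'weighted': 6.2},
    {'title': 'Movie B', 'sim': 0.9, 'weighted': 7.4},
    {'title': 'Movie C', 'sim': 0.8, 'weighted': 6.8},
    {'title': 'Movie D', 'sim': 0.1, 'weighted': 9.0},
]

result = rerank(candidates, 'Query', top_n=2)
print([c['title'] for c in result])
```

`Movie D` has the highest weighted score but never enters the pool because its similarity is too low. The pool size (`top_n * 2` in the notebook) controls this trade-off: a larger pool lets rating quality matter more, and a smaller one keeps the results closer in genre.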
