# Recommendation System

1. Content-based Filtering
    - Recommends items that suit a user based on the attributes of the items

2. Collaborative Filtering
    - Recommends based on similarity computed from users' ratings or behavior
    - Can be user-based (similar users) or item-based (similar items)

3. Hybrid Recommendation Systems
    - Combine collaborative filtering and content-based filtering
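
To make the user-based versus item-based distinction in item 2 concrete, here is a minimal sketch on a made-up rating matrix. The ratings and variable names are illustrative assumptions and are not part of the dataset used in this notebook. User-based filtering compares rows of the matrix; item-based filtering compares columns.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical ratings: rows are users, columns are items, 0 means "not rated".
ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
])

# User-based: similarity between users (rows of the matrix).
user_sim = cosine_similarity(ratings)

# Item-based: similarity between items (columns), so transpose first.
item_sim = cosine_similarity(ratings.T)

print(user_sim.shape)  # (3, 3)
print(item_sim.shape)  # (4, 4)
```

In this toy matrix, users 0 and 1 rate the same items highly, so `user_sim[0, 1]` comes out larger than `user_sim[0, 2]`.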

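### Content-based Filtering

- Movie data (TMDb fields; the CSV loaded below does not include `cast`, `crew`, or `poster_path`)
    1. **id**: Unique ID of the movie.
    2. **title**: Title of the movie.
    3. **budget**: Production budget (unit: USD).
    4. **popularity**: Popularity score of the movie, as provided by TMDb.
    5. **genres**: Genres of the movie; given as a list when a movie has several genres.
    6. **overview**: Text describing the plot or synopsis of the movie.
    7. **release_date**: Release date of the movie.
    8. **revenue**: Total revenue of the movie (unit: USD).
    9. **runtime**: Running time of the movie (unit: minutes).
    10. **vote_average**: Average rating of the movie, as provided by TMDb.
    11. **vote_count**: Number of ratings the movie received.
    12. **production_companies**: List of production companies.
    13. **production_countries**: List of production countries.
    14. **spoken_languages**: List of languages spoken in the movie.
    15. **cast**: List of main cast members.
    16. **crew**: List of key crew members who worked on the movie.
    17. **keywords**: List of keywords for the movie.
    18. **tagline**: Tagline of the movie (its main promotional phrase).
    19. **original_language**: Original language of the movie (e.g. English, Korean).
    20. **homepage**: URL of the movie's official website.
    21. **poster_path**: URL path of the movie poster image.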

In [31]:
import numpy as np
import pandas as pd

In [32]:
# Load the data
movie_df = pd.read_csv('data/tmdb_5000_movies.csv')
display(movie_df.head())
print(movie_df.shape)

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124


(4803, 20)


In [33]:
# Select the columns to use (.copy() avoids SettingWithCopyWarning on later column assignments)
movie_df = movie_df[['id', 'title', 'genres', 'vote_average', 'vote_count', 'popularity', 'keywords', 'overview']].copy()
movie_df.info()
movie_df.head()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 8 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   id            4803 non-null   int64  
 1   title         4803 non-null   object 
 2   genres        4803 non-null   object 
 3   vote_average  4803 non-null   float64
 4   vote_count    4803 non-null   int64  
 5   popularity    4803 non-null   float64
 6   keywords      4803 non-null   object 
 7   overview      4800 non-null   object 
dtypes: float64(2), int64(2), object(4)
memory usage: 300.3+ KB


Unnamed: 0,id,title,genres,vote_average,vote_count,popularity,keywords,overview
0,19995,Avatar,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",7.2,11800,150.437577,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di..."
1,285,Pirates of the Caribbean: At World's End,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",6.9,4500,139.082615,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha..."
2,206647,Spectre,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",6.3,4466,107.376788,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",A cryptic message from Bond’s past sends him o...
3,49026,The Dark Knight Rises,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",7.6,9106,112.31295,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",Following the death of District Attorney Harve...
4,49529,John Carter,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",6.1,2124,43.926995,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","John Carter is a war-weary, former military ca..."


In [34]:
# Preprocess the genre data
from ast import literal_eval

# str -> list of dicts
movie_df['genres'] = movie_df['genres'].apply(literal_eval)

# Keep only the 'name' values, as a list
movie_df['genres'] = movie_df['genres'].apply(lambda genres: [genre['name'] for genre in genres])

# list -> str (space-separated string)
movie_df['genres'] = movie_df['genres'].apply(lambda x: ' '.join(x))
movie_df['genres']

0       Action Adventure Fantasy Science Fiction
1                       Adventure Fantasy Action
2                         Action Adventure Crime
3                    Action Crime Drama Thriller
4               Action Adventure Science Fiction
                          ...                   
4798                       Action Crime Thriller
4799                              Comedy Romance
4800               Comedy Drama Romance TV Movie
4801                                            
4802                                 Documentary
Name: genres, Length: 4803, dtype: object
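
The three preprocessing steps can be traced on a single value. This is a minimal sketch; the `raw` string below is an illustrative sample written in the same format as the raw `genres` column, not a row copied from the file.

```python
from ast import literal_eval

# Illustrative sample in the same format as the raw `genres` column.
raw = '[{"id": 28, "name": "Action"}, {"id": 878, "name": "Science Fiction"}]'

parsed = literal_eval(raw)                   # str -> list of dicts
names = [genre["name"] for genre in parsed]  # keep only the genre names
joined = " ".join(names)                     # list -> space-separated string

print(names)   # ['Action', 'Science Fiction']
print(joined)  # Action Science Fiction
```

Note that after joining on spaces, a two-word genre such as "Science Fiction" can no longer be told apart from two separate one-word genres, which affects how the text is tokenized later.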

In [35]:
# Use CountVectorizer to vectorize genres for similarity measurement
from sklearn.feature_extraction.text import CountVectorizer

count_vectorizer = CountVectorizer(ngram_range=(1, 2))
genres_vec = count_vectorizer.fit_transform(movie_df['genres'])
print(genres_vec.shape)
print(genres_vec.toarray()[:5])

genres_vec_vocab = pd.DataFrame(count_vectorizer.get_feature_names_out())
print(genres_vec_vocab)

(4803, 276)
[[1 1 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 [1 1 0 ... 0 0 0]
 [1 0 0 ... 0 0 0]
 [1 1 0 ... 0 0 0]]
                    0
0              action
1    action adventure
2    action animation
3       action comedy
4        action crime
..                ...
271     western drama
272   western history
273     western music
274   western romance
275  western thriller

[276 rows x 1 columns]


##### Measuring cosine similarity

In [36]:
from sklearn.metrics.pairwise import cosine_similarity

genres_sim = cosine_similarity(genres_vec, genres_vec)
genres_sim.shape
# genres_sim[:2]

(4803, 4803)
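
Each entry of the similarity matrix is the cosine of the angle between two count vectors. As a sanity check, the sketch below computes it by hand for two toy vectors (stand-ins for two rows of `genres_vec`, chosen for illustration) and compares the result with `cosine_similarity`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Toy count vectors standing in for two rows of genres_vec.
a = np.array([[1, 1, 1, 0, 0, 0]])
b = np.array([[1, 0, 1, 1, 1, 1]])

# cos(a, b) = (a . b) / (||a|| * ||b||)
manual = (a @ b.T).item() / (np.linalg.norm(a) * np.linalg.norm(b))
library = cosine_similarity(a, b)[0, 0]

print(round(manual, 4), round(library, 4))  # 0.5164 0.5164
```

Because count vectors are non-negative, the values fall between 0 (no shared features) and 1 (same direction).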

In [37]:
movie_idx_by_genres_sim = genres_sim.argsort(axis=1)[:, ::-1]
# movie_idx_by_genres_sim[:2]
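
The `argsort(axis=1)[:, ::-1]` idiom can be checked on a small matrix. In this sketch the similarity values are made up; `argsort` returns indices in ascending order of value, and reversing each row turns that into descending order.

```python
import numpy as np

# Toy similarity matrix: entry [i, j] is the similarity of item i to item j.
sim = np.array([
    [1.0, 0.2, 0.9],
    [0.2, 1.0, 0.5],
    [0.9, 0.5, 1.0],
])

# argsort sorts ascending; reversing each row gives descending order.
order = sim.argsort(axis=1)[:, ::-1]
print(order)
# [[0 2 1]
#  [1 2 0]
#  [2 0 1]]
```

In this example every item ranks itself first, since its similarity to itself is 1.0. When several items tie on the same similarity value, their relative order is not guaranteed.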

In [38]:
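def recommend_movie_by_genres(movie_title, top_n=10):
    """Return the top_n rows of movie_df ranked by genre similarity to movie_title."""
    movie = movie_df[movie_df['title'] == movie_title]
    if movie.empty:
        return 'Movie not found.'

    movie_idx = movie.index    # pandas Index of the matching row(s)

    # Indices of the most similar movies (the queried movie itself is included)
    topn_movie_idx = movie_idx_by_genres_sim[movie_idx, :top_n]
    topn_movie_idx = topn_movie_idx.reshape(-1)
    return movie_df.iloc[topn_movie_idx]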

In [41]:
recommend_movie_by_genres('Avatar', top_n=30)

Unnamed: 0,id,title,genres,vote_average,vote_count,popularity,keywords,overview
0,19995,Avatar,Action Adventure Fantasy Science Fiction,7.2,11800,150.437577,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di..."
46,127585,X-Men: Days of Future Past,Action Adventure Fantasy Science Fiction,7.5,6032,118.078691,"[{""id"": 1228, ""name"": ""1970s""}, {""id"": 1852, ""...",The ultimate X-Men ensemble fights a war for t...
3494,27549,Beastmaster 2: Through the Portal of Time,Action Adventure Fantasy Science Fiction,4.6,17,1.478505,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","Mark Singer returns as Dar, the warrior who ca..."
870,8536,Superman II,Action Adventure Fantasy Science Fiction,6.5,629,30.515175,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",Three escaped criminals from the planet Krypto...
14,49521,Man of Steel,Action Adventure Fantasy Science Fiction,6.5,6359,99.398009,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",A young boy learns that he has extraordinary p...
813,1924,Superman,Action Adventure Fantasy Science Fiction,6.9,1022,48.507081,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...",Mild-mannered Clark Kent works as a reporter a...
1652,14164,Dragonball Evolution,Action Adventure Fantasy Science Fiction Thriller,2.9,462,21.677732,"[{""id"": 3436, ""name"": ""karate""}, {""id"": 9715, ...",The young warrior Son Goku sets out on a quest...
1296,9531,Superman III,Comedy Action Adventure Fantasy Science Fiction,5.3,490,22.164202,"[{""id"": 83, ""name"": ""saving the world""}, {""id""...","Aiming to defeat the Man of Steel, wealthy exe..."
420,11253,Hellboy II: The Golden Army,Adventure Fantasy Science Fiction,6.5,1527,58.57976,"[{""id"": 2096, ""name"": ""auction""}, {""id"": 7005,...",In this continuation to the adventure of the d...
419,8247,Jumper,Adventure Fantasy Science Fiction,5.9,1799,21.218,"[{""id"": 704, ""name"": ""adolescence""}, {""id"": 81...","David Rice is a man who knows no boundaries, a..."
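
The first row of the result is the queried movie itself. One possible variant, sketched here on hypothetical stand-in data (`toy_df`, `toy_sim`, and `recommend_excluding_self` are illustrative names, not part of the notebook), filters the query out before taking the top entries.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for movie_df and movie_idx_by_genres_sim.
toy_df = pd.DataFrame({"title": ["A", "B", "C", "D"]})
toy_sim = np.array([
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.3, 0.2],
    [0.1, 0.3, 1.0, 0.4],
    [0.5, 0.2, 0.4, 1.0],
])
toy_order = toy_sim.argsort(axis=1)[:, ::-1]

def recommend_excluding_self(title, top_n=2):
    """Return the top_n most similar rows, leaving out the queried row."""
    match = toy_df[toy_df["title"] == title]
    if match.empty:
        return None
    idx = match.index[0]                     # use the first matching title
    ranked = toy_order[idx]
    ranked = ranked[ranked != idx][:top_n]   # drop the query itself
    return toy_df.iloc[ranked]

result = recommend_excluding_self("A")
print(result["title"].tolist())  # ['B', 'D']
```

Filtering by index rather than slicing off the first column is the safer choice, because a movie with identical genres can tie with the query at similarity 1.0 and may be sorted ahead of it.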
