# Blog post

**An explanation of this material is also posted on the blog below.**
- https://lsjsj92.tistory.com/565

----

This notebook works through, hands-on, the recommender systems covered in the previous material.

This material is based on the following references.

- https://www.kaggle.com/rounakbanik/movie-recommender-systems
- https://www.kaggle.com/ibtesama/getting-started-with-a-movie-recommendation-system


The data is Kaggle's **The Movies Dataset (https://www.kaggle.com/rounakbanik/the-movies-dataset)**.


We start with data preprocessing and then move on to content-based filtering.

# Data preprocessing

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from ast import literal_eval
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

Load the data.

In [2]:
data = pd.read_csv('./movie_data/tmdb_5000_movies.csv')

In [3]:
data.head(2)

|  | budget | genres | homepage | id | keywords | original_language | original_title | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 237000000 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | http://www.avatarmovie.com/ | 19995 | [{"id": 1463, "name": "culture clash"}, {"id":... | en | Avatar | In the 22nd century, a paraplegic Marine is di... | 150.437577 | [{"name": "Ingenious Film Partners", "id": 289... | [{"iso_3166_1": "US", "name": "United States o... | 2009-12-10 | 2787965087 | 162.0 | [{"iso_639_1": "en", "name": "English"}, {"iso... | Released | Enter the World of Pandora. | Avatar | 7.2 | 11800 |
| 1 | 300000000 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | http://disney.go.com/disneypictures/pirates/ | 285 | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | en | Pirates of the Caribbean: At World's End | Captain Barbossa, long believed to be dead, ha... | 139.082615 | [{"name": "Walt Disney Pictures", "id": 2}, {"... | [{"iso_3166_1": "US", "name": "United States o... | 2007-05-19 | 961000000 | 169.0 | [{"iso_639_1": "en", "name": "English"}] | Released | At the end of the world, the adventure begins. | Pirates of the Caribbean: At World's End | 6.9 | 4500 |


In [4]:
data.shape

(4803, 20)

The reference link describes the data in detail.  
The columns I use most often are listed below.  

- genres: movie genres
- keywords: movie keywords
- original_language: the movie's original language
- title: title
- vote_average: average rating
- vote_count: number of ratings
- popularity: popularity
- overview: plot overview

We will use columns like these. The other columns do not matter much here.  
Columns such as release_date could also matter, for example to recommend recent movies, but we will not use them here.  


First we need to do a little **preprocessing**.  
Let's start by selecting the columns we will use.


In [5]:
data = data[['id','genres', 'vote_average', 'vote_count','popularity','title',  'keywords', 'overview']]


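Next, we need to adjust vote_average.   
As it stands, vote_average is somewhat **unfair**.

If a movie has only a few votes (say 3) and all three give a perfect score, its vote_average is perfect.  
Conversely, the more votes a movie receives, the more its vote_average tends to be pulled down, because many different people are rating it.  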

IMDb has a method for handling this unfairness.  
It is discussed at https://www.quora.com/How-does-IMDbs-rating-system-work .

The answer given there is the following formula.

![1](https://user-images.githubusercontent.com/24634054/71774470-d1470c80-2fb2-11ea-8a1e-aa018dd6d25a.JPG)

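- R: the movie's average rating
- v: the number of votes the movie received
- m: the minimum number of votes required to be listed in the top 250 (this is a matter of choice; if you want 500, you can use 500)
- C: the mean rating across all movies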
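The formula in the image can be written out in LaTeX as follows. R and C are written in upper case, as in the `weighted_rating` code below:

```latex
WR = \frac{v}{v+m} \cdot R + \frac{m}{v+m} \cdot C
```

The two weights add up to 1. A movie with many more votes than m keeps a score close to its own R. A movie with few votes is pulled toward the overall mean C.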

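Here we **choose m so that roughly the top 500 movies qualify.** 

Let's find m first. For about 500 movies to make the cut, at which percentile of vote_count should the threshold sit?  
We can compute this with quantile.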

In [6]:
tmp_m = data['vote_count'].quantile(0.89)
tmp_m

1683.8999999999987

In [7]:
tmp_data = data.copy().loc[data['vote_count'] >= tmp_m]
tmp_data.shape

(529, 8)

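Using the 90th percentile as the cutoff (that is, keeping the top 10% by vote count) leaves 481 movies.   
Using the 89th percentile leaves 529. I will proceed with the 90th percentile.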
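The sketch below shows why the 0.9 quantile keeps about the top 10% of rows. It uses a made-up series of vote counts from 1 to 100, not the real data. pandas' `quantile` interpolates linearly by default, so the cutoff can fall between two observed values.

```python
import pandas as pd

# Hypothetical vote counts 1..100 (toy data, not the TMDB dataset)
votes = pd.Series(range(1, 101))

cutoff = votes.quantile(0.9)      # linear interpolation between 90 and 91, about 90.1
kept = votes[votes >= cutoff]     # rows at or above the cutoff

print(cutoff, len(kept))
```

Only the values 91 to 100 clear the cutoff. That is 10% of the rows, which matches the reasoning above.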

In [8]:
del tmp_data

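# m = 90th-percentile vote count; keep only movies at or above it
m = data['vote_count'].quantile(0.9)
# .copy() so that adding the 'score' column later does not trigger SettingWithCopyWarning
data = data.loc[data['vote_count'] >= m].copy()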

In [9]:
data.head()

|  | id | genres | vote_average | vote_count | popularity | title | keywords | overview |
|---|---|---|---|---|---|---|---|---|
| 0 | 19995 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | 7.2 | 11800 | 150.437577 | Avatar | [{"id": 1463, "name": "culture clash"}, {"id":... | In the 22nd century, a paraplegic Marine is di... |
| 1 | 285 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | 6.9 | 4500 | 139.082615 | Pirates of the Caribbean: At World's End | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | Captain Barbossa, long believed to be dead, ha... |
| 2 | 206647 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | 6.3 | 4466 | 107.376788 | Spectre | [{"id": 470, "name": "spy"}, {"id": 818, "name... | A cryptic message from Bond’s past sends him o... |
| 3 | 49026 | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | 7.6 | 9106 | 112.31295 | The Dark Knight Rises | [{"id": 849, "name": "dc comics"}, {"id": 853,... | Following the death of District Attorney Harve... |
| 4 | 49529 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | 6.1 | 2124 | 43.926995 | John Carter | [{"id": 818, "name": "based on novel"}, {"id":... | John Carter is a war-weary, former military ca... |


In [10]:
C = data['vote_average'].mean()

In [11]:
print(C)
print(m)

6.962993762993763
1838.4000000000015


In [12]:
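def weighted_rating(x, m=m, C=C):
    # IMDb-style weighted rating for one row of the DataFrame
    v = x['vote_count']    # number of votes for this movie
    R = x['vote_average']  # this movie's average rating
    
    return ( v / (v+m) * R ) + (m / (m + v) * C)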

In [13]:
data['score'] = data.apply(weighted_rating, axis = 1)

In [14]:
data.head(5)

|  | id | genres | vote_average | vote_count | popularity | title | keywords | overview | score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 19995 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | 7.2 | 11800 | 150.437577 | Avatar | [{"id": 1463, "name": "culture clash"}, {"id":... | In the 22nd century, a paraplegic Marine is di... | 7.168053 |
| 1 | 285 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | 6.9 | 4500 | 139.082615 | Pirates of the Caribbean: At World's End | [{"id": 270, "name": "ocean"}, {"id": 726, "na... | Captain Barbossa, long believed to be dead, ha... | 6.918271 |
| 2 | 206647 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | 6.3 | 4466 | 107.376788 | Spectre | [{"id": 470, "name": "spy"}, {"id": 818, "name... | A cryptic message from Bond’s past sends him o... | 6.493333 |
| 3 | 49026 | [{"id": 28, "name": "Action"}, {"id": 80, "nam... | 7.6 | 9106 | 112.31295 | The Dark Knight Rises | [{"id": 849, "name": "dc comics"}, {"id": 853,... | Following the death of District Attorney Harve... | 7.492998 |
| 4 | 49529 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | 6.1 | 2124 | 43.926995 | John Carter | [{"id": 818, "name": "based on novel"}, {"id":... | John Carter is a war-weary, former military ca... | 6.500396 |


In [15]:
data.shape

(481, 9)

This completes the weighted rating, stored in the `score` column.
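As a quick sanity check, the formula can be applied by hand to two rows. The inputs are the values printed above (C ≈ 6.963, m ≈ 1838.4) and the vote_count and vote_average of Avatar and John Carter from the table. The helper name `wr` is made up for this example. The results should match the `score` column.

```python
# Hypothetical helper: the same formula as weighted_rating(), on plain numbers
def wr(v, R, m, C):
    return v / (v + m) * R + m / (v + m) * C

m = 1838.4000000000015      # 90th-percentile vote_count printed above
C = 6.962993762993763       # mean vote_average printed above

avatar = wr(v=11800, R=7.2, m=m, C=C)       # many votes: stays close to its own 7.2
john_carter = wr(v=2124, R=6.1, m=m, C=C)   # fewer votes: pulled up toward C
print(round(avatar, 6), round(john_carter, 6))
```

Avatar has far more votes than m, so its score barely moves from 7.2. John Carter's v is close to m, so its 6.1 is pulled noticeably toward the mean.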

Also, the genres and keywords columns have a somewhat unusual structure.

In [16]:
data[['genres', 'keywords']].head(2)

|  | genres | keywords |
|---|---|---|
| 0 | [{"id": 28, "name": "Action"}, {"id": 12, "nam... | [{"id": 1463, "name": "culture clash"}, {"id":... |
| 1 | [{"id": 12, "name": "Adventure"}, {"id": 14, "... | [{"id": 270, "name": "ocean"}, {"id": 726, "na... |


Each value is a list of dictionaries.  
This representation is used because a movie can belong to more than one genre and can have more than one keyword.  
The problem is that these values are currently stored as **strings**.

To fix this, we use the `ast` module, specifically its literal_eval function.  

It converts the strings into actual lists and dictionaries.
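Here is a small sketch of what `literal_eval` does to one value. The string is made up but follows the dataset's format. `literal_eval` only evaluates Python literals, so unlike `eval` it does not run arbitrary code. The sketch also shows why `json.loads` is not a drop-in replacement: the movies_metadata.csv values use single quotes, which JSON does not allow.

```python
from ast import literal_eval
import json

# A genres value in the same format as the dataset (single-quoted)
raw = "[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}]"

parsed = literal_eval(raw)        # becomes a real list of dicts
print(type(parsed), parsed[0]['name'])

# json.loads expects double-quoted keys/strings, so it rejects this value
try:
    json.loads(raw)
except json.JSONDecodeError as e:
    print("json failed:", e.msg)
```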

In [17]:
data['genres'] = data['genres'].apply(literal_eval)
data['keywords'] = data['keywords'].apply(literal_eval)

In [18]:
data[['genres', 'keywords']].head(2)

|  | genres | keywords |
|---|---|---|
| 0 | [{'id': 28, 'name': 'Action'}, {'id': 12, 'nam... | [{'id': 1463, 'name': 'culture clash'}, {'id':... |
| 1 | [{'id': 12, 'name': 'Adventure'}, {'id': 14, '... | [{'id': 270, 'name': 'ocean'}, {'id': 726, 'na... |


Now all that is left is to drop the ids from genres and keywords and keep only the names.  
We do not need the ids; we only need to know which genres and keywords a movie has.
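The name extraction and join can be sketched as a small function. `join_names` is a hypothetical helper that mirrors the two chained lambdas in the next cell. Joining with spaces means a multi-word name such as "Science Fiction" later becomes two separate tokens (`science`, `fiction`) for a word-based vectorizer. The optional `squash_spaces` variant, which is not used in this notebook, keeps such names as a single token.

```python
# Hypothetical helper mirroring the lambda chain in the next cell
def join_names(items, squash_spaces=False):
    names = [d['name'] for d in items]           # keep only the 'name' field
    if squash_spaces:
        # optional variant: keep multi-word names as one token
        names = [n.replace(" ", "") for n in names]
    return " ".join(names)

genres = [{'id': 28, 'name': 'Action'}, {'id': 878, 'name': 'Science Fiction'}]
print(join_names(genres))
print(join_names(genres, squash_spaces=True))
```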

In [19]:
data['genres'] = data['genres'].apply(lambda x : [d['name'] for d in x]).apply(lambda x : " ".join(x))
data['keywords'] = data['keywords'].apply(lambda x : [d['name'] for d in x]).apply(lambda x : " ".join(x))

In [20]:
data.head(2)

|  | id | genres | vote_average | vote_count | popularity | title | keywords | overview | score |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 19995 | Action Adventure Fantasy Science Fiction | 7.2 | 11800 | 150.437577 | Avatar | culture clash future space war space colony so... | In the 22nd century, a paraplegic Marine is di... | 7.168053 |
| 1 | 285 | Adventure Fantasy Action | 6.9 | 4500 | 139.082615 | Pirates of the Caribbean: At World's End | ocean drug abuse exotic island east india trad... | Captain Barbossa, long believed to be dead, ha... | 6.918271 |


This data is used in the next chapter, so we save it now.

In [21]:
data.to_csv('./movie_data/pre_tmdb_5000_movies.csv', index = False)

With that, most of the preprocessing is done.

Now let's get to the main part.

# Content-based filtering recommendation

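We want to make recommendations based on content. Content-based filtering recommends items whose content is similar to what the user already likes.  

What counts as similar content? A typical example is 'genre'.   
So in this content-based filtering recommender, we use genre to make recommendations.  

Genres are currently stored as strings, so we need to turn these strings into numeric vectors. Let's start with that.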

## This section was revised on 2021.08.01.

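- I found a mistake in the data processing and fixed it quickly.
- Please treat the data preprocessed above as reference only.
- The method used above may be needed later when preprocessing the MovieLens data, so I am keeping it rather than deleting it.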


We use the following data.
- movies_metadata.csv
    - movie metadata such as title and genres
- keywords.csv
    - keywords for each movie id

In [2]:
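# low_memory=False reads the file in one pass, avoiding pandas' mixed-dtype warning on this file
movie_data = pd.read_csv('./data/movie_dataset/movies_metadata.csv', low_memory=False)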
movie_data =  movie_data.loc[movie_data['original_language'] == 'en', :]
movie_data = movie_data[['id', 'title', 'original_language', 'genres']]

print(movie_data.shape)
movie_data.head()

(32269, 4)




|  | id | title | original_language | genres |
|---|---|---|---|---|
| 0 | 862 | Toy Story | en | [{'id': 16, 'name': 'Animation'}, {'id': 35, '... |
| 1 | 8844 | Jumanji | en | [{'id': 12, 'name': 'Adventure'}, {'id': 14, '... |
| 2 | 15602 | Grumpier Old Men | en | [{'id': 10749, 'name': 'Romance'}, {'id': 35, ... |
| 3 | 31357 | Waiting to Exhale | en | [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam... |
| 4 | 11862 | Father of the Bride Part II | en | [{'id': 35, 'name': 'Comedy'}] |


In [3]:
movie_keyword = pd.read_csv('./data/movie_dataset/keywords.csv')
print(movie_keyword.shape)
movie_keyword.head()

(46419, 2)


|  | id | keywords |
|---|---|---|
| 0 | 862 | [{'id': 931, 'name': 'jealousy'}, {'id': 4290,... |
| 1 | 8844 | [{'id': 10090, 'name': 'board game'}, {'id': 1... |
| 2 | 15602 | [{'id': 1495, 'name': 'fishing'}, {'id': 12392... |
| 3 | 31357 | [{'id': 818, 'name': 'based on novel'}, {'id':... |
| 4 | 11862 | [{'id': 1009, 'name': 'baby'}, {'id': 1599, 'n... |


We load the two datasets and merge them on id.

In [4]:
movie_data.id = movie_data.id.astype(int)
movie_keyword.id = movie_keyword.id.astype(int)
movie_data = pd.merge(movie_data, movie_keyword, on='id')
print(movie_data.shape)
movie_data.head()

(32852, 5)


|  | id | title | original_language | genres | keywords |
|---|---|---|---|---|---|
| 0 | 862 | Toy Story | en | [{'id': 16, 'name': 'Animation'}, {'id': 35, '... | [{'id': 931, 'name': 'jealousy'}, {'id': 4290,... |
| 1 | 8844 | Jumanji | en | [{'id': 12, 'name': 'Adventure'}, {'id': 14, '... | [{'id': 10090, 'name': 'board game'}, {'id': 1... |
| 2 | 15602 | Grumpier Old Men | en | [{'id': 10749, 'name': 'Romance'}, {'id': 35, ... | [{'id': 1495, 'name': 'fishing'}, {'id': 12392... |
| 3 | 31357 | Waiting to Exhale | en | [{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam... | [{'id': 818, 'name': 'based on novel'}, {'id':... |
| 4 | 11862 | Father of the Bride Part II | en | [{'id': 35, 'name': 'Comedy'}] | [{'id': 1009, 'name': 'baby'}, {'id': 1599, 'n... |
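After the merge, the row count grows from 32269 to 32852. With an inner merge, an id that appears more than once in either table produces one output row per matching pair, so this increase means some ids are duplicated. The toy sketch below uses made-up ids and values. It also shows one possible guard: drop duplicate keys, then ask pandas to verify the relationship with `validate`. This notebook does not do that. Duplicated rows also mean duplicated titles, and with duplicated titles a lookup like `cosine_sim_df.loc[:, title]` returns several columns instead of one.

```python
import pandas as pd

movies = pd.DataFrame({'id': [1, 2], 'title': ['A', 'B']})
# id 2 appears twice in the keyword table (toy data)
keywords = pd.DataFrame({'id': [1, 2, 2], 'keywords': ['k1', 'k2', 'k2']})

merged = pd.merge(movies, keywords, on='id')
print(len(merged))   # movie B now appears twice

# One option: deduplicate keys first; validate raises MergeError if keys are not unique
deduped = pd.merge(movies, keywords.drop_duplicates(subset='id'), on='id',
                   validate='one_to_one')
print(len(deduped))
```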


## Preprocessing

We apply the same preprocessing as before.  
As explained above, each value is a list(dict()) structure, and it is stored as a **string**.  
To handle this, we use the literal_eval function from ast.

In [5]:
movie_data['genres'] = movie_data['genres'].apply(literal_eval)
movie_data['genres'] = movie_data['genres'].apply(lambda x : [d['name'] for d in x]).apply(lambda x : " ".join(x))

In [6]:
movie_data['keywords'] = movie_data['keywords'].apply(literal_eval)
movie_data['keywords'] = movie_data['keywords'].apply(lambda x : [d['name'] for d in x]).apply(lambda x : " ".join(x))

In [7]:
movie_data.head()

|  | id | title | original_language | genres | keywords |
|---|---|---|---|---|---|
| 0 | 862 | Toy Story | en | Animation Comedy Family | jealousy toy boy friendship friends rivalry bo... |
| 1 | 8844 | Jumanji | en | Adventure Fantasy Family | board game disappearance based on children's b... |
| 2 | 15602 | Grumpier Old Men | en | Romance Comedy | fishing best friend duringcreditsstinger old men |
| 3 | 31357 | Waiting to Exhale | en | Comedy Drama Romance | based on novel interracial relationship single... |
| 4 | 11862 | Father of the Bride Part II | en | Comedy | baby midlife crisis confidence aging daughter ... |


### TF-IDF vectorization

We turn the preprocessed data into vectors using TF-IDF.  
I concatenated genres and keywords into one string per movie and built the TF-IDF vectors from that.

In [8]:
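tfidf_vector = TfidfVectorizer()
#tfidf_vector = TfidfVectorizer(ngram_range=(1,2))
# one document per movie: genres and keywords joined by a space
tfidf_matrix = tfidf_vector.fit_transform(movie_data['genres'] + " " + movie_data['keywords']).toarray()
#tfidf_matrix = tfidf_vector.fit_transform(movie_data['genres']).toarray()
# get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() returns the vocabulary in column order
tfidf_matrix_feature = tfidf_vector.get_feature_names_out()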

In [9]:
tfidf_matrix.shape

(32852, 11437)

In [10]:
tfidf_matrix = pd.DataFrame(tfidf_matrix, columns=tfidf_matrix_feature, index = movie_data.title)
print(tfidf_matrix.shape)
tfidf_matrix.head()

(32852, 11437)


| title | 077 | 10 | 11 | 13 | 1500s | 15th | 16th | 17th | 1812 | 18th | ... | βάφτηκε | γη | κόκκινο | το | χώμα | миньоны | 卧底肥妈 | 绝地奶霸 | 自然界大事件 | 超级妈妈 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Toy Story | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Jumanji | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Grumpier Old Men | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Waiting to Exhale | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Father of the Bride Part II | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


### 유사도 구하기

We compute pairwise similarity between these TF-IDF vectors using cosine similarity.  
This gives an n x n matrix, where n is the number of movies.
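Here is a sketch of the similarity step on three made-up documents. `cosine_similarity` also accepts the sparse matrix returned by `fit_transform`, so `.toarray()` is not required for this computation. Because `TfidfVectorizer` L2-normalizes rows by default, the cosine similarity here equals the plain dot product (`linear_kernel`). For scale: a dense 32852 x 11437 float64 array takes about 3.0 GB, and the 32852 x 32852 float64 similarity matrix takes about 8.6 GB.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Toy documents (made up)
docs = [
    "Action Crime Drama Thriller batman",
    "Drama Action Crime Thriller batman joker",
    "Comedy Romance wedding",
]
X = TfidfVectorizer().fit_transform(docs)   # stays sparse

sim = cosine_similarity(X)                  # n x n dense ndarray, 1.0 on the diagonal
print(sim.shape)
print(np.round(sim, 3))

# rows are already unit length, so the dot product gives the same values
assert np.allclose(sim, linear_kernel(X))
```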

In [11]:
%%time
cosine_sim = cosine_similarity(tfidf_matrix)

CPU times: user 5min, sys: 11 s, total: 5min 11s
Wall time: 1min 5s


In [12]:
cosine_sim.shape

(32852, 32852)

In [13]:
cosine_sim_df = pd.DataFrame(cosine_sim, index = movie_data.title, columns = movie_data.title)
print(cosine_sim_df.shape)
cosine_sim_df.head()

(32852, 32852)


| title | Toy Story | Jumanji | Grumpier Old Men | Waiting to Exhale | Father of the Bride Part II | Heat | Sabrina | Tom and Huck | Sudden Death | GoldenEye | ... | Deep Hearts | The Morning After | House of Horrors | Shadow of the Blair Witch | The Burkittsville 7 | Caged Heat 3000 | Robin Hood | Betrayal | Satan Triumphant | Queerama |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Toy Story | 1.0 | 0.041569 | 0.008708 | 0.006937 | 0.005595 | 0.0 | 0.006456 | 0.059202 | 0.0 | 0.0 | ... | 0.0 | 0.05111 | 0.028298 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Jumanji | 0.041569 | 1.0 | 0.0 | 0.065065 | 0.0 | 0.0 | 0.0 | 0.165721 | 0.028302 | 0.011462 | ... | 0.0 | 0.0 | 0.039299 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| Grumpier Old Men | 0.008708 | 0.0 | 1.0 | 0.035846 | 0.010906 | 0.0 | 0.033363 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.099628 | 0.0 | 0.0 | 0.0 | 0.0 | 0.106819 | 0.0 | 0.0 | 0.0 |
| Waiting to Exhale | 0.006937 | 0.065065 | 0.035846 | 1.0 | 0.093741 | 0.003806 | 0.063686 | 0.027484 | 0.0 | 0.0 | ... | 0.0 | 0.135728 | 0.0 | 0.0 | 0.0 | 0.0 | 0.121701 | 0.037622 | 0.0 | 0.0 |
| Father of the Bride Part II | 0.005595 | 0.0 | 0.010906 | 0.093741 | 1.0 | 0.0 | 0.038016 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.064015 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |


### Content-based recommendation

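Now we write a function that produces the content-based recommendations. It does the following:

- Takes the similarity values for the target title (the movie we want recommendations for) from the cosine similarity matrix
- Selects the entries with the highest similarity
    - taking the top k
- Returns those recommendations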
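The selection step can be sketched on a toy 4 x 4 similarity matrix. The helper `top_k_similar` and the titles are hypothetical. `genre_recommendations` below drops the first position after sorting. This variant instead masks the target's own position, so it still works if another movie also has similarity 1.0 with the target.

```python
import numpy as np

titles = ["A", "B", "C", "D"]            # hypothetical titles
sim = np.array([                         # toy symmetric similarity matrix
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

def top_k_similar(target, titles, sim, k):
    i = titles.index(target)                 # row of the target movie
    scores = sim[i].astype(float)            # astype returns a copy
    scores[i] = -np.inf                      # never recommend the target itself
    order = np.argsort(scores)[::-1][:k]     # highest similarity first
    return [titles[j] for j in order]

print(top_k_similar("A", titles, sim, k=2))
```

For "A", the two most similar items are "C" (0.9) and "D" (0.5).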

In [14]:
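def genre_recommendations(target_title, matrix, items, k=10):
    # matrix rows/columns must be in the same order as items' rows (positional lookup below)
    # sort the target's similarity column in descending order; the first entry is
    # assumed to be the target itself (similarity 1.0), so positions 1..k are kept
    recom_idx = matrix.loc[:, target_title].values.reshape(1, -1).argsort()[:, ::-1].flatten()[1:k+1]
    recom_title = items.iloc[recom_idx, :].title.values
    recom_genre = items.iloc[recom_idx, :].genres.values
    # repeat the target's title and genre k times so every column has length k
    target_title_list = np.full(k, target_title)
    target_genre_list = np.full(k, items[items.title == target_title].genres.values)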
    d = {
        'target_title':target_title_list,
        'target_genre':target_genre_list,
        'recom_title' : recom_title,
        'recom_genre' : recom_genre
    }
    return pd.DataFrame(d)

In [15]:
genre_recommendations('The Dark Knight Rises', cosine_sim_df, movie_data)

|  | target_title | target_genre | recom_title | recom_genre |
|---|---|---|---|---|
| 0 | The Dark Knight Rises | Action Crime Drama Thriller | The Dark Knight | Drama Action Crime Thriller |
| 1 | The Dark Knight Rises | Action Crime Drama Thriller | The Burglar | Crime Drama |
| 2 | The Dark Knight Rises | Action Crime Drama Thriller | Batman Begins | Action Crime Drama |
| 3 | The Dark Knight Rises | Action Crime Drama Thriller | Batman & Robin | Action Crime Fantasy |
| 4 | The Dark Knight Rises | Action Crime Drama Thriller | Batman | Fantasy Action |
| 5 | The Dark Knight Rises | Action Crime Drama Thriller | Raffles | Adventure Comedy Crime Drama History Romance T... |
| 6 | The Dark Knight Rises | Action Crime Drama Thriller | Hero at Large | Action Comedy Drama |
| 7 | The Dark Knight Rises | Action Crime Drama Thriller | DC Showcase: Catwoman | Action Adventure Animation Science Fiction |
| 8 | The Dark Knight Rises | Action Crime Drama Thriller | DC Super Hero Girls: Hero of the Year | Animation |
| 9 | The Dark Knight Rises | Action Crime Drama Thriller | Batman Returns | Action Fantasy |
