# Netflix Recommendation System

<img src='images/netflix.png' height=400 >

Netflix is a subscription-based streaming platform that lets users watch movies and TV shows without advertisements. One reason for its popularity is its recommendation system, which suggests movies and TV shows based on each user's interests. If you are a data science student who wants to learn how such a system can be built, this project is for you. It walks through a simple content-based Netflix recommender in Python: each title's genre labels are turned into TF-IDF vectors, and titles are ranked by the cosine similarity of those vectors.

In [2]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity


In [3]:
df = pd.read_csv("data/netflixData.csv", encoding='latin1')
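Passing `encoding='latin1'` works because Latin-1 maps every one of the 256 possible byte values to a character, so decoding can never fail. The trade-off is that if a file is actually UTF-8, non-ASCII characters are read as two-character mojibake instead of raising an error. A minimal stdlib sketch of both effects (the sample string is illustrative, not taken from the dataset):

```python
# Latin-1 assigns a character to every byte value 0-255, so any byte string decodes
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
print(len(decoded))  # 256

# A UTF-8 encoded accented letter becomes two characters when read as Latin-1
utf8_bytes = "Dragón".encode("utf-8")
print(utf8_bytes.decode("latin-1"))  # DragÃ³n

# The Latin-1 byte for 'ó' (0xF3) is not valid UTF-8 on its own, so strict UTF-8 decoding fails
try:
    bytes([0xF3]).decode("utf-8")
    strict_utf8_ok = True
except UnicodeDecodeError:
    strict_utf8_ok = False
print("strict UTF-8 decode succeeded:", strict_utf8_ok)
```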


## EDA - Exploratory Data Analysis

In [4]:
df.head()

| | Show Id | Title | Description | Director | Genres | Cast | Production Country | Release Date | Rating | Duration | Imdb Score | Content Type | Date Added |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | cc1b6ed9-cf9e-4057-8303-34577fb54477 | (Un)Well | This docuseries takes a deep dive into the luc... | | Reality TV | | United States | 2020.0 | TV-MA | 1 Season | 6.6/10 | TV Show | |
| 1 | e2ef4e91-fb25-42ab-b485-be8e3b23dedb | #Alive | As a grisly virus rampages a city, a lone man ... | Cho Il | Horror Movies, International Movies, Thrillers | Yoo Ah-in, Park Shin-hye | South Korea | 2020.0 | TV-MA | 99 min | 6.2/10 | Movie | September 8, 2020 |
| 2 | b01b73b7-81f6-47a7-86d8-acb63080d525 | #AnneFrank - Parallel Stories | Through her diary, Anne Frank's story is retol... | Sabina Fedeli, Anna Migotto | Documentaries, International Movies | Helen Mirren, Gengher Gatti | Italy | 2019.0 | TV-14 | 95 min | 6.4/10 | Movie | July 1, 2020 |
| 3 | b6611af0-f53c-4a08-9ffa-9716dc57eb9c | #blackAF | Kenya Barris and his family navigate relations... | | TV Comedies | Kenya Barris, Rashida Jones, Iman Benson, Genn... | United States | 2020.0 | TV-MA | 1 Season | 6.6/10 | TV Show | |
| 4 | 7f2d4170-bab8-4d75-adc2-197f7124c070 | #cats_the_mewvie | This pawesome documentary explores how our fel... | Michael Margolis | Documentaries, International Movies | | Canada | 2020.0 | TV-14 | 90 min | 5.1/10 | Movie | February 5, 2020 |


In [5]:
df.shape

(5967, 13)

In [6]:
df.isnull().sum()

Show Id                  0
Title                    0
Description              0
Director              2064
Genres                   0
Cast                   530
Production Country     559
Release Date             3
Rating                   4
Duration                 3
Imdb Score             608
Content Type             0
Date Added            1335
dtype: int64
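The counts are easier to compare as fractions of the 5,967 rows; for example, `Director` is missing for 2064 / 5967 ≈ 34.6% of titles. A small stdlib sketch of that computation, using the counts printed above, which also lists the columns that have no missing values at all (the four columns kept in the next cell are among them):

```python
# Missing-value counts copied from the df.isnull().sum() output above
missing = {
    "Show Id": 0, "Title": 0, "Description": 0, "Director": 2064,
    "Genres": 0, "Cast": 530, "Production Country": 559,
    "Release Date": 3, "Rating": 4, "Duration": 3,
    "Imdb Score": 608, "Content Type": 0, "Date Added": 1335,
}
n_rows = 5967  # from df.shape

# Share of missing values per column, largest first
for column, count in sorted(missing.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{column:<20} {count / n_rows:6.1%}")

# Columns that are fully populated
complete = [column for column, count in missing.items() if count == 0]
print(complete)
```

In pandas itself, `df.isnull().mean()` returns the same per-column fractions directly.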

In [7]:
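# Keep four columns, none of which has missing values; .copy() makes an independent DataFrame rather than a slice
df = df[["Title", "Description", "Content Type", "Genres"]].copy()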

In [8]:
df.head()

| | Title | Description | Content Type | Genres |
|---|---|---|---|---|
| 0 | (Un)Well | This docuseries takes a deep dive into the luc... | TV Show | Reality TV |
| 1 | #Alive | As a grisly virus rampages a city, a lone man ... | Movie | Horror Movies, International Movies, Thrillers |
| 2 | #AnneFrank - Parallel Stories | Through her diary, Anne Frank's story is retol... | Movie | Documentaries, International Movies |
| 3 | #blackAF | Kenya Barris and his family navigate relations... | TV Show | TV Comedies |
| 4 | #cats_the_mewvie | This pawesome documentary explores how our fel... | Movie | Documentaries, International Movies |


In [9]:
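# Defensive step: none of the four kept columns has missing values (see the counts above), so no rows are dropped here
df = df.dropna()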

In [10]:
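import neattext.functions as nfx


def clean_text(title):
    """Aggressively clean a title with neattext (parameter renamed so it no longer shadows sklearn's `text` module)."""
    title = nfx.remove_special_characters(title)  # drops symbols such as ':' and non-ASCII letters such as 'ó'
    title = nfx.normalize(title)
    title = nfx.clean_text(title)
    return title

# Judging by the sample below, the result is lowercased and stopwords such as "of" and "a"
# are removed, so distinct titles can end up with the same cleaned string
df["Title"] = df["Title"].apply(clean_text)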
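Aggressive cleaning is lossy: once punctuation, case, and stopwords are gone, different titles can collapse to the same string, and a lookup by cleaned title can only land on one of them. The sketch below uses a hypothetical `aggressive_clean` helper with a tiny hand-picked stopword list as a stand-in for the neattext pipeline (it is not neattext's implementation) and groups made-up example titles by their cleaned form to expose collisions.

```python
import re
from collections import defaultdict

# Hypothetical stopword list for illustration; neattext uses its own, much larger list
STOPWORDS = {"a", "an", "the", "of", "for", "and", "with"}


def aggressive_clean(title):
    """Hypothetical stand-in: strip non-alphanumerics, lowercase, and drop stopwords."""
    title = re.sub(r"[^A-Za-z0-9\s]", "", title).lower()
    return " ".join(word for word in title.split() if word not in STOPWORDS)


# Made-up example titles, chosen so that some of them collide after cleaning
sample_titles = [
    "Record of Youth",
    "The Record",
    "A Record of Youth",
    "Anne with an E",
    "Anne E",
]

# Group original titles by their cleaned form
groups = defaultdict(list)
for original in sample_titles:
    groups[aggressive_clean(original)].append(original)

# Cleaned keys shared by more than one original title are collisions
collisions = {key: originals for key, originals in groups.items() if len(originals) > 1}
print(collisions)
```

Checking `df["Title"].duplicated().sum()` after cleaning is a quick way to measure how often this happens in the real data.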


In [11]:
df.sample(10)

| | Title | Description | Content Type | Genres |
|---|---|---|---|---|
| 2676 | lego marvel super heroes black panther | When Thanos joins forces with villains Killmon... | Movie | Children & Family Movies |
| 3889 | record youth | Two actors and a makeup artist fight to make t... | TV Show | International TV Shows, Romantic TV Shows, TV ... |
| 4364 | sparta | While investigating the mysterious death of a ... | TV Show | Crime TV Shows, International TV Shows, TV Dramas |
| 3946 | richie rich | After turning his veggies into green energy, R... | TV Show | Kids' TV, TV Comedies |
| 835 | bulbbul | A child bride grows up to be an enigmatic woma... | Movie | Horror Movies, International Movies |
| 2446 | justice | A U.S. Marshal arrives at a small town in Neva... | Movie | Movies |
| 1463 | el dragn return warrior | To replace his grandfather as head of a cartel... | TV Show | Crime TV Shows, International TV Shows, Spanis... |
| 139 | princess christmas | At the invitation of a relative, young Jules D... | Movie | Children & Family Movies, Dramas, Romantic Movies |
| 4890 | holiday movies | Unwrap the real stories behind these iconic Ch... | TV Show | Docuseries |
| 5713 | war machine | When a proud general is tasked with winning an... | Movie | Comedies, Dramas |


In [12]:
df.isnull().sum()

Title           0
Description     0
Content Type    0
Genres          0
dtype: int64

In [13]:
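# Represent each title by its genre labels as a TF-IDF vector
feature = df["Genres"].tolist()
tfidf = text.TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(feature)
# Cosine similarity between every pair of titles (a dense n x n array)
similarity = cosine_similarity(tfidf_matrix)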
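Under scikit-learn's defaults, `TfidfVectorizer` lowercases the text, keeps tokens of two or more word characters, weights each term by its raw count times a smoothed IDF, idf(t) = ln((1 + n) / (1 + df(t))) + 1, and L2-normalizes each row, so the cosine similarity of two rows is simply their dot product. The stdlib sketch below reimplements that weighting on a few genre strings from the tables above to make the mechanics visible; it omits the English stop-word filtering for brevity and is meant for illustration, not as a replacement for scikit-learn.

```python
import math
import re
from collections import Counter


def tokenize(doc):
    # Same rule as scikit-learn's default token_pattern: 2+ word characters, lowercased
    return re.findall(r"(?u)\b\w\w+\b", doc.lower())


def tfidf_vectors(docs):
    """Return one L2-normalized {term: weight} dict per document."""
    tokenized = [tokenize(doc) for doc in docs]
    n = len(docs)
    # Document frequency: in how many documents each term appears
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, matching smooth_idf=True
    idf = {term: math.log((1 + n) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        weights = {term: count * idf[term] for term, count in Counter(tokens).items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({term: w / norm for term, w in weights.items()} if norm else weights)
    return vectors


def cosine(u, v):
    # Rows are already unit length, so the dot product is the cosine similarity
    return sum(weight * v.get(term, 0.0) for term, weight in u.items())


genres = [
    "Reality TV",
    "Horror Movies, International Movies, Thrillers",
    "Documentaries, International Movies",
    "TV Comedies",
    "Documentaries, International Movies",
]
vectors = tfidf_vectors(genres)
sim = [[cosine(u, v) for v in vectors] for u in vectors]
for row in sim:
    print(" ".join(f"{value:.2f}" for value in row))
```

Identical genre strings score 1.0, titles with no genre term in common score 0.0, and partial overlaps (for example a shared "tv") fall in between.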

In [14]:
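# Map each title to its row position, keeping only the first row for repeated titles
# (Series.drop_duplicates would compare the values, i.e. the positions, which are all unique)
indices = pd.Series(np.arange(len(df)), index=df['Title'])
indices = indices[~indices.index.duplicated(keep='first')]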

In [15]:
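def recommend_movies(movie_title, similarity_matrix=similarity):
    """Return up to 5 titles whose genres are most similar to movie_title."""
    titles = df['Title'].to_numpy()
    if movie_title not in titles:
        return "Title not found."

    # Use the row *position* of the first match: the similarity matrix is indexed
    # by position, which differs from the DataFrame label once rows are dropped
    idx = int(np.flatnonzero(titles == movie_title)[0])

    # Pair every row position with its similarity to the query, most similar first
    similarity_scores = list(enumerate(similarity_matrix[idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)

    # Skip the query itself (and any other row with the same title), keep the top 5
    filtered_scores = [score for score in similarity_scores if titles[score[0]] != movie_title]
    similarity_scores = filtered_scores[:5]

    movie_indices = [i[0] for i in similarity_scores]
    recommended_movies = df['Title'].iloc[movie_indices]

    return recommended_movies

recommended_movies = recommend_movies('american')
print("Recommended Movies:")
print(recommended_movies)


Recommended Movies:
1885    grand army
5207       society
5499      trinkets
384         anne e
496           baby
Name: Title, dtype: object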
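The ranking step itself does not depend on pandas. The sketch below restates it over plain lists with a hypothetical `top_k_similar` helper: it looks up the first row with the query title, skips rows that share that title, and returns the k best matches. `heapq.nlargest` is documented as equivalent to `sorted(..., reverse=True)[:k]`, so ties keep their original row order, just as the stable `sorted` call in `recommend_movies` does. Ties matter here because titles with identical genre strings (such as the two "Documentaries, International Movies" rows in the first table) get exactly the same score. The 4x4 matrix is made-up toy data.

```python
import heapq


def top_k_similar(similarity, titles, query, k=5):
    """Hypothetical helper: up to k (title, score) pairs most similar to query, excluding same-title rows."""
    if query not in titles:
        return []
    idx = titles.index(query)  # position of the first row with this title
    candidates = (
        (pos, score)
        for pos, score in enumerate(similarity[idx])
        if titles[pos] != query
    )
    # Same result as sorted(..., reverse=True)[:k], including the order of ties
    best = heapq.nlargest(k, candidates, key=lambda pair: pair[1])
    return [(titles[pos], score) for pos, score in best]


# Toy data: symmetric similarity matrix with ones on the diagonal
toy_titles = ["alpha", "beta", "gamma", "delta"]
toy_similarity = [
    [1.0, 0.8, 0.8, 0.1],
    [0.8, 1.0, 0.5, 0.0],
    [0.8, 0.5, 1.0, 0.2],
    [0.1, 0.0, 0.2, 1.0],
]
print(top_k_similar(toy_similarity, toy_titles, "alpha", k=2))  # [('beta', 0.8), ('gamma', 0.8)]
```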


In [16]:
df.Title.sample(5)

2121            hot date
742           bo burnham
485               baaghi
3167      monster island
4125    searching sheela
Name: Title, dtype: object

In [17]:
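# Reload the raw data for the exported app file; this time titles only have special characters stripped, so they stay readable
df = pd.read_csv("data/netflixData.csv", encoding='latin1')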

In [18]:
df = df[["Title", "Description", "Content Type", "Genres"]].copy()

In [19]:
df['Title'] = df['Title'].apply(nfx.remove_special_characters)
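Stripping only special characters keeps titles readable and distinct while still removing symbols such as ':' (compare "Fear Street Part 3 1666" in the output below). As a rough stand-in for `nfx.remove_special_characters`, a single regular expression behaves similarly on these titles; the pattern and the helper name `strip_special_characters` are assumptions for illustration, not neattext's exact rule.

```python
import re


def strip_special_characters(title):
    # Assumed stand-in for nfx.remove_special_characters: keep ASCII letters, digits, and whitespace
    return re.sub(r"[^A-Za-z0-9\s]", "", title)


examples = ["Fear Street Part 3: 1666", "El Dragón: Return of a Warrior", "(Un)Well"]
for original in examples:
    print(f"{original!r} -> {strip_special_characters(original)!r}")
```

Unlike the earlier aggressive cleaning, case and stopwords are preserved, so "Return of a Warrior" keeps its "of a".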

In [20]:
df.Title.sample(5)

5136          The Princess Switch Switched Again
1882                           Grace and Frankie
1773    Garfunkel and Oates Trying to be Special
1600                     Fear Street Part 3 1666
5581                                    Uncorked
Name: Title, dtype: object

In [21]:
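# index=False keeps the DataFrame index out of the file, so it does not reappear as an "Unnamed: 0" column when the CSV is read back
df.to_csv("data/netflix4app.csv", index=False)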