<a href="https://colab.research.google.com/github/viram29/Projects/blob/main/Netflix_reco.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Netflix Recommendation System using Python**

Our goal is to help users find shows and movies to enjoy with minimal effort. Here's how the system works:

**User Interactions:** We analyze your viewing history and how you've rated other titles on our platform.

**Similar Preferences:** We compare your tastes and preferences with those of other members who have similar viewing habits.

**Title Information:** We consider various details about the titles, such as genre, categories, actors, and release year.

**Additional Factors:** We take into account factors like the time of day you watch, the devices you use, and how long you watch.

Our algorithms process all this information to provide personalized recommendations tailored just for you. We don't use demographic data like age or gender in our decision-making process.

If you can't find something you want to watch, you can always search our entire catalog. Our search results are influenced by the actions of other members who have entered similar queries, making it easier for you to discover new content.

Thank you for using our Netflix recommendation system. Sit back, relax, and enjoy your personalized viewing experience!



Data from [Kaggle: dated 2021](https://www.kaggle.com/datasets/satpreetmakhija/netflix-movies-and-tv-shows-2021)

In [104]:
import pandas as pd
import nltk
import re
from sklearn.feature_extraction import text
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [105]:
nltk.download('stopwords')
stemmer = nltk.SnowballStemmer("english")
from nltk.corpus import stopwords
import string
stopword=set(stopwords.words('english'))

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


In [106]:
file_path = '/content/netflixData.csv'
data = pd.read_csv(file_path)

In [107]:
data.head()

Unnamed: 0,Show Id,Title,Description,Director,Genres,Cast,Production Country,Release Date,Rating,Duration,Imdb Score,Content Type,Date Added
0,cc1b6ed9-cf9e-4057-8303-34577fb54477,(Un)Well,This docuseries takes a deep dive into the luc...,,Reality TV,,United States,2020.0,TV-MA,1 Season,6.6/10,TV Show,
1,e2ef4e91-fb25-42ab-b485-be8e3b23dedb,#Alive,"As a grisly virus rampages a city, a lone man ...",Cho Il,"Horror Movies, International Movies, Thrillers","Yoo Ah-in, Park Shin-hye",South Korea,2020.0,TV-MA,99 min,6.2/10,Movie,"September 8, 2020"
2,b01b73b7-81f6-47a7-86d8-acb63080d525,#AnneFrank - Parallel Stories,"Through her diary, Anne Frank's story is retol...","Sabina Fedeli, Anna Migotto","Documentaries, International Movies","Helen Mirren, Gengher Gatti",Italy,2019.0,TV-14,95 min,6.4/10,Movie,"July 1, 2020"
3,b6611af0-f53c-4a08-9ffa-9716dc57eb9c,#blackAF,Kenya Barris and his family navigate relations...,,TV Comedies,"Kenya Barris, Rashida Jones, Iman Benson, Genn...",United States,2020.0,TV-MA,1 Season,6.6/10,TV Show,
4,7f2d4170-bab8-4d75-adc2-197f7124c070,#cats_the_mewvie,This pawesome documentary explores how our fel...,Michael Margolis,"Documentaries, International Movies",,Canada,2020.0,TV-14,90 min,5.1/10,Movie,"February 5, 2020"


In [108]:
print(data.isnull().sum())

Show Id                  0
Title                    0
Description              0
Director              2064
Genres                   0
Cast                   530
Production Country     559
Release Date             3
Rating                   4
Duration                 3
Imdb Score             608
Content Type             0
Date Added            1335
dtype: int64


In [109]:
data = data[["Title", "Description", "Content Type", "Genres"]]
print(data.head())

                           Title  \
0                       (Un)Well   
1                         #Alive   
2  #AnneFrank - Parallel Stories   
3                       #blackAF   
4               #cats_the_mewvie   

                                         Description Content Type  \
0  This docuseries takes a deep dive into the luc...      TV Show   
1  As a grisly virus rampages a city, a lone man ...        Movie   
2  Through her diary, Anne Frank's story is retol...        Movie   
3  Kenya Barris and his family navigate relations...      TV Show   
4  This pawesome documentary explores how our fel...        Movie   

                                           Genres  
0                                      Reality TV  
1  Horror Movies, International Movies, Thrillers  
2             Documentaries, International Movies  
3                                     TV Comedies  
4             Documentaries, International Movies  


In [110]:
# Count titles per genre combination and content type.
# (The original cell concatenated an undefined `genres_split` frame that the pivot never used.)
genre_counts = data.pivot_table(index='Genres', columns='Content Type', values='Title', aggfunc='count')
print(genre_counts)

Content Type                                        Movie  TV Show
Genres                                                            
Action & Adventure                                   65.0      NaN
Action & Adventure, Anime Features, Children & ...    3.0      NaN
Action & Adventure, Anime Features, Classic Movies    2.0      NaN
Action & Adventure, Anime Features, Horror Movies     1.0      NaN
Action & Adventure, Anime Features, Internation...   31.0      NaN
...                                                   ...      ...
TV Horror, TV Mysteries, Teen TV Shows                NaN      1.0
TV Horror, Teen TV Shows                              NaN      2.0
TV Sci-Fi & Fantasy, TV Thrillers                     NaN      1.0
TV Shows                                              NaN      9.0
Thrillers                                            33.0      NaN

[433 rows x 2 columns]
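The pivot above counts whole genre combinations, which is why it has 433 rows. A minimal sketch of counting individual genres instead, assuming the genres are always separated by `", "`; the `toy` frame and its values are made up for illustration and are not taken from the dataset:

```python
import pandas as pd

# Toy rows standing in for the notebook's `data` frame (illustrative values only).
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Content Type": ["Movie", "Movie", "TV Show", "Movie"],
    "Genres": [
        "Horror Movies, Thrillers",
        "Thrillers",
        "TV Comedies",
        "Documentaries, Thrillers",
    ],
})

# Split each comma-separated string into a list, then give every genre its own row.
exploded = toy.assign(Genre=toy["Genres"].str.split(", ")).explode("Genre")

# Count titles per individual genre and content type; missing pairs become 0.
per_genre = exploded.pivot_table(
    index="Genre",
    columns="Content Type",
    values="Title",
    aggfunc="count",
    fill_value=0,
)
print(per_genre)
```

A title with several genres is counted once per genre here, so the column totals exceed the number of titles.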


In [111]:
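# Drop incomplete rows and renumber, so index labels match row positions in the similarity matrix
data = data.dropna().reset_index(drop=True)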

In [112]:
def clean(text):
    # Lowercase, then strip bracketed text, URLs, HTML tags, punctuation, newlines, and tokens containing digits
    text = str(text).lower()
    text = re.sub(r'\[.*?\]', '', text)
    text = re.sub(r'https?://\S+|www\.\S+', '', text)
    text = re.sub(r'<.*?>+', '', text)
    text = re.sub('[%s]' % re.escape(string.punctuation), '', text)
    text = re.sub(r'\n', '', text)
    text = re.sub(r'\w*\d\w*', '', text)
    # Remove English stopwords
    text = [word for word in text.split(' ') if word not in stopword]
    text=" ".join(text)
    # Reduce each remaining word to its stem (e.g. "family" -> "famili")
    text = [stemmer.stem(word) for word in text.split(' ')]
    text=" ".join(text)
    return text

In [113]:
data["Title"] = data["Title"].apply(clean)

In [114]:
print(data.Title.sample(10))

4050               safeti guarante
5322                         world
1621                    fightworld
1226                       deadcon
2224                  famili trust
1627                      find agn
1324                  disco dancer
1584    fate winx saga  afterparti
4355               southern surviv
4107                           say
Name: Title, dtype: object


In [115]:
feature = data["Genres"].tolist()
tfidf = text.TfidfVectorizer(stop_words="english")
tfidf_matrix = tfidf.fit_transform(feature)
similarity = cosine_similarity(tfidf_matrix)
#print(similarity)
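To make the step above concrete, here is a small sketch of TF-IDF plus cosine similarity on a handful of genre strings. The `toy_genres` list is invented for illustration; only `TfidfVectorizer` and `cosine_similarity` come from the notebook:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative genre strings, not rows from the dataset.
toy_genres = [
    "Anime Series, Romantic TV Shows",
    "Anime Series, Romantic TV Shows",
    "Anime Series, TV Horror",
    "Documentaries",
]

# Each row becomes a weighted bag of genre words.
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_genres)

# Entry [i, j] is the cosine of the angle between rows i and j.
toy_similarity = cosine_similarity(toy_matrix)
print(toy_similarity.round(2))
```

Rows with identical genre strings score about 1, rows that share some genre words score between 0 and 1, and rows with no words in common score 0. Identical strings are common in this dataset, so many titles tie at the top score.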

In [116]:
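# Map each cleaned title to its row index; keep the first row when a title repeats
# (Series.drop_duplicates() would compare the index values, not the titles)
indices = pd.Series(data.index, index=data['Title'])
indices = indices[~indices.index.duplicated(keep='first')]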

In [117]:
def netflix_recommendation(title, similarity=similarity):
    # Ensure similarity is properly passed to the function
    if similarity is None:
        raise ValueError("Similarity was not found.")

    # Retrieve the index of the input movie title
    index = indices[title]

    # Get similarity scores between the input movie and all other movies
    similarity_scores = list(enumerate(similarity[index]))

    # Sort similarity scores in descending order
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)

    # Select the 10 highest-scoring titles (the queried title itself can be among them)
    similarity_scores = similarity_scores[0:10]

    # Get indices of the top similar movies
    movie_indices = [i[0] for i in similarity_scores]

    # Return titles of the top similar movies
    return data['Title'].iloc[movie_indices]

# Call the function with the cleaned title "lie april"
print(netflix_recommendation("lie april"))


2009          heaven offici bless
2040                hi score girl
4570        teas master takagisan
5448                     toradora
5642                vampir knight
5670            violet evergarden
5672    violet evergarden special
5921                    lie april
2907                     maidsama
3171        month girl nozaki kun
Name: Title, dtype: object


In [118]:
# Concatenate all genres into a single string for each item
data["Genres"] = data["Genres"].apply(lambda x: ' '.join(x.split(', ')))
# Extract features (genres) as a list of strings
features = data["Genres"].tolist()
# Initialize TF-IDF vectorizer
tfidf = TfidfVectorizer(stop_words="english")
# Compute TF-IDF matrix
tfidf_matrix = tfidf.fit_transform(features)
# Compute Euclidean distance matrix
euclidean_distance = euclidean_distances(tfidf_matrix, tfidf_matrix)
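`TfidfVectorizer` L2-normalizes each row by default (`norm="l2"`), so for non-empty rows the squared Euclidean distance and the cosine similarity are tied together by d² = 2 − 2·cos. Sorting by ascending distance should therefore give essentially the same ranking as sorting by descending cosine similarity. A minimal numerical check on invented genre strings:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Illustrative genre strings, not rows from the dataset.
toy_genres = [
    "Anime Series Romantic TV Shows",
    "Anime Series TV Horror",
    "Documentaries International Movies",
    "Horror Movies International Movies Thrillers",
]

# Default settings: every row of the matrix has unit L2 norm.
toy_matrix = TfidfVectorizer(stop_words="english").fit_transform(toy_genres)

cos = cosine_similarity(toy_matrix)
dist = euclidean_distances(toy_matrix)

# For unit-length vectors: ||a - b||^2 = 2 - 2 * (a . b)
identity_holds = np.allclose(dist ** 2, 2 - 2 * cos, atol=1e-6)
print(identity_holds)
```

The identity breaks if the vectorizer is created with `norm=None` or if a row has no terms left after stop-word removal.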

In [119]:
# Define function for recommendation using Euclidean distance
def netflix_recommendation(title, similarity_matrix=euclidean_distance):
    # Retrieve the index of the input movie title
    index = indices[title]

    # Get similarity scores (Euclidean distances) between the input movie and all other movies
    similarity_scores = list(enumerate(similarity_matrix[index]))

    # Sort similarity scores in ascending order (since Euclidean distance is used)
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1])

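    # Select 10 similar movies by skipping the first sorted entry. This removes the queried
    # title only when it sorts first; titles tied at distance 0 keep their original order,
    # so the queried title can still appear in the results.
    similarity_scores = similarity_scores[1:11]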

    # Get indices of the top similar movies
    movie_indices = [i[0] for i in similarity_scores]

    # Return titles of the top similar movies
    return data['Title'].iloc[movie_indices]

# Call the function with the cleaned title "lie april"
print(netflix_recommendation("lie april"))

2040                hi score girl
4570        teas master takagisan
5448                     toradora
5642                vampir knight
5670            violet evergarden
5672    violet evergarden special
5921                    lie april
2907                     maidsama
3171        month girl nozaki kun
79                               
Name: Title, dtype: object
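The queried title "lie april" appears in its own recommendations because several titles share the same genre string and tie at distance 0, so dropping the first sorted entry does not necessarily drop the query. One possible fix is to exclude the query by row position rather than by rank. The helper name `top_k_excluding_query` and the toy data are hypothetical:

```python
import numpy as np
import pandas as pd

def top_k_excluding_query(distances, titles, query_pos, k=10):
    """Return the k nearest titles to row `query_pos`, never the row itself."""
    # Stable sort keeps tied rows in their original order.
    order = np.argsort(distances[query_pos], kind="stable")
    # Remove the query by position instead of assuming it sorts first.
    order = order[order != query_pos][:k]
    return titles.iloc[order]

# Toy data: rows 0 and 1 are tied at distance 0 (identical genres).
toy_titles = pd.Series(["a", "b", "c", "d"])
toy_dist = np.array([
    [0.0, 0.0, 1.0, 1.4],
    [0.0, 0.0, 1.0, 1.4],
    [1.0, 1.0, 0.0, 1.2],
    [1.4, 1.4, 1.2, 0.0],
])

result = top_k_excluding_query(toy_dist, toy_titles, query_pos=1, k=2)
print(result.tolist())  # ['a', 'c']
```

With the positional slice `[1:3]` the same query would have returned "b" itself, because the tied row 0 sorts ahead of it.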
