# Problem Statement
Develop a machine learning model that provides personalized movie recommendations to Netflix users. The system should analyze user preferences and viewing history to suggest movies that match each user's tastes and interests. The goal is to increase user engagement and satisfaction by presenting each user with a curated list of movies they are likely to enjoy.

Key aspects of this problem statement include:

**Personalization**: The system must tailor recommendations to individual users based on their unique preferences and viewing history.

**Data-Driven**: The recommendations should be based on data, such as user ratings, viewing patterns, and movie metadata (genres, directors, actors, etc.).

**User Engagement**: The primary objective is to increase user engagement by suggesting movies that users are likely to enjoy and watch.

**Scalability**: The system should be scalable, capable of handling a large number of users and movies.

**Performance Metrics**: The effectiveness of the system should be measurable using appropriate metrics (e.g., accuracy of recommendations, user satisfaction, click-through rates).

**Ethical Considerations**: The system should respect user privacy and comply with data usage regulations.

In short, the system uses machine learning algorithms to recommend movies and TV shows based on user preferences and viewing history.
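
One of the metrics mentioned above, accuracy of recommendations, can be made concrete with precision at k: the share of the top-k suggestions that the user actually engaged with. The sketch below is only an illustration; the function name `precision_at_k` and the sample titles are assumptions and are not part of this notebook's pipeline.

```python
def precision_at_k(recommended, relevant, k=5):
    """Fraction of the top-k recommended items that appear in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Hypothetical data: titles the system suggested vs. titles the user later watched.
suggested = ["Aliens", "Moonraker", "Alien", "Silent Running", "Mission to Mars"]
watched = {"Aliens", "Alien", "The Martian"}
score = precision_at_k(suggested, watched, k=5)
print(score)
```

Here two of the five suggestions were watched, so the score is 2/5.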

# Understand the goals and expectations


The project's goal is to improve the Netflix user experience with personalized movie recommendations that increase engagement and retention. It focuses on scalable, efficient algorithms that respect user privacy, improve continuously through feedback, and positively affect business metrics, while advancing machine learning and recommendation-system techniques.

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
import json
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

In [2]:
# Input data files are available in the read-only "/kaggle/input" directory; list every file found there
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

/kaggle/input/tmdb-movie-metadata/tmdb_5000_movies.csv
/kaggle/input/tmdb-movie-metadata/tmdb_5000_credits.csv
/kaggle/input/netflix-prize-data/combined_data_3.txt
/kaggle/input/netflix-prize-data/movie_titles.csv
/kaggle/input/netflix-prize-data/combined_data_4.txt
/kaggle/input/netflix-prize-data/combined_data_1.txt
/kaggle/input/netflix-prize-data/README
/kaggle/input/netflix-prize-data/probe.txt
/kaggle/input/netflix-prize-data/combined_data_2.txt
/kaggle/input/netflix-prize-data/qualifying.txt


In [3]:
#df = pd.read_csv("/kaggle/input/netflix-prize-data/movie_titles.csv", encoding='cp1252', sep=',')
df = pd.read_csv("/kaggle/input/tmdb-movie-metadata/tmdb_5000_movies.csv")

In [4]:
df.head()

Unnamed: 0,budget,genres,homepage,id,keywords,original_language,original_title,overview,popularity,production_companies,production_countries,release_date,revenue,runtime,spoken_languages,status,tagline,title,vote_average,vote_count
0,237000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.avatarmovie.com/,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...",en,Avatar,"In the 22nd century, a paraplegic Marine is di...",150.437577,"[{""name"": ""Ingenious Film Partners"", ""id"": 289...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2009-12-10,2787965087,162.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}, {""iso...",Released,Enter the World of Pandora.,Avatar,7.2,11800
1,300000000,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...",http://disney.go.com/disneypictures/pirates/,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...",en,Pirates of the Caribbean: At World's End,"Captain Barbossa, long believed to be dead, ha...",139.082615,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2007-05-19,961000000,169.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"At the end of the world, the adventure begins.",Pirates of the Caribbean: At World's End,6.9,4500
2,245000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://www.sonypictures.com/movies/spectre/,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",en,Spectre,A cryptic message from Bond’s past sends him o...,107.376788,"[{""name"": ""Columbia Pictures"", ""id"": 5}, {""nam...","[{""iso_3166_1"": ""GB"", ""name"": ""United Kingdom""...",2015-10-26,880674609,148.0,"[{""iso_639_1"": ""fr"", ""name"": ""Fran\u00e7ais""},...",Released,A Plan No One Escapes,Spectre,6.3,4466
3,250000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...",http://www.thedarkknightrises.com/,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",en,The Dark Knight Rises,Following the death of District Attorney Harve...,112.31295,"[{""name"": ""Legendary Pictures"", ""id"": 923}, {""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-07-16,1084939099,165.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,The Legend Ends,The Dark Knight Rises,7.6,9106
4,260000000,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...",http://movies.disney.com/john-carter,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...",en,John Carter,"John Carter is a war-weary, former military ca...",43.926995,"[{""name"": ""Walt Disney Pictures"", ""id"": 2}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...",2012-03-07,284139100,132.0,"[{""iso_639_1"": ""en"", ""name"": ""English""}]",Released,"Lost in our world, found in another.",John Carter,6.1,2124


In [5]:
df.describe()

Unnamed: 0,budget,id,popularity,revenue,runtime,vote_average,vote_count
count,4803.0,4803.0,4803.0,4803.0,4801.0,4803.0,4803.0
mean,29045040.0,57165.484281,21.492301,82260640.0,106.875859,6.092172,690.217989
std,40722390.0,88694.614033,31.81665,162857100.0,22.611935,1.194612,1234.585891
min,0.0,5.0,0.0,0.0,0.0,0.0,0.0
25%,790000.0,9014.5,4.66807,0.0,94.0,5.6,54.0
50%,15000000.0,14629.0,12.921594,19170000.0,103.0,6.2,235.0
75%,40000000.0,58610.5,28.313505,92917190.0,118.0,6.8,737.0
max,380000000.0,459488.0,875.581305,2787965000.0,338.0,10.0,13752.0


In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4803 entries, 0 to 4802
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   budget                4803 non-null   int64  
 1   genres                4803 non-null   object 
 2   homepage              1712 non-null   object 
 3   id                    4803 non-null   int64  
 4   keywords              4803 non-null   object 
 5   original_language     4803 non-null   object 
 6   original_title        4803 non-null   object 
 7   overview              4800 non-null   object 
 8   popularity            4803 non-null   float64
 9   production_companies  4803 non-null   object 
 10  production_countries  4803 non-null   object 
 11  release_date          4802 non-null   object 
 12  revenue               4803 non-null   int64  
 13  runtime               4801 non-null   float64
 14  spoken_languages      4803 non-null   object 
 15  status               

In [7]:
df.columns

Index(['budget', 'genres', 'homepage', 'id', 'keywords', 'original_language',
       'original_title', 'overview', 'popularity', 'production_companies',
       'production_countries', 'release_date', 'revenue', 'runtime',
       'spoken_languages', 'status', 'tagline', 'title', 'vote_average',
       'vote_count'],
      dtype='object')

In [8]:
df.isnull().sum()

budget                     0
genres                     0
homepage                3091
id                         0
keywords                   0
original_language          0
original_title             0
overview                   3
popularity                 0
production_companies       0
production_countries       0
release_date               1
revenue                    0
runtime                    2
spoken_languages           0
status                     0
tagline                  844
title                      0
vote_average               0
vote_count                 0
dtype: int64

In [9]:
movies = df[['id', 'keywords', 'overview', 'title', 'genres']]

In [10]:
movies

Unnamed: 0,id,keywords,overview,title,genres
0,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di...",Avatar,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam..."
1,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha...",Pirates of the Caribbean: At World's End,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""..."
2,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",A cryptic message from Bond’s past sends him o...,Spectre,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam..."
3,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",Following the death of District Attorney Harve...,The Dark Knight Rises,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam..."
4,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","John Carter is a war-weary, former military ca...",John Carter,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam..."
...,...,...,...,...,...
4798,9367,"[{""id"": 5616, ""name"": ""united states\u2013mexi...",El Mariachi just wants to play his guitar and ...,El Mariachi,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam..."
4799,72766,[],A newlywed couple's honeymoon is upended by th...,Newlyweds,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""..."
4800,231617,"[{""id"": 248, ""name"": ""date""}, {""id"": 699, ""nam...","""Signed, Sealed, Delivered"" introduces a dedic...","Signed, Sealed, Delivered","[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam..."
4801,126186,[],When ambitious New York attorney Sam is sent t...,Shanghai Calling,[]


In [11]:
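def extract_names(json_str):
    """Turn a JSON list of {"id": ..., "name": ...} objects into 'name1, name2, ...'."""
    try:
        items = json.loads(json_str)
        names = [item['name'] for item in items]
        return ', '.join(names)
    except (json.JSONDecodeError, TypeError):
        # Malformed JSON or a non-string value (e.g. NaN) yields an empty string
        return ''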
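
As a quick check of what this helper returns, the sketch below runs it on a hand-written string in the same shape as the `genres` column. The helper is repeated in compact form so the snippet runs on its own; the sample strings are made up for illustration.

```python
import json

def extract_names(json_str):
    # Same logic as the notebook's helper, repeated so this snippet is self-contained.
    try:
        return ', '.join(item['name'] for item in json.loads(json_str))
    except json.JSONDecodeError:
        return ''

sample = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'
names = extract_names(sample)      # names joined with ', '
empty = extract_names('[]')        # an empty list gives an empty string
broken = extract_names('not json') # malformed input also gives an empty string
print(repr(names), repr(empty), repr(broken))
```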

In [12]:
# Work on an explicit copy so the new columns are not assigned to a slice of df
# (assigning to the slice directly triggers pandas' SettingWithCopyWarning)
movies = movies.copy()
movies['only_genres'] = movies['genres'].apply(extract_names)
movies['only_keywords'] = movies['keywords'].apply(extract_names)

In [14]:
movies.loc[:,'tags'] = movies['only_genres']+movies['overview']+movies['only_keywords']
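
Note that the `+` concatenation above joins the three fields with no separator, so the last genre runs into the first word of the overview, and a missing overview makes the whole tag NaN. A possible variant, shown here on a small made-up frame (`toy` and its contents are assumptions, not the notebook's data), fills missing text and joins the fields with spaces. Applying it to `movies` would change the tags and therefore the recommendations shown later in this notebook.

```python
import pandas as pd

# Toy frame standing in for `movies`; the second row has a missing overview.
toy = pd.DataFrame({
    "only_genres": ["Action, Adventure", "Comedy"],
    "overview": ["In the 22nd century, a Marine is sent to Pandora.", None],
    "only_keywords": ["future, space war", "date"],
})

# Replace missing text with '' so the concatenation never produces NaN.
parts = toy[["only_genres", "overview", "only_keywords"]].fillna("")

# Join with single spaces, then collapse any repeated whitespace.
combined = parts["only_genres"] + " " + parts["overview"] + " " + parts["only_keywords"]
toy["tags"] = combined.str.split().str.join(" ")
print(toy["tags"].tolist())
```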

In [15]:
movies

Unnamed: 0,id,keywords,overview,title,genres,only_genres,only_keywords,tags
0,19995,"[{""id"": 1463, ""name"": ""culture clash""}, {""id"":...","In the 22nd century, a paraplegic Marine is di...",Avatar,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","Action, Adventure, Fantasy, Science Fiction","culture clash, future, space war, space colony...","Action, Adventure, Fantasy, Science FictionIn ..."
1,285,"[{""id"": 270, ""name"": ""ocean""}, {""id"": 726, ""na...","Captain Barbossa, long believed to be dead, ha...",Pirates of the Caribbean: At World's End,"[{""id"": 12, ""name"": ""Adventure""}, {""id"": 14, ""...","Adventure, Fantasy, Action","ocean, drug abuse, exotic island, east india t...","Adventure, Fantasy, ActionCaptain Barbossa, lo..."
2,206647,"[{""id"": 470, ""name"": ""spy""}, {""id"": 818, ""name...",A cryptic message from Bond’s past sends him o...,Spectre,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","Action, Adventure, Crime","spy, based on novel, secret agent, sequel, mi6...","Action, Adventure, CrimeA cryptic message from..."
3,49026,"[{""id"": 849, ""name"": ""dc comics""}, {""id"": 853,...",Following the death of District Attorney Harve...,The Dark Knight Rises,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","Action, Crime, Drama, Thriller","dc comics, crime fighter, terrorist, secret id...","Action, Crime, Drama, ThrillerFollowing the de..."
4,49529,"[{""id"": 818, ""name"": ""based on novel""}, {""id"":...","John Carter is a war-weary, former military ca...",John Carter,"[{""id"": 28, ""name"": ""Action""}, {""id"": 12, ""nam...","Action, Adventure, Science Fiction","based on novel, mars, medallion, space travel,...","Action, Adventure, Science FictionJohn Carter ..."
...,...,...,...,...,...,...,...,...
4798,9367,"[{""id"": 5616, ""name"": ""united states\u2013mexi...",El Mariachi just wants to play his guitar and ...,El Mariachi,"[{""id"": 28, ""name"": ""Action""}, {""id"": 80, ""nam...","Action, Crime, Thriller","united states–mexico barrier, legs, arms, pape...","Action, Crime, ThrillerEl Mariachi just wants ..."
4799,72766,[],A newlywed couple's honeymoon is upended by th...,Newlyweds,"[{""id"": 35, ""name"": ""Comedy""}, {""id"": 10749, ""...","Comedy, Romance",,"Comedy, RomanceA newlywed couple's honeymoon i..."
4800,231617,"[{""id"": 248, ""name"": ""date""}, {""id"": 699, ""nam...","""Signed, Sealed, Delivered"" introduces a dedic...","Signed, Sealed, Delivered","[{""id"": 35, ""name"": ""Comedy""}, {""id"": 18, ""nam...","Comedy, Drama, Romance, TV Movie","date, love at first sight, narration, investig...","Comedy, Drama, Romance, TV Movie""Signed, Seale..."
4801,126186,[],When ambitious New York attorney Sam is sent t...,Shanghai Calling,[],,,When ambitious New York attorney Sam is sent t...


In [16]:
movies_df = movies.drop(columns=['keywords','overview','genres','only_genres','only_keywords'])

In [17]:
movies_df

Unnamed: 0,id,title,tags
0,19995,Avatar,"Action, Adventure, Fantasy, Science FictionIn ..."
1,285,Pirates of the Caribbean: At World's End,"Adventure, Fantasy, ActionCaptain Barbossa, lo..."
2,206647,Spectre,"Action, Adventure, CrimeA cryptic message from..."
3,49026,The Dark Knight Rises,"Action, Crime, Drama, ThrillerFollowing the de..."
4,49529,John Carter,"Action, Adventure, Science FictionJohn Carter ..."
...,...,...,...
4798,9367,El Mariachi,"Action, Crime, ThrillerEl Mariachi just wants ..."
4799,72766,Newlyweds,"Comedy, RomanceA newlywed couple's honeymoon i..."
4800,231617,"Signed, Sealed, Delivered","Comedy, Drama, Romance, TV Movie""Signed, Seale..."
4801,126186,Shanghai Calling,When ambitious New York attorney Sam is sent t...


In [18]:
cv = CountVectorizer(max_features=4803, stop_words='english')

In [19]:
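# Cast to str first: rows with a missing overview have NaN tags, which CountVectorizer cannot tokenize
vector = cv.fit_transform(movies_df['tags'].values.astype('U')).toarray()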

In [20]:
vector

array([[0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       ...,
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0],
       [0, 0, 0, ..., 0, 0, 0]])

In [21]:
vector.shape

(4803, 4803)
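
To see what the vectorizer produces, here is a minimal sketch on three made-up sentences (`docs` and `toy_cv` are illustrative names). Each row of the result is one document and each column counts one vocabulary word; English stop words such as "on" and "in" are dropped. In the notebook, `max_features=4803` additionally keeps only the most frequent terms, which is why the matrix above has 4803 columns.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "space war on a distant planet",
    "a spy on a secret mission",
    "secret war in space",
]

toy_cv = CountVectorizer(stop_words="english")
counts = toy_cv.fit_transform(docs)   # sparse matrix, one row per document

vocab = sorted(toy_cv.vocabulary_)    # learned words in column order
print(vocab)
print(counts.toarray())               # dense view, fine for a toy example
```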

In [22]:
similarity = cosine_similarity(vector)

In [23]:
similarity

array([[1.        , 0.06019293, 0.03671734, ..., 0.03439596, 0.01903467,
        0.        ],
       [0.06019293, 1.        , 0.02541643, ..., 0.04761905, 0.        ,
        0.        ],
       [0.03671734, 0.02541643, 1.        , ..., 0.02178551, 0.        ,
        0.        ],
       ...,
       [0.03439596, 0.04761905, 0.02178551, ..., 1.        , 0.06776309,
        0.04462107],
       [0.01903467, 0.        , 0.        , ..., 0.06776309, 1.        ,
        0.04938648],
       [0.        , 0.        , 0.        , ..., 0.04462107, 0.04938648,
        1.        ]])
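
Each entry of this matrix is the cosine of the angle between two rows of `vector`: the dot product divided by the product of the two vector lengths. The sketch below computes that by hand with NumPy on made-up count vectors; the helper name `cosine_sim` and the choice to return 0.0 for an all-zero vector are assumptions of this example.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine of the angle between two count vectors; 0.0 if either is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

u = np.array([1, 1, 0, 2])
v = np.array([1, 0, 1, 2])   # shares two of its three words with u
w = np.array([0, 0, 3, 0])   # shares no words with u

print(cosine_sim(u, u))  # a vector compared with itself: about 1
print(cosine_sim(u, v))  # 5 / (sqrt(6) * sqrt(6)) = 5/6
print(cosine_sim(u, w))  # no overlap: 0
```

This explains the diagonal of ones in the output above and why movies with no shared words score exactly 0.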

In [24]:
# Rank every movie by its similarity to the movie at index 2 (Spectre), highest first
distance = sorted(enumerate(similarity[2]), key=lambda pair: pair[1], reverse=True)
for i in distance[:5]:
    print(movies_df.iloc[i[0]].title)

Spectre
Never Say Never Again
Quantum of Solace
Die Another Day
Dr. No


In [25]:
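def recommend(movie):
    """Print the five titles whose tags are most similar to those of `movie`."""
    matches = movies_df[movies_df['title'] == movie]
    if matches.empty:
        print(f"'{movie}' not found in the dataset")
        return
    index = matches.index[0]
    ranked = sorted(enumerate(similarity[index]), key=lambda pair: pair[1], reverse=True)
    for i, _ in ranked[1:6]:  # position 0 is normally the movie itself
        print(movies_df.iloc[i].title)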
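
For reuse outside the notebook, it can be handier to return the titles instead of printing them. The sketch below is one possible variant using `np.argsort` on a small made-up similarity matrix; `top_similar`, `titles`, and `sim` are hypothetical names and stand in for `movies_df` and `similarity`.

```python
import numpy as np

titles = ["A", "B", "C", "D"]  # hypothetical titles
sim = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])

def top_similar(title, titles, sim, n=2):
    """Return the n titles most similar to `title`, excluding the title itself."""
    if title not in titles:
        return []
    idx = titles.index(title)
    order = np.argsort(sim[idx])[::-1]   # indices sorted by descending similarity
    return [titles[i] for i in order if i != idx][:n]

result = top_similar("A", titles, sim)
print(result)
```

Excluding the query by index, rather than by skipping the first position, keeps the result correct even when another movie ties with a similarity of 1.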

In [26]:
recommend('Avatar')

Moonraker
Aliens
Silent Running
Mission to Mars
Alien


In [27]:
recommend('John Carter')

Avatar
Mission to Mars
Star Trek: Insurrection
The Martian
Escape from Planet Earth


In [28]:
recommend('Shanghai Calling')

The Devil Wears Prada
The Out-of-Towners
I Am Sam
Me and Orson Welles
Home Alone 2: Lost in New York
