# What to watch on Netflix?
Find similar movies and TV shows using text similarity techniques


Netflix is known for its strong recommendation engine, which uses a mix of content-based and collaborative filtering models to recommend movies and TV shows. This notebook builds a simpler, purely content-based recommender from the text similarity between title descriptions.

```python
import numpy as np
import pandas as pd

# Load the Netflix catalogue and preview the first rows
df = pd.read_csv('netflix_titles.csv')
df.head()
```

| | show_id | type | title | director | cast | country | date_added | release_year | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 81145628 | Movie | Norm of the North: King Sized Adventure | Richard Finn, Tim Maltby | Alan Marriott, Andrew Toth, Brian Dobson, Cole... | United States, India, South Korea, China | September 9, 2019 | 2019 | TV-PG | 90 min | Children & Family Movies, Comedies | Before planning an awesome wedding for his gra... |
| 1 | 80117401 | Movie | Jandino: Whatever it Takes | | Jandino Asporaat | United Kingdom | September 9, 2016 | 2016 | TV-MA | 94 min | Stand-Up Comedy | Jandino Asporaat riffs on the challenges of ra... |
| 2 | 70234439 | TV Show | Transformers Prime | | Peter Cullen, Sumalee Montano, Frank Welker, J... | United States | September 8, 2018 | 2013 | TV-Y7-FV | 1 Season | Kids' TV | With the help of three human allies, the Autob... |
| 3 | 80058654 | TV Show | Transformers: Robots in Disguise | | Will Friedle, Darren Criss, Constance Zimmer, ... | United States | September 8, 2018 | 2016 | TV-Y7 | 1 Season | Kids' TV | When a prison ship crash unleashes hundreds of... |
| 4 | 80125979 | Movie | #realityhigh | Fernando Lebrija | Nesta Cooper, Kate Walsh, John Michael Higgins... | United States | September 8, 2017 | 2017 | TV-14 | 99 min | Comedies | When nerdy high schooler Dani finally attracts... |


```python
# Summary statistics for the text (object) columns
df.describe(include=['object'])
```

| | type | title | director | cast | country | date_added | rating | duration | listed_in | description |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 6234 | 6234 | 4265 | 5664 | 5758 | 6223 | 6224 | 6234 | 6234 | 6234 |
| unique | 2 | 6172 | 3301 | 5469 | 554 | 1524 | 14 | 201 | 461 | 6226 |
| top | Movie | Tunnel | Raúl Campos, Jan Suter | David Attenborough | United States | January 1, 2020 | TV-MA | 1 Season | Documentaries | A surly septuagenarian gets another chance at ... |
| freq | 4265 | 3 | 18 | 18 | 2032 | 122 | 2027 | 1321 | 299 | 3 |
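The summary shows 6226 unique descriptions across 6234 rows (the most frequent one appears 3 times), so a few titles share identical text. Such pairs sit at distance 0 from each other in any text-similarity model, which can matter at the ranking step. A minimal sketch for listing them, shown on a hypothetical toy frame (`toy`) rather than the real CSV:

```python
import pandas as pd

# Hypothetical toy frame with the same two columns as the real data
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "description": ["same plot", "other plot", "same plot", "third plot"],
})

# keep=False flags every row whose description occurs more than once
dupes = toy[toy.duplicated("description", keep=False)]
print(dupes[["title", "description"]])
print(toy["description"].nunique(), "unique of", len(toy))  # 3 unique of 4
```

On the notebook's own DataFrame the equivalent call would be `df[df.duplicated('description', keep=False)]`.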


```python
# Count missing values in each column
df.isnull().sum()
```

```
show_id            0
type               0
title              0
director        1969
cast             570
country          476
date_added        11
release_year       0
rating            10
duration           0
listed_in          0
description        0
dtype: int64
```

The `description` column has no missing values, so every title can be vectorized from its description.
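`TfidfVectorizer` expects every document to be a string, so a missing value in the input column makes it fail. The `description` column is complete here, but if the same approach is reused on a column with gaps (for example `cast` or `director`, per the counts above), a common guard is to fill missing values with an empty string first. A sketch on a hypothetical three-row Series:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical text column with one missing value
text = pd.Series(["space pirates raid a colony", np.nan, "pirates search for treasure"])

# Replace NaN with "" so every row is a valid (possibly empty) document
clean = text.fillna("")

matrix = TfidfVectorizer().fit_transform(clean)
print(matrix.shape)  # (3, 7): one row per document, including the empty one
```

The empty document simply becomes an all-zero row, so it will never look similar to anything.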

## Content-Based Recommendation
Recommendations use only the `description` text of each movie or TV show.

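```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Ignore terms found in more than 70% of descriptions; keep every other term
vectorizer = TfidfVectorizer(max_df=0.7, min_df=1)
features_fit = vectorizer.fit(df['description'])  # fit() returns the fitted vectorizer
movies_features = features_fit.transform(df['description'])  # sparse (n_titles, n_terms)
movies_features.shape
```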

```
(6234, 16411)
```
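The result is a sparse matrix with one row per title and one column per vocabulary term (6234 × 16411 here). `max_df=0.7` drops any term that appears in more than 70% of the descriptions, since such terms say little about what distinguishes one title from another; `min_df=1` keeps every remaining term, however rare. A small sketch on a hypothetical four-document corpus shows the effect of `max_df`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus: "story" occurs in every document, "of" in three
corpus = [
    "story of a lost dog",
    "story of a haunted house",
    "story about space travel",
    "story of two chefs",
]

vec = TfidfVectorizer(max_df=0.7, min_df=1)
X = vec.fit_transform(corpus)

print(X.shape)                     # (4, 9)
print("story" in vec.vocabulary_)  # False: in 4/4 docs, above the 0.7 cutoff
print("of" in vec.vocabulary_)     # False: in 3/4 = 0.75 of docs
print("dog" in vec.vocabulary_)    # True: in 1/4 docs
```

Single-character tokens such as "a" are dropped by the default tokenizer, independently of `max_df`.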

```python
# Sample five titles to act as the query set for a sample user
sample_user = df.sample(5, random_state=10)
user_list = sample_user['title'].tolist()
user_list_descr = sample_user['description'].tolist()

# Reuse the vocabulary and IDF weights fitted on the full catalogue
user_feat_mat = features_fit.transform(user_list_descr)
user_feat_mat.shape
```

```
(5, 16411)
```

```python
from sklearn.metrics.pairwise import pairwise_distances

# Supported metrics include 'cityblock', 'cosine', 'euclidean', 'l1', 'l2', 'manhattan'
similar = pairwise_distances(user_feat_mat, movies_features, metric='l2')  # shape (5, 6234)
indices = similar.argsort()  # per row: catalogue positions from nearest to farthest

for row in range(sample_user.shape[0]):
    print('Recommendations for', user_list[row])
    # Position 0 is normally the title itself (distance 0), so take the next five
    rec = df.title.iloc[indices[row, 1:6]].tolist()
    print(pd.DataFrame(rec, columns=['title']))
    print('')
```

```
Recommendations for Virunga
                        title
0  Virunga: Gorillas in Peril
1     A.D. Kingdom and Empire
2      Great Yellowstone Thaw
3                   Surf's Up
4     The Siege of Jadotville

Recommendations for Chloe
                 title
0      Meditation Park
1  Love Me or Leave Me
2              Boy Bye
3          Gnome Alone
4      The Competition

Recommendations for Backcountry
                          title
0                    First Kill
1  Man vs Wild with Sunny Leone
2          Journey to Greenland
3               Shortcut Safari
4                 We Bare Bears

Recommendations for Rica, Famosa, Latina
                        title
0                  The L Word
1  Droppin' Cash: Los Angeles
2               Border Patrol
3                This Evening
4                  Black Rose

Recommendations for Earth to Echo
               title
0    See You in Time
1  The InBESTigators
2             Loaded
3  Kids on the Block
4   Caught on Camera
```
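Two remarks on the similarity step. First, `TfidfVectorizer` normalizes each row to unit length by default (`norm='l2'`), so for two rows `a` and `b` the squared Euclidean distance is `||a - b||^2 = 2 - 2 * cos(a, b)`. Ranking by ascending `'l2'` distance therefore gives the same order as ranking by descending cosine similarity, which `linear_kernel` computes directly on these normalized rows. Second, `indices[row, 1:6]` assumes the nearest match is always the query title itself; with duplicated descriptions in the data, a tie at distance 0 could put a different title first. A sketch of a hypothetical helper, `recommend`, that ranks with `linear_kernel` and drops the query by its row position instead; the toy frame `toy_df` stands in for `df` and uses separate names so the notebook's variables are untouched:

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel, pairwise_distances

# Hypothetical stand-in for the Netflix catalogue
toy_df = pd.DataFrame({
    "title": ["Ocean Deep", "Coral Kingdom", "Desert Run", "Sand Storm", "Reef Life"],
    "description": [
        "divers explore the deep ocean and its creatures",
        "a documentary about coral reefs in the ocean",
        "a runner crosses the desert alone",
        "a storm traps travellers in the desert",
        "life on a coral reef beneath the ocean",
    ],
})

toy_features = TfidfVectorizer(max_df=0.7, min_df=1).fit_transform(toy_df["description"])

# Rows have unit L2 norm, so squared Euclidean distance = 2 - 2 * cosine similarity
toy_dist = pairwise_distances(toy_features, metric="l2")
toy_sim = linear_kernel(toy_features, toy_features)  # dot products = cosine similarities here
print(np.allclose(toy_dist ** 2, 2 - 2 * toy_sim))  # True

def recommend(title, k=2):
    """Hypothetical helper: the k titles most similar to `title`, excluding its own row."""
    pos = int(np.flatnonzero(toy_df["title"].to_numpy() == title)[0])  # row position of the query
    scores = toy_sim[pos].copy()
    scores[pos] = -np.inf  # exclude the query by position rather than by rank
    best = np.argsort(scores)[::-1][:k]  # highest similarity first
    return toy_df["title"].iloc[best].tolist()

print(recommend("Ocean Deep"))  # ['Coral Kingdom', 'Reef Life']
```

On the notebook's data the same pattern would use `df` and `movies_features`. The full 6234 × 6234 float64 similarity matrix takes roughly 310 MB, so computing one row at a time with `linear_kernel(movies_features[pos], movies_features)` may be the lighter option.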

