# **Description:**
A content-based movie recommendation system suggests movies based on the characteristics of films a user has previously liked or interacted with. It analyzes movie attributes such as genre, director, actors, plot summary, and other metadata to build a profile of each movie. When a user expresses interest in certain films (through ratings, views, or likes), the system builds a user profile from the features of those films, then compares that profile with the other films in the database to recommend movies that closely match the user's preferences.

# 1. Import Libraries

In [62]:
import numpy as np
import pandas as pd
import os
import ast
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer
from nltk.stem import PorterStemmer


# 2. Import Dataset
The dataset consists of two files that contain movie titles, ids, credits, genres, cast, and more. From these features we build a text description (the "content") for every movie.
# Reference : 
https://www.kaggle.com/datasets/tmdb/tmdb-movie-metadata?select=tmdb_5000_movies.csv

Let's merge the two files on the 'title' column, which both of them contain.

In [63]:
movies=pd.read_csv("tmdb_5000_movies.csv")
credits=pd.read_csv("tmdb_5000_credits.csv")
print(movies.shape)
movies=movies.merge(credits,on='title')

(4803, 20)


In [64]:
movies.head()
print(movies.shape)

(4809, 23)
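
The shapes printed above show the frame growing from 4803 to 4809 rows across the merge. An inner join can only add rows when the join key repeats, so some titles must occur more than once. The following is a minimal sketch of that effect using made-up toy frames; `movies_toy`, `credits_toy`, and their contents are assumptions for illustration, and whether `id` in the movies file lines up with `movie_id` in the credits file should be checked on the real data before switching the join key.

```python
import pandas as pd

# Toy stand-ins for the two CSV files; titles and ids are made up.
movies_toy = pd.DataFrame({"id": [1, 2, 3], "title": ["Alpha", "Beta", "Beta"]})
credits_toy = pd.DataFrame({"movie_id": [1, 2, 3], "title": ["Alpha", "Beta", "Beta"]})

# Joining on a non-unique key pairs every "Beta" row with every "Beta" row.
by_title = movies_toy.merge(credits_toy, on="title")

# Joining on the unique id keeps exactly one row per movie.
by_id = movies_toy.merge(credits_toy, left_on="id", right_on="movie_id")

print(len(by_title), len(by_id))
```

Here the title join yields 5 rows from 3 movies, while the id join yields 3.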


Let's keep only the columns that are useful for content-based recommendation; columns such as budget are unlikely to help.
We then drop rows that contain NaN values and check for duplicated rows.

In [65]:
movies=movies[['movie_id','title','overview','genres','keywords','cast','crew']]
movies.isnull().sum()          # missing values per column
movies.dropna(inplace=True)    # drop rows with any missing value
movies.shape
movies.duplicated().sum()      # number of fully duplicated rows (note the call parentheses)

# 3. Data Preprocessing

Let's build a function that extracts only the names from the lists of dictionaries stored in columns such as genres and keywords.

In [66]:

def convert(text):
    l=[]
    for i in ast.literal_eval(text):
        l.append(i['name'])
    return l
movies['genres']=movies['genres'].apply(convert)
movies['keywords']=movies['keywords'].apply(convert)
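
To make the parsing step concrete, here is a small sketch of what `convert` does to a single cell. The string `raw` is a made-up value written in the same shape as the 'genres' column, not a row copied from the dataset.

```python
import ast

# A made-up cell value shaped like an entry of the 'genres' column.
raw = '[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'

# literal_eval turns the string into a real list of dicts without executing code.
parsed = ast.literal_eval(raw)

# Same result as convert(), written as a list comprehension.
names = [item["name"] for item in parsed]
print(names)
```

`ast.literal_eval` only accepts Python literals, which makes it a safer choice than `eval` for strings read from a file.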

Let's build a function that keeps only the names of the first three cast members of each movie.

In [67]:
def convert_cast(text):
    l=[]
    counter=0
    for i in ast.literal_eval(text):
        if  counter<3:
            l.append(i['name'])
        counter+=1
    return l
movies['cast']=movies['cast'].apply(convert_cast)

Let's build a function to fetch the director of every film.

In [68]:
def fetch_director(text):
    l=[]
    for i in ast.literal_eval(text):
        if i['job']=='Director':
            l.append(i['name'])
            break
    return l
movies['crew']=movies['crew'].apply(fetch_director)
movies['overview']=movies['overview'].apply(lambda x:x.split())    

Let's build a function that removes the spaces inside each name, so that multi-word names such as 'Sam Worthington' become single tokens and are not split apart during vectorization.

In [69]:
def remove_space(word):
    l=[]
    for i in word:
        l.append(i.replace(" ",""))
    return l
movies['cast']=movies['cast'].apply(remove_space)
movies['crew']=movies['crew'].apply(remove_space)
movies['genres']=movies['genres'].apply(remove_space)
movies['keywords']=movies['keywords'].apply(remove_space)
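
The sketch below illustrates why the spaces matter. It uses two names chosen only for illustration and a hypothetical helper `tokens` that mimics whitespace tokenization of the joined tag string; it is not part of the notebook's pipeline.

```python
# Two names that share a first name (chosen for illustration).
cast_a = ["Sam Worthington"]
cast_b = ["Sam Mendes"]

def tokens(names, strip_spaces):
    # Mimic whitespace tokenization of the joined, lowercased tag string.
    if strip_spaces:
        names = [n.replace(" ", "") for n in names]
    return set(" ".join(names).lower().split())

# With spaces kept, both movies share the token "sam".
shared_raw = tokens(cast_a, False) & tokens(cast_b, False)

# With spaces removed, each person is one unique token and nothing is shared.
shared_clean = tokens(cast_a, True) & tokens(cast_b, True)

print(shared_raw, shared_clean)
```

Without the cleanup, two unrelated people would contribute a common token and slightly inflate the similarity between their movies.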


Let's add a new column, 'tags', to the movies dataset by combining overview, genres, keywords, cast, and crew.

In [70]:
movies['tags']=movies['overview']+movies['genres']+movies['keywords']+movies['cast']+movies['crew']
# .copy() gives new_df its own data, so the assignments below do not trigger SettingWithCopyWarning
new_df=movies[['movie_id','title','tags']].copy()
new_df['tags']=new_df['tags'].apply(lambda x: " ".join(x))
new_df['tags']=new_df['tags'].apply(lambda x:x.lower())


In [71]:
movies.iloc[0]['tags']

['In',
 'the',
 '22nd',
 'century,',
 'a',
 'paraplegic',
 'Marine',
 'is',
 'dispatched',
 'to',
 'the',
 'moon',
 'Pandora',
 'on',
 'a',
 'unique',
 'mission,',
 'but',
 'becomes',
 'torn',
 'between',
 'following',
 'orders',
 'and',
 'protecting',
 'an',
 'alien',
 'civilization.',
 'Action',
 'Adventure',
 'Fantasy',
 'ScienceFiction',
 'cultureclash',
 'future',
 'spacewar',
 'spacecolony',
 'society',
 'spacetravel',
 'futuristic',
 'romance',
 'space',
 'alien',
 'tribe',
 'alienplanet',
 'cgi',
 'marine',
 'soldier',
 'battle',
 'loveaffair',
 'antiwar',
 'powerrelations',
 'mindandsoul',
 '3d',
 'SamWorthington',
 'ZoeSaldana',
 'SigourneyWeaver',
 'JamesCameron']

# 4. Stemming

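Stemming reduces each word to its stem by stripping common suffixes (for example, 'dispatched' becomes 'dispatch'), so that different forms of a word map to the same token. This shrinks the vocabulary and makes the text easier to vectorize.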

In [72]:
ps=PorterStemmer()
def stems(text):
    l=[]
    for i in text.split():
        l.append(ps.stem(i))
    return " ".join(l)
new_df['tags']=new_df['tags'].apply(stems)
new_df.iloc[0]['tags']


'in the 22nd century, a parapleg marin is dispatch to the moon pandora on a uniqu mission, but becom torn between follow order and protect an alien civilization. action adventur fantasi sciencefict cultureclash futur spacewar spacecoloni societi spacetravel futurist romanc space alien tribe alienplanet cgi marin soldier battl loveaffair antiwar powerrel mindandsoul 3d samworthington zoesaldana sigourneyweav jamescameron'

# 5. Vectorization

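Vectorization converts text into numbers. `CountVectorizer` counts how often each of the 5000 most frequent terms (excluding English stop words) occurs in each movie's tags, which gives one numeric vector per movie that can be compared with the others.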

In [73]:

cv=CountVectorizer(max_features=5000,stop_words='english')
vector=cv.fit_transform(new_df['tags']).toarray()
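
As a rough illustration of what a count-based vectorizer produces, the sketch below builds bag-of-words vectors by hand with the standard library. The tag strings in `docs` are made up, and the sketch deliberately leaves out stop-word removal and the `max_features` cap, so it is a simplified picture rather than a reimplementation of `CountVectorizer`.

```python
from collections import Counter

# Made-up, already-stemmed tag strings.
docs = ["space alien battl", "alien tribe alien", "batman citi"]

# Vocabulary: every distinct token, in alphabetical order.
vocab = sorted({tok for d in docs for tok in d.split()})

def to_counts(doc):
    # One count per vocabulary term, in vocabulary order.
    counts = Counter(doc.split())
    return [counts[tok] for tok in vocab]

vectors = [to_counts(d) for d in docs]
print(vocab)
print(vectors)
```

Each row of `vectors` plays the role of one row of the `vector` array above: a fixed-length list of term counts for one movie.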

# 6. Check Similarity

In [74]:

similarity=cosine_similarity(vector)
print(similarity.shape)


(4806, 4806)
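
Cosine similarity measures the angle between two count vectors: the dot product divided by the product of their lengths. The sketch below computes it by hand for three made-up vectors; the function name `cosine` and the choice to return 0.0 for an all-zero vector are assumptions of this example, not something taken from scikit-learn.

```python
import math

def cosine(u, v):
    # Dot product divided by the product of the two vector lengths.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)

a = [1, 0, 1, 0]
b = [2, 0, 2, 0]  # same direction as a, twice the length
c = [0, 3, 0, 0]  # shares no terms with a

print(cosine(a, b), cosine(a, c))
```

Vectors pointing in the same direction score close to 1 regardless of length, and vectors with no terms in common score 0. This is why a long overview does not by itself make two movies look more similar.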


In [75]:
new_df[new_df['title']=='Spider-Man'].index[0]

159

# 7. Build Recommendation System

Let's recommend the five movies most similar to a given title.

In [76]:
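def recommendation_system(movie):
    # Use the row position, not the index label: dropna() left gaps in the labels,
    # while the similarity matrix is indexed by position
    index=np.flatnonzero((new_df['title']==movie).to_numpy())[0]
    distances=sorted(list(enumerate(similarity[index])),reverse=True,key=lambda x:x[1])
    # Position 0 is the movie itself, so print the next five
    for i in distances[1:6]:
        print(new_df.iloc[i[0]].title)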
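
The function above prints its results and raises an `IndexError` when the title is not in the dataset. As a possible variant, the sketch below returns a list and returns an empty list for unknown titles. It works on plain Python lists so that it runs on its own; `top_k_similar`, the toy `titles`, and the hand-written `similarity` matrix are all hypothetical and stand in for `new_df['title']` and the real matrix.

```python
def top_k_similar(titles, similarity, movie, k=5):
    """Return up to k titles most similar to `movie`, or [] if it is unknown."""
    if movie not in titles:
        return []
    idx = titles.index(movie)  # position in the list, matching the matrix row
    # Pair each position with its score, dropping the query movie itself.
    scored = [(j, s) for j, s in enumerate(similarity[idx]) if j != idx]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [titles[j] for j, _ in scored[:k]]

# Made-up titles and a hand-written symmetric similarity matrix.
titles = ["A", "B", "C", "D"]
similarity = [
    [1.0, 0.8, 0.1, 0.4],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.9],
    [0.4, 0.3, 0.9, 1.0],
]

print(top_k_similar(titles, similarity, "A", k=2))
```

Returning a list instead of printing makes the result easier to reuse, for example in a web front end or in tests.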

# 8. Recommendations

In [77]:
recommendation_system('Spider-Man 2')

Spider-Man 3
Spider-Man
The Amazing Spider-Man
Iron Man 2
Superman


In [79]:
recommendation_system('The Dark Knight Rises')

The Dark Knight
Batman Returns
Batman
Batman Forever
Batman Begins


In [80]:
recommendation_system('The Lego Movie')

Curious George
Percy Jackson: Sea of Monsters
The Adventures of Rocky & Bullwinkle
Penguins of Madagascar
The Croods
