# **Movie Recommendation System**

# Objective

**Recommender System**: A movie recommendation system typically combines multiple data sources, such as user ratings, movie metadata, and external reviews. Combining these datasets can make the recommendations more accurate, diverse, and engaging.

**Content-Based Filtering**: Recommends movies similar to those the user has liked in the past, based on movie attributes (genre, director, actors, etc.). This notebook implements this approach using the genre attribute only.

**Collaborative Filtering**: Recommends movies based on the preferences of users who have similar tastes.
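The dataset used in this notebook has no per-user ratings, so collaborative filtering is not implemented here. As a rough illustration of the idea only, the sketch below predicts a missing rating as a similarity-weighted average of other users' ratings. The rating matrix and the helper names (`cosine`, `predict_rating`) are made up for this example.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, columns: movies); 0 means "not rated".
# These ratings are invented purely for illustration.
ratings = np.array([
    [5, 4, 0, 1],   # user 0 has not rated movie 2
    [4, 5, 5, 1],   # user 1 has tastes close to user 0
    [1, 1, 2, 5],   # user 2 has rather different tastes
], dtype=float)

def cosine(u, v):
    """Cosine similarity between two rating vectors."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def predict_rating(ratings, user, movie):
    """Similarity-weighted average of the other users' ratings for `movie`."""
    num, den = 0.0, 0.0
    for other in range(ratings.shape[0]):
        # Skip the target user and anyone who has not rated this movie
        if other == user or ratings[other, movie] == 0:
            continue
        sim = cosine(ratings[user], ratings[other])
        num += sim * ratings[other, movie]
        den += abs(sim)
    return num / den if den else 0.0

predicted = predict_rating(ratings, user=0, movie=2)
print(round(predicted, 2))
```

Because user 1 is more similar to user 0 than user 2 is, the prediction lands closer to user 1's rating of 5 than to user 2's rating of 2.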

**Source**: https://github.com/YBI-Foundation/Dataset

In [3]:
import pandas as pd
import numpy as np

In [41]:
data = pd.read_csv('https://github.com/YBI-Foundation/Dataset/raw/main/Movies%20Recommendation.csv')

In [52]:
data['Movie_Title'] = data['Movie_Title'].str.lower()

In [53]:
data.head(5)

Unnamed: 0,Movie_ID,Movie_Title,Movie_Genre,Movie_Language,Movie_Budget,Movie_Popularity,Movie_Release_Date,Movie_Revenue,Movie_Runtime,Movie_Vote,...,Movie_Homepage,Movie_Keywords,Movie_Overview,Movie_Production_House,Movie_Production_Country,Movie_Spoken_Language,Movie_Tagline,Movie_Cast,Movie_Crew,Movie_Director
0,1,four rooms,Crime Comedy,en,4000000,22.87623,09-12-1995,4300000,98.0,6.5,...,,hotel new year's eve witch bet hotel room,It's Ted the Bellhop's first night on the job....,"[{""name"": ""Miramax Films"", ""id"": 14}, {""name"":...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]",Twelve outrageous guests. Four scandalous requ...,Tim Roth Antonio Banderas Jennifer Beals Madon...,"[{'name': 'Allison Anders', 'gender': 1, 'depa...",Allison Anders
1,2,star wars,Adventure Action Science Fiction,en,11000000,126.393695,25-05-1977,775398007,121.0,8.1,...,http://www.starwars.com/films/star-wars-episod...,android galaxy hermit death star lightsaber,Princess Leia is captured and held hostage by ...,"[{""name"": ""Lucasfilm"", ""id"": 1}, {""name"": ""Twe...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","A long time ago in a galaxy far, far away...",Mark Hamill Harrison Ford Carrie Fisher Peter ...,"[{'name': 'George Lucas', 'gender': 2, 'depart...",George Lucas
2,3,finding nemo,Animation Family,en,94000000,85.688789,30-05-2003,940335536,100.0,7.6,...,http://movies.disney.com/finding-nemo,father son relationship harbor underwater fish...,"Nemo, an adventurous young clownfish, is unexp...","[{""name"": ""Pixar Animation Studios"", ""id"": 3}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","There are 3.7 trillion fish in the ocean, they...",Albert Brooks Ellen DeGeneres Alexander Gould ...,"[{'name': 'Andrew Stanton', 'gender': 2, 'depa...",Andrew Stanton
3,4,forrest gump,Comedy Drama Romance,en,55000000,138.133331,06-07-1994,677945399,142.0,8.2,...,,vietnam veteran hippie mentally disabled runni...,A man with a low IQ has accomplished great thi...,"[{""name"": ""Paramount Pictures"", ""id"": 4}]","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]","The world will never be the same, once you've ...",Tom Hanks Robin Wright Gary Sinise Mykelti Wil...,"[{'name': 'Alan Silvestri', 'gender': 2, 'depa...",Robert Zemeckis
4,5,american beauty,Drama,en,15000000,80.878605,15-09-1999,356296601,122.0,7.9,...,http://www.dreamworks.com/ab/,male nudity female nudity adultery midlife cri...,"Lester Burnham, a depressed suburban father in...","[{""name"": ""DreamWorks SKG"", ""id"": 27}, {""name""...","[{""iso_3166_1"": ""US"", ""name"": ""United States o...","[{""iso_639_1"": ""en"", ""name"": ""English""}]",Look closer.,Kevin Spacey Annette Bening Thora Birch Wes Be...,"[{'name': 'Thomas Newman', 'gender': 2, 'depar...",Sam Mendes


In [43]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4760 entries, 0 to 4759
Data columns (total 21 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Movie_ID                  4760 non-null   int64  
 1   Movie_Title               4760 non-null   object 
 2   Movie_Genre               4760 non-null   object 
 3   Movie_Language            4760 non-null   object 
 4   Movie_Budget              4760 non-null   int64  
 5   Movie_Popularity          4760 non-null   float64
 6   Movie_Release_Date        4760 non-null   object 
 7   Movie_Revenue             4760 non-null   int64  
 8   Movie_Runtime             4758 non-null   float64
 9   Movie_Vote                4760 non-null   float64
 10  Movie_Vote_Count          4760 non-null   int64  
 11  Movie_Homepage            1699 non-null   object 
 12  Movie_Keywords            4373 non-null   object 
 13  Movie_Overview            4757 non-null   object 
 14  Movie_Pr

In [44]:
data.shape

(4760, 21)

In [45]:
data.columns

Index(['Movie_ID', 'Movie_Title', 'Movie_Genre', 'Movie_Language',
       'Movie_Budget', 'Movie_Popularity', 'Movie_Release_Date',
       'Movie_Revenue', 'Movie_Runtime', 'Movie_Vote', 'Movie_Vote_Count',
       'Movie_Homepage', 'Movie_Keywords', 'Movie_Overview',
       'Movie_Production_House', 'Movie_Production_Country',
       'Movie_Spoken_Language', 'Movie_Tagline', 'Movie_Cast', 'Movie_Crew',
       'Movie_Director'],
      dtype='object')

In [46]:
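# Preprocess genres
data['Movie_Genre'] = data['Movie_Genre'].fillna('')  # Replace missing values with an empty string
# Replace any '|' separators with spaces; genres that are already space-separated are left unchanged
data['Movie_Genre'] = data['Movie_Genre'].str.replace('|', ' ', regex=False)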


In [47]:
from sklearn.feature_extraction.text import TfidfVectorizer

# Create TF-IDF matrix for genres
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf_vectorizer.fit_transform(data['Movie_Genre'])
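To see what the vectorizer does with genre strings, here is a minimal sketch on three genre values taken from the `data.head(5)` output above. The variable names (`sample_genres`, `vectorizer`, `matrix`, `vocab`) are illustrative and not part of the notebook.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Three genre strings copied from the first rows of the dataset
sample_genres = [
    "Crime Comedy",
    "Adventure Action Science Fiction",
    "Animation Family",
]

vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform(sample_genres)  # sparse matrix, one row per movie

# The learned vocabulary: lowercase single-word tokens
vocab = sorted(vectorizer.vocabulary_)
print(vocab)
print(matrix.shape)
```

Note that the default tokenizer splits on whitespace and lowercases, so a multi-word genre such as "Science Fiction" becomes two separate tokens, `science` and `fiction`, rather than one feature.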


In [48]:
from sklearn.metrics.pairwise import linear_kernel

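# Compute the cosine similarity matrix. TF-IDF rows are L2-normalized by default,
# so the linear kernel (plain dot product) is equal to cosine similarity here.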
cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
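`linear_kernel` computes plain dot products, which match cosine similarity only because `TfidfVectorizer` L2-normalizes each row by default (`norm='l2'`). The small check below, on toy genre strings, is a sketch of that equivalence; the variable names are illustrative.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

genres = [
    "Crime Comedy",
    "Adventure Action Science Fiction",
    "Comedy Drama Romance",
]
tfidf = TfidfVectorizer().fit_transform(genres)  # rows have unit L2 norm by default

dot_products = linear_kernel(tfidf, tfidf)
cosines = cosine_similarity(tfidf, tfidf)

# With unit-length rows, the two matrices agree up to floating-point error
same = np.allclose(dot_products, cosines)
print(same)
```

If the vectorizer were created with `norm=None`, the two results would generally differ, and `cosine_similarity` would be the safer choice.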


In [54]:
def recommend_movies(movie_title, cosine_sim=cosine_sim):
    # Convert input movie title to lowercase
    movie_title = movie_title.lower()

    # Get index of the input movie
    idx = data.index[data['Movie_Title'] == movie_title].tolist()

    if not idx:
        return "Movie not found in the dataset."

    idx = idx[0]

    # Get similarity scores for all movies with the given movie
    sim_scores = list(enumerate(cosine_sim[idx]))

    # Sort movies based on similarity scores
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)

    # Get the indices of the top 30 most similar movies
    movie_indices = [i[0] for i in sim_scores if i[0] != idx][:30]

    # Return the top 30 most similar movies
    return data['Movie_Title'].iloc[movie_indices].tolist()
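The same lookup-and-rank logic can be tried without downloading the dataset. The sketch below is a hypothetical variant (`recommend_top_n`) that takes the DataFrame, the similarity matrix, and the number of results as parameters, and runs on a four-row toy frame. The toy rows and the choice to return an empty list for unknown titles are assumptions of this example, not the notebook's behavior.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy frame with the same two columns the notebook relies on
toy = pd.DataFrame({
    "Movie_Title": ["star wars", "the empire strikes back", "finding nemo", "forrest gump"],
    "Movie_Genre": [
        "Adventure Action Science Fiction",
        "Adventure Action Science Fiction",
        "Animation Family",
        "Comedy Drama Romance",
    ],
})
toy_sim = linear_kernel(TfidfVectorizer().fit_transform(toy["Movie_Genre"]))

def recommend_top_n(title, frame, sim, top_n=2):
    """Return up to `top_n` titles most similar to `title` (hypothetical helper)."""
    matches = frame.index[frame["Movie_Title"] == title.lower()]
    if len(matches) == 0:
        return []  # unknown title: empty list instead of an error string
    idx = matches[0]
    # Stable sort by descending similarity, so ties keep their dataset order
    order = np.argsort(-sim[idx], kind="stable")
    order = [i for i in order if i != idx][:top_n]
    return frame["Movie_Title"].iloc[order].tolist()

top = recommend_top_n("Star Wars", toy, toy_sim, top_n=1)
print(top)
```

Passing the frame and matrix explicitly avoids relying on notebook globals, which matters when cells are run out of order.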


In [56]:
# Example usage
recommend_movies('star wars')  # Replace 'star wars' with the movie title you want recommendations for


['the lost world: jurassic park',
 'avp: alien vs. predator',
 'independence day',
 'dune',
 'total recall',
 'the incredible hulk',
 'iron man',
 'captain america: the first avenger',
 'transformers',
 'the empire strikes back',
 'return of the jedi',
 'star wars: episode i - the phantom menace',
 'star wars: episode ii - attack of the clones',
 'star wars: episode iii - revenge of the sith',
 'the time machine',
 'transformers: revenge of the fallen',
 'steel',
 'mad max beyond thunderdome',
 'timeline',
 'iron man 2',
 'superman iv: the quest for peace',
 'star trek',
 'tron: legacy',
 'the blood of heroes',
 'the avengers',
 'six-string samurai',
 'damnation alley',
 'megaforce',
 'x-men',
 'transformers: dark of the moon']
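With genre as the only feature, every movie sharing the same genre set scores identically, so the ranking among them is arbitrary. One possible refinement, in line with the attributes listed in the Objective, is to concatenate several text columns before vectorizing. The sketch below assumes the columns `Movie_Genre`, `Movie_Keywords`, and `Movie_Director` and uses toy rows loosely based on the `data.head(5)` output, with one keywords entry deliberately left missing to mimic the nulls reported by `data.info()`.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

# Toy rows for illustration; the None mimics a missing Movie_Keywords value
movies = pd.DataFrame({
    "Movie_Title": ["star wars", "finding nemo", "forrest gump"],
    "Movie_Genre": ["Adventure Action Science Fiction", "Animation Family", "Comedy Drama Romance"],
    "Movie_Keywords": ["android galaxy hermit death star lightsaber", None, "vietnam veteran hippie"],
    "Movie_Director": ["George Lucas", "Andrew Stanton", "Robert Zemeckis"],
})

text_cols = ["Movie_Genre", "Movie_Keywords", "Movie_Director"]

# Fill missing text with empty strings, then join the columns into one string per movie
combined = movies[text_cols].fillna("").agg(" ".join, axis=1)

combined_tfidf = TfidfVectorizer(stop_words="english").fit_transform(combined)
combined_sim = linear_kernel(combined_tfidf, combined_tfidf)
print(combined_sim.shape)
```

The resulting matrix could be passed to `recommend_movies` through its `cosine_sim` argument. Whether the extra columns actually improve recommendations on the full dataset is untested here.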