### **Movie Recommendation System**

**Objective**: The primary objective of this project is to develop a movie recommendation system that provides personalized movie suggestions, enhancing the user experience by predicting movies a user is likely to enjoy. Recommendation systems commonly rely on collaborative filtering (past viewing history and ratings), content-based filtering, or hybrid techniques. This notebook implements content-based filtering: it recommends movies whose text metadata is most similar to a movie the user names as a favorite.


**Data Source**: YBI Foundation GitHub repository: Movies Recommendation Dataset

**Import Library**

In [None]:
import pandas as pd
import numpy as np

**Import Data**

In [None]:
df=pd.read_csv('https://github.com/YBI-Foundation/Dataset/raw/main/Movies%20Recommendation.csv')

**Describe Data**

In [None]:
df.head()

| | Movie_ID | Movie_Title | Movie_Genre | Movie_Language | Movie_Budget | Movie_Popularity | Movie_Release_Date | Movie_Revenue | Movie_Runtime | Movie_Vote | ... | Movie_Homepage | Movie_Keywords | Movie_Overview | Movie_Production_House | Movie_Production_Country | Movie_Spoken_Language | Movie_Tagline | Movie_Cast | Movie_Crew | Movie_Director |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Four Rooms | Crime Comedy | en | 4000000 | 22.87623 | 09-12-1995 | 4300000 | 98.0 | 6.5 | ... | | hotel new year's eve witch bet hotel room | It's Ted the Bellhop's first night on the job.... | [{"name": "Miramax Films", "id": 14}, {"name":... | [{"iso_3166_1": "US", "name": "United States o... | [{"iso_639_1": "en", "name": "English"}] | Twelve outrageous guests. Four scandalous requ... | Tim Roth Antonio Banderas Jennifer Beals Madon... | [{'name': 'Allison Anders', 'gender': 1, 'depa... | Allison Anders |
| 1 | 2 | Star Wars | Adventure Action Science Fiction | en | 11000000 | 126.393695 | 25-05-1977 | 775398007 | 121.0 | 8.1 | ... | http://www.starwars.com/films/star-wars-episod... | android galaxy hermit death star lightsaber | Princess Leia is captured and held hostage by ... | [{"name": "Lucasfilm", "id": 1}, {"name": "Twe... | [{"iso_3166_1": "US", "name": "United States o... | [{"iso_639_1": "en", "name": "English"}] | A long time ago in a galaxy far, far away... | Mark Hamill Harrison Ford Carrie Fisher Peter ... | [{'name': 'George Lucas', 'gender': 2, 'depart... | George Lucas |
| 2 | 3 | Finding Nemo | Animation Family | en | 94000000 | 85.688789 | 30-05-2003 | 940335536 | 100.0 | 7.6 | ... | http://movies.disney.com/finding-nemo | father son relationship harbor underwater fish... | Nemo, an adventurous young clownfish, is unexp... | [{"name": "Pixar Animation Studios", "id": 3}] | [{"iso_3166_1": "US", "name": "United States o... | [{"iso_639_1": "en", "name": "English"}] | There are 3.7 trillion fish in the ocean, they... | Albert Brooks Ellen DeGeneres Alexander Gould ... | [{'name': 'Andrew Stanton', 'gender': 2, 'depa... | Andrew Stanton |
| 3 | 4 | Forrest Gump | Comedy Drama Romance | en | 55000000 | 138.133331 | 06-07-1994 | 677945399 | 142.0 | 8.2 | ... | | vietnam veteran hippie mentally disabled runni... | A man with a low IQ has accomplished great thi... | [{"name": "Paramount Pictures", "id": 4}] | [{"iso_3166_1": "US", "name": "United States o... | [{"iso_639_1": "en", "name": "English"}] | The world will never be the same, once you've ... | Tom Hanks Robin Wright Gary Sinise Mykelti Wil... | [{'name': 'Alan Silvestri', 'gender': 2, 'depa... | Robert Zemeckis |
| 4 | 5 | American Beauty | Drama | en | 15000000 | 80.878605 | 15-09-1999 | 356296601 | 122.0 | 7.9 | ... | http://www.dreamworks.com/ab/ | male nudity female nudity adultery midlife cri... | Lester Burnham, a depressed suburban father in... | [{"name": "DreamWorks SKG", "id": 27}, {"name"... | [{"iso_3166_1": "US", "name": "United States o... | [{"iso_639_1": "en", "name": "English"}] | Look closer. | Kevin Spacey Annette Bening Thora Birch Wes Be... | [{'name': 'Thomas Newman', 'gender': 2, 'depar... | Sam Mendes |


In [None]:
df.describe()

| | Movie_ID | Movie_Budget | Movie_Popularity | Movie_Revenue | Movie_Runtime | Movie_Vote | Movie_Vote_Count |
|---|---|---|---|---|---|---|---|
| count | 4760.0 | 4760.0 | 4760.0 | 4760.0 | 4758.0 | 4760.0 | 4760.0 |
| mean | 2382.566387 | 29201290.0 | 21.59951 | 82637430.0 | 107.184111 | 6.113866 | 692.508403 |
| std | 1377.270159 | 40756200.0 | 31.887919 | 163055400.0 | 21.960332 | 1.141294 | 1235.007337 |
| min | 1.0 | 0.0 | 0.000372 | 0.0 | 0.0 | 0.0 | 0.0 |
| 25% | 1190.75 | 925750.0 | 4.807074 | 0.0 | 94.0 | 5.6 | 55.0 |
| 50% | 2380.5 | 15000000.0 | 13.119058 | 19447160.0 | 104.0 | 6.2 | 238.0 |
| 75% | 3572.25 | 40000000.0 | 28.411929 | 93412760.0 | 118.0 | 6.8 | 740.25 |
| max | 4788.0 | 380000000.0 | 875.581305 | 2787965000.0 | 338.0 | 10.0 | 13752.0 |


In [None]:
df.columns

Index(['Movie_ID', 'Movie_Title', 'Movie_Genre', 'Movie_Language',
       'Movie_Budget', 'Movie_Popularity', 'Movie_Release_Date',
       'Movie_Revenue', 'Movie_Runtime', 'Movie_Vote', 'Movie_Vote_Count',
       'Movie_Homepage', 'Movie_Keywords', 'Movie_Overview',
       'Movie_Production_House', 'Movie_Production_Country',
       'Movie_Spoken_Language', 'Movie_Tagline', 'Movie_Cast', 'Movie_Crew',
       'Movie_Director'],
      dtype='object')

**Data Visualization**

Data visualization is not included here because the focus is on the recommendation system. It is not essential for this project but could be added for extra insights if required.

**Data Preprocessing**

In [None]:
df_features = df[['Movie_Genre', 'Movie_Keywords', 'Movie_Tagline', 'Movie_Cast', 'Movie_Director', 'Movie_Overview', 'Movie_Language']].fillna('')


In [None]:
df_features.shape

(4760, 7)

In [None]:
df_features.head()

| | Movie_Genre | Movie_Keywords | Movie_Tagline | Movie_Cast | Movie_Director | Movie_Overview | Movie_Language |
|---|---|---|---|---|---|---|---|
| 0 | Crime Comedy | hotel new year's eve witch bet hotel room | Twelve outrageous guests. Four scandalous requ... | Tim Roth Antonio Banderas Jennifer Beals Madon... | Allison Anders | It's Ted the Bellhop's first night on the job.... | en |
| 1 | Adventure Action Science Fiction | android galaxy hermit death star lightsaber | A long time ago in a galaxy far, far away... | Mark Hamill Harrison Ford Carrie Fisher Peter ... | George Lucas | Princess Leia is captured and held hostage by ... | en |
| 2 | Animation Family | father son relationship harbor underwater fish... | There are 3.7 trillion fish in the ocean, they... | Albert Brooks Ellen DeGeneres Alexander Gould ... | Andrew Stanton | Nemo, an adventurous young clownfish, is unexp... | en |
| 3 | Comedy Drama Romance | vietnam veteran hippie mentally disabled runni... | The world will never be the same, once you've ... | Tom Hanks Robin Wright Gary Sinise Mykelti Wil... | Robert Zemeckis | A man with a low IQ has accomplished great thi... | en |
| 4 | Drama | male nudity female nudity adultery midlife cri... | Look closer. | Kevin Spacey Annette Bening Thora Birch Wes Be... | Sam Mendes | Lester Burnham, a depressed suburban father in... | en |


In [None]:
x = (df_features['Movie_Genre'] + ' ' + df_features['Movie_Keywords'] + ' ' + df_features['Movie_Tagline'] + ' ' + df_features['Movie_Cast'] + ' ' + df_features['Movie_Director'] + ' ' + df_features['Movie_Overview'] + ' ' + df_features['Movie_Language'])
x

0       Crime Comedy hotel new year's eve witch bet ho...
1       Adventure Action Science Fiction android galax...
2       Animation Family father son relationship harbo...
3       Comedy Drama Romance vietnam veteran hippie me...
4       Drama male nudity female nudity adultery midli...
                              ...                        
4755    Horror  The hot spot where Satan's waitin'. Li...
4756    Comedy Family Drama  It’s better to stand out ...
4757    Thriller Drama christian film sex trafficking ...
4758    Family     After being estranged since her mot...
4759    Documentary music actors legendary perfomer cl...
Length: 4760, dtype: object

**Define Target Variable (y) and Feature Variables (X)**

This project uses all of the selected text features to compute similarities between movies and make recommendations. Because it is an unsupervised, similarity-based recommender rather than a supervised learning model, there is no target variable (y) to separate from the feature variables (X).

**Train Test Split**

A train-test split is not applicable here because no predictive model is trained. Instead, the entire dataset is used to calculate similarity scores between movies: a movie left out of the similarity matrix could never be recommended or used as a query.

**Modeling**

In [None]:
# Tokenizing and vectorizing the text data using TF-IDF
from sklearn.feature_extraction.text import TfidfVectorizer
tfidf = TfidfVectorizer(stop_words='english')
x = tfidf.fit_transform(x)

In [None]:
x.shape


(4760, 30166)

In [None]:
# Checking for similarity using cosine similarity matrix
from sklearn.metrics.pairwise import cosine_similarity
similarity = cosine_similarity(x)
similarity

array([[1.00000000e+00, 4.43841400e-04, 3.83960697e-04, ...,
        1.48477952e-02, 6.80032663e-04, 9.51435536e-04],
       [4.43841400e-04, 1.00000000e+00, 3.80946310e-04, ...,
        5.85859300e-04, 2.16523637e-02, 9.43966035e-04],
       [3.83960697e-04, 3.80946310e-04, 1.00000000e+00, ...,
        3.47674089e-03, 2.61553631e-02, 8.16611197e-04],
       ...,
       [1.48477952e-02, 5.85859300e-04, 3.47674089e-03, ...,
        1.00000000e+00, 3.58967658e-02, 1.25587058e-03],
       [6.80032663e-04, 2.16523637e-02, 2.61553631e-02, ...,
        3.58967658e-02, 1.00000000e+00, 1.44629982e-03],
       [9.51435536e-04, 9.43966035e-04, 8.16611197e-04, ...,
        1.25587058e-03, 1.44629982e-03, 1.00000000e+00]])

In [None]:
similarity.shape

(4760, 4760)
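Because 'similarity' is a dense 4760 × 4760 array of 64-bit floats, its memory cost can be estimated directly: 4760² × 8 bytes = 181,260,800 bytes, roughly 181 MB. When only one query movie is needed, a single row can be computed on demand instead. The sketch below assumes a tiny made-up corpus in place of the notebook's TF-IDF matrix 'x'; the names docs, X and row are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Memory needed for the full dense similarity matrix of the real dataset.
n_movies = 4760
full_matrix_bytes = n_movies * n_movies * np.dtype(np.float64).itemsize
print(f"Full matrix: {full_matrix_bytes / 1e6:.1f} MB")  # Full matrix: 181.3 MB

# Toy stand-in for the TF-IDF matrix `x` (hypothetical documents).
docs = ['galaxy rebels jedi', 'ocean reef fish', 'galaxy empire jedi']
X = TfidfVectorizer().fit_transform(docs)

# Similarities of movie 0 against every movie: X[0] is a 1-row sparse matrix,
# so the result has shape (1, n_docs); ravel() flattens it to one dimension.
row = cosine_similarity(X[0], X).ravel()
print(row.round(2))
```

This trades one large up-front computation for a small one per query. Since TfidfVectorizer L2-normalizes each row by default (norm='l2'), sklearn.metrics.pairwise.linear_kernel would return the same values as cosine_similarity here.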

**Model Evaluation**

This notebook does not compute an explicit evaluation metric. The dataset contains no per-user ratings or viewing histories that could be held out, so the quality of a content-based recommender like this one is judged by the relevance of its suggestions and by user satisfaction.
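If a rough offline check is wanted, one option (an assumption, not something the original notebook does) is a genre-overlap proxy: the share of the top-k recommendations that have at least one genre word in common with the query movie's 'Movie_Genre'. The helper name genre_precision_at_k and the toy genres below are hypothetical.

```python
def genre_precision_at_k(query_genres, recommended_genres, k=10):
    """Share of the top-k recommendations sharing a genre word with the query.

    Hypothetical proxy metric. Genres are space-separated strings as in the
    Movie_Genre column, so multi-word genres such as 'Science Fiction' are
    split into separate words; this is a crude approximation.
    """
    query_words = set(query_genres.split())
    top_k = recommended_genres[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for genres in top_k if query_words & set(genres.split()))
    return hits / len(top_k)


# Toy example: 2 of the 4 recommendations share a genre with the query.
query = 'Animation Family'
recommended = ['Animation Comedy', 'Drama', 'Family Comedy', 'Horror']
print(genre_precision_at_k(query, recommended, k=4))  # 0.5
```

Genre overlap only measures topical consistency, not whether a user would actually enjoy the suggestions, so it complements user feedback rather than replacing it.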

**Prediction**

In [None]:
# Taking input from the user to recommend similar movies
input_movie = input('Enter your favourite movie name: ')
all_movies_title_list = df['Movie_Title'].tolist()

Enter your favourite movie name: itar waar


In [None]:
# Finding the close matches to the input movie name using difflib
import difflib
close_matches = difflib.get_close_matches(input_movie, all_movies_title_list)
close_matches

['Star Wars', 'Liar Liar']

In [None]:
# Take the best match; get_close_matches returns up to n=3 titles, most similar first
close_match = close_matches[0]
close_match

'Star Wars'
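difflib.get_close_matches() returns an empty list when no title reaches its similarity cutoff (0.6 by default), and indexing the first element of an empty list raises IndexError. A minimal sketch of a guarded lookup, assuming a hypothetical helper find_closest_title and a six-title subset of the dataset:

```python
import difflib


def find_closest_title(query, titles, cutoff=0.6):
    """Return the best fuzzy match for `query` among `titles`, or None.

    Hypothetical helper: wraps difflib.get_close_matches and avoids the
    IndexError that indexing an empty result would raise.
    """
    matches = difflib.get_close_matches(query, titles, n=1, cutoff=cutoff)
    return matches[0] if matches else None


titles = ['Four Rooms', 'Star Wars', 'Finding Nemo', 'Forrest Gump',
          'American Beauty', 'Liar Liar']
print(find_closest_title('itar waar', titles))              # Star Wars
print(find_closest_title('itar waar', titles, cutoff=0.7))  # None
```

With the default cutoff, 'itar waar' still resolves to 'Star Wars'; raising the cutoff to 0.7 rejects both candidates, and the helper returns None instead of failing. Matching is case-sensitive, so lowercasing both the query and the titles is a common refinement.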

In [None]:
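# Row position of the matched title in df, which is also its row in the similarity matrix.
# Movie_ID is 1-based (its maximum is 4788 for 4760 rows), so it is not a valid row position.
index_of_the_movie = int(df[df.Movie_Title == close_match].index[0])
index_of_the_movie

1

Note: the ranked scores and the top-20 list below were recorded with this cell returning Movie_ID (2) rather than the row position. Row 2 is Finding Nemo, which is why the ranking starts with (2, 1.0000000000000007) and Finding Nemo is listed first: those outputs are recommendations for Finding Nemo, not Star Wars. Re-running the following cells with the row position gives the Star Wars recommendations.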

In [None]:
# Computing recommendation scores based on similarity
recommendation_score = list(enumerate(similarity[index_of_the_movie]))
sorted_similar_movies = sorted(recommendation_score, key=lambda x: x[1], reverse=True)
sorted_similar_movies[:5]

[(2, 1.0000000000000007),
 (1714, 0.17892337645340617),
 (3471, 0.168431748909765),
 (247, 0.09842037870252437),
 (3181, 0.07815931887061667)]
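Sorting enumerate() output works, but it ranks all 4760 movies in pure Python, and the top entry is normally the query movie itself (similarity 1.0). A vectorized alternative, sketched under the assumption that 'similarity' is a NumPy array like the one computed above, uses np.argsort and drops the query's own row. The helper top_k_similar and the 3 × 3 toy matrix are hypothetical.

```python
import numpy as np


def top_k_similar(similarity, idx, k=20):
    """Return (row, score) pairs for the k rows most similar to row `idx`.

    Hypothetical helper: the query row itself is excluded from the result.
    """
    scores = similarity[idx]
    order = np.argsort(scores)[::-1]   # row positions, highest score first
    order = order[order != idx][:k]    # drop the query movie's self-match
    return [(int(i), float(scores[i])) for i in order]


toy_sim = np.array([
    [1.0, 0.2, 0.7],
    [0.2, 1.0, 0.1],
    [0.7, 0.1, 1.0],
])
print(top_k_similar(toy_sim, 0, k=2))  # [(2, 0.7), (1, 0.2)]
```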

In [None]:
# Display the top 20 entries of the ranked list; the first entry is the input movie itself (similarity 1.0)
print('Top 20 Movies suggested for you:\n')
for i, (index, _score) in enumerate(sorted_similar_movies[:20], start=1):
    title_from_index = df['Movie_Title'].iloc[index]
    print(f"{i}. {title_from_index}")

Top 20 Movies suggested for you:

1. Finding Nemo
2. Shark Tale
3. The Reef
4. Big Fish
5. Megamind
6. Deuce Bigalow: Male Gigolo
7. 20,000 Leagues Under the Sea
8. Spider-Man
9. The Cat in the Hat
10. Happy Feet
11. Treading Water
12. When Did You Last See Your Father?
13. East Is East
14. Death Sentence
15. Teenage Mutant Ninja Turtles III
16. Meet the Deedles
17. The Family Stone
18. Jaws: The Revenge
19. Ponyo
20. The Muse


**Explanation**

This project focuses on building a movie recommendation system using machine learning techniques to suggest movies based on various attributes. Here's an overview of the process:

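**Import Library**: The project begins by importing pandas and NumPy under their conventional aliases (pd and np) for data manipulation. The libraries for text vectorization and similarity (scikit-learn) and for fuzzy title matching (difflib, from the Python standard library) are imported later, in the cells where they are used.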

**Import Data**: The movie dataset is loaded with pandas' read_csv() directly from the YBI Foundation GitHub repository, providing the foundation for developing the recommendation system.

**Describe Data**: To understand the dataset, the first few rows, the summary statistics of the numeric columns, and the column names are examined with the pandas head() and describe() methods and the columns attribute.

**Data Visualization**: Data visualization is not included in this project, as the primary focus is on developing the recommendation system. Visualization can be added later for additional insights if needed.

**Data Preprocessing**: Seven text columns, 'Movie_Genre', 'Movie_Keywords', 'Movie_Tagline', 'Movie_Cast', 'Movie_Director', 'Movie_Overview', and 'Movie_Language', are selected because they carry rich textual information. Missing values are replaced with empty strings (fillna('')), and the columns are then concatenated, separated by spaces, into a single string per movie, so that each movie is represented by one text document when similarities are computed.
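The fillna('') step matters because pandas string concatenation propagates missing values: if any one of the seven columns is NaN for a movie, the whole combined string becomes NaN. A two-column toy frame (hypothetical values standing in for df_features) shows the difference:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df_features (hypothetical values, two columns only).
toy = pd.DataFrame({
    'Movie_Genre': ['Drama', 'Horror'],
    'Movie_Keywords': ['midlife crisis', np.nan],  # second movie has no keywords
})

# Without fillna(''), one missing value makes the whole combined string NaN.
raw = toy['Movie_Genre'] + ' ' + toy['Movie_Keywords']
print(raw.isna().tolist())  # [False, True]

# With fillna(''), every movie keeps a usable string.
filled = toy.fillna('')
combined = filled['Movie_Genre'] + ' ' + filled['Movie_Keywords']
print(combined.tolist())  # ['Drama midlife crisis', 'Horror ']
```

The trailing space in 'Horror ' is harmless: the TF-IDF tokenizer ignores whitespace, which is also why rows such as "Horror  The hot spot..." in the combined output above contain double spaces.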

**Define Target Variable (y) and Feature Variables (X)**: Defining target and feature variables is not applicable in this context, as the project is based on text similarity rather than a supervised learning model.

**Train Test Split**: A train-test split is unnecessary because the entire dataset is used to calculate similarity scores and generate recommendations.

**Modeling**: scikit-learn's TfidfVectorizer (with English stop words removed) tokenizes each movie's combined text and converts it into a numerical vector, because the similarity computation needs numerical input. TF-IDF weights each term by how often it appears in a movie's text (term frequency) and down-weights terms that appear in many movies (inverse document frequency), so distinctive words such as a director's name or a plot keyword count more than common ones. The result is a 4760 × 30166 sparse matrix, with one row per movie and one column per term. Cosine similarity between every pair of rows then gives a 4760 × 4760 similarity matrix, and the movies with the highest similarity to the user's favorite movie are recommended.
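To make the TF-IDF and cosine-similarity steps concrete, here is a sketch on three short made-up documents (not taken from the dataset), using the same TfidfVectorizer(stop_words='english') setting as the notebook.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical combined-text documents, one per movie.
docs = [
    'Adventure Action Science Fiction galaxy death star lightsaber',
    'Science Fiction galaxy space rebels lightsaber',
    'Animation Family clownfish ocean father son',
]

tfidf = TfidfVectorizer(stop_words='english')
X = tfidf.fit_transform(docs)  # sparse matrix: one row per document, one column per term
sim = cosine_similarity(X)     # dense (3, 3) matrix of pairwise cosine similarities

print(X.shape[0], 'documents,', X.shape[1], 'terms')
print(sim.round(2))
```

Documents 0 and 1 share terms such as 'galaxy' and 'lightsaber', so their score is well above zero; document 2 shares no terms with either and scores exactly 0.0 against both, while every document scores 1.0 against itself.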

**Model Evaluation**: Traditional evaluation metrics are not used, because the dataset has no per-user ratings to hold out. Instead, the relevance of the recommended movies and user satisfaction serve as indicators of the system's effectiveness.

**Prediction**: The user enters a favorite movie title. Because the input may be misspelled, difflib's get_close_matches() first finds the closest matching title in the dataset (for example, 'itar waar' is matched to 'Star Wars'). The row of the similarity matrix belonging to that movie is then turned into 'recommendation_score', a list of tuples in which each tuple holds a movie index and its similarity score with the input movie. This list is sorted by similarity score in descending order, so the most similar movies come first, and the top 20 titles are displayed. The first of these is the input movie itself, since every movie has a similarity of 1.0 with itself.
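The whole prediction step can be sketched end to end on a hypothetical four-movie catalogue. One detail the sketch makes explicit is that the similarity matrix is indexed by row position, not by Movie_ID: in the real dataset the IDs start at 1 and reach 4788 for 4760 rows, so Movie_ID cannot be used to select a row of 'similarity'. The recommend helper and the toy data are illustrative assumptions.

```python
import difflib

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical mini-catalogue; Movie_ID is 1-based as in the real dataset.
movies = pd.DataFrame({
    'Movie_ID': [1, 2, 3, 4],
    'Movie_Title': ['Star Wars', 'Finding Nemo', 'The Empire Strikes Back', 'Shark Tale'],
    'Text': [
        'science fiction galaxy rebels lightsaber jedi',
        'animation family clownfish ocean reef',
        'science fiction galaxy rebels jedi empire',
        'animation family fish ocean reef shark',
    ],
})

sim = cosine_similarity(TfidfVectorizer(stop_words='english').fit_transform(movies['Text']))


def recommend(query, k=2):
    """Hypothetical helper: fuzzy-match the title, then rank other movies by similarity."""
    matches = difflib.get_close_matches(query, movies['Movie_Title'].tolist(), n=1)
    if not matches:
        return []
    # Row *position* of the match, which is also its row in `sim`.
    pos = movies[movies['Movie_Title'] == matches[0]].index[0]
    ranked = sorted(enumerate(sim[pos]), key=lambda pair: pair[1], reverse=True)
    return [movies['Movie_Title'].iloc[i] for i, _ in ranked if i != pos][:k]


print(recommend('itar waar', k=1))  # ['The Empire Strikes Back']

# Using Movie_ID as a position selects the wrong row (one too far).
star_wars_id = movies.loc[movies['Movie_Title'] == 'Star Wars', 'Movie_ID'].iloc[0]
print(movies['Movie_Title'].iloc[star_wars_id])  # Finding Nemo
```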

In summary, this project shows how a content-based movie recommendation system can be built from textual metadata alone. Using TF-IDF vectorization and cosine similarity from scikit-learn, together with fuzzy title matching from difflib, it provides users with tailored movie suggestions in a practical, interactive tool.