# Content-Based Movie Recommendation System

### E21022 - Ujjwal Shekhar
ujjwal.shekhar@praxis.ac.in

### Content-Based Recommendation System

#### This technique attempts to figure out which aspects of an item a user likes and then recommends items that have those aspects.


- Here we are building a content-based recommender system for movies, but we will consider only one aspect of a movie: its genres. Other aspects could also be used, such as a particular cast member or a well-known director, to name a few.

- We're going to try to figure out the user's favorite genres from the ratings he or she has given to movies.

- We'll build a profile of the user that shows, based on those ratings, which genres the user likes the most and which the least.

- Based on this user profile, we will recommend the top 20 movies to the user.

### Importing Required Packages

In [1]:
# Dataframe manipulation library
import pandas as pd

# Math functions for square root
from math import sqrt

# For numpy array
import numpy as np

# For setting the working directory
import os

# To suppress warnings
import warnings
warnings.filterwarnings('ignore')

In [2]:
# os.chdir(r'C:\Users\ujjwa\Desktop\Profile Content\ML Projects') # Changing current working directory

### Importing required files

In [3]:
# Importing required files

movies_df = pd.read_csv('movies.csv')

#### Looking at the movies data

In [4]:
# Checking the movies dataframe
movies_df.head()

,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy


In [5]:
# Checking the number of rows and columns
movies_df.shape

(34208, 3)

### Data preprocessing

The year in the title is not useful as part of the title itself, so let's move it into a new column.

In [6]:
# Using regular expressions to find a year stored between parentheses
# We include the parentheses so we don't conflict with movies that have years in their titles
movies_df['year'] = movies_df.title.str.extract(r'(\(\d\d\d\d\))', expand=False)

# Removing the parentheses
movies_df['year'] = movies_df.year.str.extract(r'(\d\d\d\d)', expand=False)

# Removing the years from the 'title' column
# (regex=True is needed: since pandas 2.0, str.replace treats the pattern as a literal string by default)
movies_df['title'] = movies_df.title.str.replace(r'(\(\d\d\d\d\))', '', regex=True)

# Stripping any leading/trailing whitespace left behind
movies_df['title'] = movies_df['title'].str.strip()
movies_df.head()

,movieId,title,genres,year
0,1,Toy Story,Adventure|Animation|Children|Comedy|Fantasy,1995
1,2,Jumanji,Adventure|Children|Fantasy,1995
2,3,Grumpier Old Men,Comedy|Romance,1995
3,4,Waiting to Exhale,Comedy|Drama|Romance,1995
4,5,Father of the Bride Part II,Comedy,1995
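As a quick check of the parsing logic, here is a minimal sketch on three illustrative title strings (the third, "Untitled Project", is hypothetical and has no year). It is an alternative to the two-step extract above, not the notebook's own code: a single capture group returns just the four digits, and regex=True is passed explicitly.

```python
import pandas as pd

# Illustrative titles: a normal one, one with digits outside the parentheses,
# and a hypothetical one with no year at all.
titles = pd.Series(["Toy Story (1995)", "2001: A Space Odyssey (1968)", "Untitled Project"])

# One capture group returns only the four digits inside "(YYYY)"; no match gives NaN.
years = titles.str.extract(r"\((\d{4})\)", expand=False)

# Remove "(YYYY)" and the whitespace around it; regex=True makes the pattern a regex.
clean = titles.str.replace(r"\(\d{4}\)", "", regex=True).str.strip()

print(pd.DataFrame({"title": clean, "year": years}))
```

Because "2001" is not wrapped in parentheses, it stays in the title, which is why the pattern includes the parentheses.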


Now splitting the genres string on '|' to form a list.

In [7]:
# Every genre is separated by |, so we split the string on |
movies_df['genres'] = movies_df.genres.str.split('|')
movies_df.head()

,movieId,title,genres,year
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995
2,3,Grumpier Old Men,"[Comedy, Romance]",1995
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995
4,5,Father of the Bride Part II,[Comedy],1995


### Generating an HTML file from the above DataFrame, to be used in the DEMD project for displaying the list of movies

In [8]:
# Keeping only the title and year columns for display
movies_html = movies_df.drop(["movieId", "genres"], axis=1)
html = movies_html.to_html()

# Write the HTML to a file (this path is specific to the author's machine)
with open("C:\\Users\\ujjwa\\Desktop\\Python\\DEMD\\DEMD_Project\\templates\\movies_list.html", "w", encoding="utf-8") as text_file:
    text_file.write(html)

Now we create a one-hot encoded version of the DataFrame: one column per genre, where a 1 means the movie in that row belongs to that genre.

In [9]:
# Copying the movie dataframe into a new one.
moviesWithGenres_df = movies_df.copy()

# For every row in the dataframe, iterate through the list of genres and place a 1 into the corresponding column
for index, row in movies_df.iterrows(): # Gives index and all data for each row

    for genre in row['genres']: # Picking the list of genres for that particular movie/row from movies_df

        moviesWithGenres_df.at[index, genre] = 1
        # Put '1' at the exact position in the copied dataframe,
        # creating a new column for each genre the first time it is seen. The index is the same as movies_df.
        # Positions where no '1' was placed are left as NaN.
        # Here the 1s represent that the particular movie belongs to that genre.

# Filling in the NaN values with 0 to show that a movie doesn't have that column's genre
# (note: this also turns a missing 'year' into 0)
moviesWithGenres_df = moviesWithGenres_df.fillna(0)
moviesWithGenres_df.head()

,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,IMAX,Documentary,War,Musical,Western,Film-Noir,(no genres listed)
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995,1.0,1.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,3,Grumpier Old Men,"[Comedy, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,5,Father of the Bride Part II,[Comedy],1995,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
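The iterrows loop above works, but it visits every row in Python. A vectorised alternative (a swapped-in technique, not what this notebook uses) is pandas' Series.str.get_dummies, which splits a delimited string and returns one indicator column per value. A minimal sketch on two toy rows:

```python
import pandas as pd

# Two toy rows with genres still joined by '|' (as in the raw movies.csv).
raw = pd.DataFrame({
    "movieId": [1, 2],
    "genres": ["Adventure|Animation|Children|Comedy|Fantasy", "Adventure|Children|Fantasy"],
})

# One indicator column per genre: 1 if the movie has that genre, else 0.
dummies = raw["genres"].str.get_dummies(sep="|")
one_hot = pd.concat([raw, dummies], axis=1)
print(one_hot)
```

The genre columns come out in alphabetical order and as 0/1 integers rather than floats. Since movies_df['genres'] already holds lists at this point in the notebook, the lists would first need to be joined back, e.g. movies_df['genres'].str.join('|').str.get_dummies(sep='|').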


## Content-Based Recommendation

- Here, we're going to try to figure out the user's favorite genres from the ratings he or she has given.
- We'll build a user profile showing which genres the user likes the most and which the least. Based on this profile, we will recommend the top 20 movies to the user.
- The recommendations will be drawn from the genres of the movies the user liked.

Let's begin by creating an input user (the 'active user') to recommend movies to.

(To add more movies, simply add more entries to userInput. Each title must match the dataset exactly, including capitalisation, and a title that starts with "The", like "The Matrix", must be written in the form 'Matrix, The'.)
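The 'Matrix, The' convention can be automated with a small helper. to_dataset_title below is a hypothetical function written for illustration; it only handles a leading "The", as described above, and does not correct spelling or capitalisation.

```python
def to_dataset_title(title: str) -> str:
    """Hypothetical helper: rewrite 'The Matrix' as 'Matrix, The'."""
    title = title.strip()
    # Only a leading "The " followed by more text is moved to the end.
    if title.startswith("The "):
        return f"{title[4:].lstrip()}, The"
    return title


print(to_dataset_title("The Breakfast Club"))  # Breakfast Club, The
print(to_dataset_title("Toy Story"))           # Toy Story
```

It could be used while building the input, e.g. {'title': to_dataset_title('The Breakfast Club'), 'rating': 5}.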

### Creating the user input

In [10]:
# Creating the user input data frame (considering only 6 movies here)

# Storing the details in a list of dictionaries.
# There should not be any repeated movies here; duplicates would cause problems when setting movieId as the index later.

userInput = [
            {'title':'Breakfast Club, The', 'rating':5},
            {'title':'Toy Story', 'rating':3},
            {'title':'Jumanji', 'rating':1},
            {'title':'Pulp Fiction', 'rating':5},
            {'title':'Mulan', 'rating':5},
            {'title':'Akira', 'rating':4.5}
            ] 

# Using the list of dictionaries to create the data frame
inputMovies = pd.DataFrame(userInput)
inputMovies

,title,rating
0,"Breakfast Club, The",5.0
1,Toy Story,3.0
2,Jumanji,1.0
3,Pulp Fiction,5.0
4,Mulan,5.0
5,Akira,4.5


### Identifying the movies from user input and mapping it with the ratings

Now we have to find each movie's ID in the movies dataset and map the rating to that movie.
P.S.:
- If a movie is not in our original dataset or is misspelled, it will not be found, and that movie cannot be used further.

In [11]:
# Filtering out the movies by title
# Creating the list of movie titles in the inputMovies.
# Filtering the original movies_df based on titles present in the list

inputId = movies_df[movies_df['title'].isin(inputMovies['title'].tolist())]

# Merging to get the movieId. With no 'on' argument, pd.merge joins on the shared 'title' column.
inputMovies = pd.merge(inputId, inputMovies)

inputMovies

,movieId,title,genres,year,rating
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995,3.0
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995,1.0
2,296,Pulp Fiction,"[Comedy, Crime, Drama, Thriller]",1994,5.0
3,1274,Akira,"[Action, Adventure, Animation, Sci-Fi]",1988,4.5
4,1907,Mulan,"[Adventure, Animation, Children, Comedy, Drama...",1998,5.0
5,90620,Mulan,"[Action, Adventure, Drama, Romance]",2009,5.0
6,1968,"Breakfast Club, The","[Comedy, Drama]",1985,5.0


In [12]:
# Dropping the genres and year columns, which are not required.
inputMovies = inputMovies.drop(['genres','year'], axis = 1)

# Final input dataframe
inputMovies

,movieId,title,rating
0,1,Toy Story,3.0
1,2,Jumanji,1.0
2,296,Pulp Fiction,5.0
3,1274,Akira,4.5
4,1907,Mulan,5.0
5,90620,Mulan,5.0
6,1968,"Breakfast Club, The",5.0


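#### If a movie we added above is missing here, it may not be present in the original data frame or it may be misspelled. We can check this manually to confirm; capitalisation must also match. Note that title matching can also add rows: 'Mulan' matches both the 1998 and the 2009 films, so both receive the rating of 5.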
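Instead of checking by eye, unmatched and duplicated titles can be listed programmatically. A minimal sketch on toy stand-ins for movies_df and inputMovies (the data, including the deliberately misspelled "Toy Storey", is made up; the two Mulan IDs mirror the output above):

```python
import pandas as pd

# Toy stand-ins for movies_df and the user's input (illustrative data only).
movies = pd.DataFrame({
    "movieId": [1, 1907, 90620],
    "title": ["Toy Story", "Mulan", "Mulan"],
})
user_input = pd.DataFrame({
    "title": ["Toy Story", "Mulan", "Toy Storey"],  # last one is misspelled on purpose
    "rating": [3.0, 5.0, 4.0],
})

# Titles with no exact match in the catalogue (misspelled, wrong case, or absent).
missing = sorted(set(user_input["title"]) - set(movies["title"]))

# Titles that match more than one movie; merging on title would copy the rating to each.
matches = movies[movies["title"].isin(user_input["title"])]
counts = matches["title"].value_counts()
ambiguous = sorted(counts[counts > 1].index)

print("Not found:", missing)
print("Matched more than once:", ambiguous)
```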


### Filtering these input movies from moviesWithGenres_df
- We will store the result in a new data frame, userMovies.

In [13]:
# Filtering out the movies from the input
# Creating a list of movie IDs from inputMovies (mapped from the original movies_df).
# Using this list, we filter moviesWithGenres_df down to the rows of the input movies only.

userMovies = moviesWithGenres_df[moviesWithGenres_df['movieId'].isin(inputMovies['movieId'].tolist())]
userMovies

,movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,IMAX,Documentary,War,Musical,Western,Film-Noir,(no genres listed)
0,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995,1.0,1.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,2,Jumanji,"[Adventure, Children, Fantasy]",1995,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
293,296,Pulp Fiction,"[Comedy, Crime, Drama, Thriller]",1994,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1246,1274,Akira,"[Action, Adventure, Animation, Sci-Fi]",1988,1.0,1.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1824,1907,Mulan,"[Adventure, Animation, Children, Comedy, Drama...",1998,1.0,1.0,1.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
1885,1968,"Breakfast Club, The","[Comedy, Drama]",1985,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
18136,90620,Mulan,"[Action, Adventure, Drama, Romance]",2009,1.0,0.0,0.0,0.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### To create the user profile, we need only the genres.
- Removing the remaining columns.

In [14]:
# Resetting the index to avoid future issues. The index is not needed for this data frame,
# which is only used to build the user profile.
userMovies = userMovies.reset_index(drop=True)

# Dropping the other columns except different genre columns 
userGenreTable = userMovies.drop(['movieId','title','genres','year'], axis = 1)
userGenreTable

,Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,IMAX,Documentary,War,Musical,Western,Film-Noir,(no genres listed)
0,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1.0,1.0,1.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0
5,0.0,0.0,0.0,1.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,1.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


- From userGenreTable we can tell which genres the user has watched, i.e. the genres that these input movies fall into.
- By using the ratings as weights and aggregating the scores for each genre, we can identify which genres the user likes the most.

#### Now we can multiply each movie's genre columns by its rating and sum the values for each genre to get the weight of that genre.

#### This gives us the user profile: the user's interest in each genre of movies.

In [15]:
# Checking the input ratings for each movie
inputMovies['rating']

0    3.0
1    1.0
2    5.0
3    4.5
4    5.0
5    5.0
6    5.0
Name: rating, dtype: float64

## Creating the user profile

Now we multiply the ratings with the genre table and sum up to get the genre weights for this particular user, i.e. we create the user profile.

- Multiplying userGenreTable by the rating column requires matching inner dimensions: userGenreTable has one row per input movie (n rows, one column per genre), and the rating column has n entries.
- Since we want the weighted score of every genre, we take the transpose of userGenreTable (one row per genre, n columns) and multiply it by the ratings. This satisfies the dimension requirement and gives one value per genre.
- For example, the Adventure value of each movie is multiplied by that movie's rating, and the products are summed to give a single value for Adventure: the user's preference for that genre.
- The same is done for every genre column we created.
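The dimension argument can be seen on a toy example with NumPy (two made-up movies and three genres, not the notebook's data): the transposed genre matrix has shape (genres × movies), the rating vector has one entry per movie, and their product has one entry per genre.

```python
import numpy as np

# Rows = rated movies, columns = genres (toy data: Adventure, Comedy, Drama).
genre_matrix = np.array([
    [1, 1, 0],  # movie A: Adventure, Comedy
    [0, 1, 1],  # movie B: Comedy, Drama
])
ratings = np.array([3.0, 5.0])  # one rating per movie (row)

# (3 genres x 2 movies) @ (2,) -> (3,): one rating-weighted sum per genre.
profile = genre_matrix.T @ ratings
print(genre_matrix.T.shape, ratings.shape, profile.shape)  # (3, 2) (2,) (3,)
print(profile)  # [3. 8. 5.] i.e. Adventure 3, Comedy 8, Drama 5
```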

In [16]:
# Taking the dot product to get weights
userProfile = userGenreTable.transpose().dot(inputMovies['rating'])

# The user profile
userProfile_df = pd.DataFrame(data=userProfile, columns=['Preferences'])
userProfile_df

,Preferences
Adventure,18.5
Animation,12.5
Children,9.0
Comedy,18.0
Fantasy,4.0
Romance,10.0
Drama,20.0
Action,9.5
Crime,5.0
Thriller,5.0
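One caveat about the cell above: DataFrame.dot aligns the transposed table's columns with the rating Series' index labels. After reset_index, both are labelled 0 to 6, but the rows are in different orders (in userMovies, row 5 is Breakfast Club and row 6 is the 2009 Mulan; in inputMovies it is the other way round). Both are rated 5.0, so the profile above is unaffected, but other inputs could be paired with the wrong rating. A sketch of a safer variant, on toy data, indexes both sides by movieId so that the pairing no longer depends on row order:

```python
import pandas as pd

# Toy one-hot genre table indexed by movieId (illustrative data only).
genres = pd.DataFrame(
    {"Adventure": [1, 0], "Comedy": [1, 1], "Drama": [0, 1]},
    index=pd.Index([10, 20], name="movieId"),
)
# Ratings deliberately listed in a different order from the genre table.
ratings = pd.Series([5.0, 3.0], index=pd.Index([20, 10], name="movieId"))

# With both objects indexed by movieId, .dot() pairs each movie's genre
# column with that same movie's rating, regardless of row order.
profile = genres.T.dot(ratings)
print(profile)
```

In the notebook this would correspond to something like userMovies.set_index('movieId')[userGenreTable.columns].T.dot(inputMovies.set_index('movieId')['rating']).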


### This is the user profile we wanted.
- Here we can see that this particular user likes Drama the most (20.0), closely followed by Adventure (18.5) and Comedy (18.0).
- A movie whose genres all have a score of 0 in the user profile will itself score 0, so such movies end up at the bottom of the ranking, while movies from the genres with the highest scores rank at the top.
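The ranking can be read off directly by sorting the profile. This sketch re-enters the ten weights printed above (the remaining genres are cut off in that output) and lists the three strongest genres:

```python
import pandas as pd

# The ten genre weights shown in the output above (the other genres are cut off there).
profile = pd.Series({
    "Adventure": 18.5, "Animation": 12.5, "Children": 9.0, "Comedy": 18.0,
    "Fantasy": 4.0, "Romance": 10.0, "Drama": 20.0, "Action": 9.5,
    "Crime": 5.0, "Thriller": 5.0,
})

# Sort from most to least preferred and keep the three strongest genres.
top3 = profile.sort_values(ascending=False).head(3)
print(top3)
```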

## Recommendation based on user profile
- Now we will score every movie in moviesWithGenres_df using these preference weights.

In [17]:
# Now let's get the genres of every movie in our original dataframe, setting the movie ID as the index

genreTable = moviesWithGenres_df.set_index(moviesWithGenres_df['movieId'])
genreTable

movieId (index),movieId,title,genres,year,Adventure,Animation,Children,Comedy,Fantasy,Romance,...,Horror,Mystery,Sci-Fi,IMAX,Documentary,War,Musical,Western,Film-Noir,(no genres listed)
1,1,Toy Story,"[Adventure, Animation, Children, Comedy, Fantasy]",1995,1.0,1.0,1.0,1.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,2,Jumanji,"[Adventure, Children, Fantasy]",1995,1.0,0.0,1.0,0.0,1.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,3,Grumpier Old Men,"[Comedy, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,4,Waiting to Exhale,"[Comedy, Drama, Romance]",1995,0.0,0.0,0.0,1.0,0.0,1.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,5,Father of the Bride Part II,[Comedy],1995,0.0,0.0,0.0,1.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
151697,151697,Grand Slam,[Thriller],1967,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
151701,151701,Bloodmoney,[(no genres listed)],2010,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
151703,151703,The Butterfly Circus,[Drama],2009,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
151709,151709,Zero,"[Drama, Sci-Fi]",2015,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


- We only need the genres to score each movie against the user profile, so we drop all the other columns.

In [18]:
# Dropping the unnecessary columns

genreTable = genreTable.drop(['movieId','title','genres','year'], axis = 1)
genreTable.head()

movieId (index),Adventure,Animation,Children,Comedy,Fantasy,Romance,Drama,Action,Crime,Thriller,Horror,Mystery,Sci-Fi,IMAX,Documentary,War,Musical,Western,Film-Noir,(no genres listed)
1,1.0,1.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.0,0.0,1.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [19]:
# Verifying the size
genreTable.shape

(34208, 20)

- Now we have just the genres for all movies in our dataset, indexed by movieId.
- To score every movie, we take the weighted average of its genres, using the user profile as the weights.
- For this we take the dot product of the genre table from the previous step with the user profile, and divide by the sum of the weights.
- This gives a single score for every movie, on which we can base the recommendation.
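Written out, the score of a movie $j$ is its genre indicator vector weighted by the user profile $p$ and normalised by the total weight, where $x_{j,g} = 1$ if movie $j$ has genre $g$ and 0 otherwise. As a check against the output below: Toy Story has Adventure, Animation, Children, Comedy and Fantasy, and the 20 genre weights sum to 121 (this includes Sci-Fi = 4.5 and Musical = 5.0, which are cut off in the profile output above but follow from userGenreTable).

$$
\text{score}_j = \frac{\sum_{g} p_g \, x_{j,g}}{\sum_{g} p_g}
\qquad
\text{score}_{\text{Toy Story}} = \frac{18.5 + 12.5 + 9.0 + 18.0 + 4.0}{121} = \frac{62}{121} \approx 0.512397
$$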

In [20]:
# Multiplying the genres table by the weights (user preferences) and then taking the weighted average.
# Here the genre table covers the whole movie dataframe, not just the input movies used to build the user profile.

recommendationList = ((genreTable*userProfile).sum(axis=1))/(userProfile.sum())
recommendationList

movieId
1         0.512397
2         0.260331
3         0.231405
4         0.396694
5         0.148760
            ...   
151697    0.041322
151701    0.000000
151703    0.165289
151709    0.202479
151711    0.000000
Length: 34208, dtype: float64

In [21]:
# Sort our recommendations in descending order
recommendationList = recommendationList.sort_values(ascending=False)

# Checking the result obtained
recommendationList.head()

movieId
26093     0.801653
1907      0.768595
27344     0.764463
108540    0.727273
4956      0.669421
dtype: float64

- Now every movie in our database has a preference score.
- But we should not recommend movies the user has already rated, i.e. the input movies in this case.
- So we remove the input movies from the recommendation list.
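The scoring and filtering steps can be condensed into a few vectorised calls. A minimal sketch on a made-up four-movie catalogue and toy profile (not the notebook's data), using DataFrame.dot for the weighted sum, Series.drop to exclude already-rated movies, and Series.nlargest for the top k:

```python
import pandas as pd

# Toy genre table for four catalogue movies, indexed by movieId (illustrative).
genre_table = pd.DataFrame(
    {"Adventure": [1, 0, 1, 0], "Comedy": [1, 1, 0, 0], "Drama": [0, 1, 1, 1]},
    index=pd.Index([1, 2, 3, 4], name="movieId"),
)
# Toy user profile (genre weights), standing in for userProfile.
profile = pd.Series({"Adventure": 3.0, "Comedy": 8.0, "Drama": 5.0})
rated_ids = [2]  # movies the user has already rated

# Weighted average: dot product with the weights, divided by the total weight.
scores = genre_table.dot(profile) / profile.sum()

# Drop already-rated movies, then keep the k best (k=2 here instead of 20).
top = scores.drop(rated_ids).nlargest(2)
print(top)
```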

### Removing input movies from recommendation list

In [22]:
# Removing the input movies (drop returns a new Series without those movieIds)
recommendationList = recommendationList.drop(inputId['movieId'].tolist())

- Now we have the final list of user preference scores for all the other movies. Since it is already sorted in descending order, we can suggest the top 20 movies to the user.

## The final recommendation

In [23]:
# The final recommendation table with the top 20 movies

recommendationTable_df = movies_df.loc[movies_df['movieId'].isin(recommendationList.head(20).index)]
recommendationTable_df.head()

,movieId,title,genres,year
4625,4719,Osmosis Jones,"[Action, Animation, Comedy, Crime, Drama, Roma...",2001
4861,4956,"Stunt Man, The","[Action, Adventure, Comedy, Drama, Romance, Th...",1980
4923,5018,Motorama,"[Adventure, Comedy, Crime, Drama, Fantasy, Mys...",1991
8605,26093,"Wonderful World of the Brothers Grimm, The","[Adventure, Animation, Children, Comedy, Drama...",1962
8710,26236,"White Sun of the Desert, The (Beloe solntse pu...","[Action, Adventure, Comedy, Drama, Romance, War]",1970


- Resetting the index, as it is no longer required.

In [24]:
# Resetting the index.

recommendationTable_df = recommendationTable_df.reset_index(drop=True)
recommendationTable_df

,movieId,title,genres,year
0,4719,Osmosis Jones,"[Action, Animation, Comedy, Crime, Drama, Roma...",2001
1,4956,"Stunt Man, The","[Action, Adventure, Comedy, Drama, Romance, Th...",1980
2,5018,Motorama,"[Adventure, Comedy, Crime, Drama, Fantasy, Mys...",1991
3,26093,"Wonderful World of the Brothers Grimm, The","[Adventure, Animation, Children, Comedy, Drama...",1962
4,26236,"White Sun of the Desert, The (Beloe solntse pu...","[Action, Adventure, Comedy, Drama, Romance, War]",1970
5,27344,Revolutionary Girl Utena: Adolescence of Utena...,"[Action, Adventure, Animation, Comedy, Drama, ...",1999
6,42015,Casanova,"[Action, Adventure, Comedy, Drama, Romance]",2005
7,56152,Enchanted,"[Adventure, Animation, Children, Comedy, Fanta...",2007
8,64645,The Wrecking Crew,"[Action, Adventure, Comedy, Crime, Drama, Thri...",1968
9,69644,Ice Age: Dawn of the Dinosaurs,"[Action, Adventure, Animation, Children, Comed...",2009
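Because isin keeps the rows in movies_df order, the table above is sorted by movieId rather than by score; for example, 26093, the top-scoring movie in the sorted list, appears fourth. A sketch that keeps the ranking and attaches the score, using a left merge (which preserves the order of the left-hand keys), on toy data with hypothetical movie IDs and titles:

```python
import pandas as pd

# Toy catalogue and scores with hypothetical IDs and titles (illustrative only).
movies = pd.DataFrame({"movieId": [10, 20, 30], "title": ["Movie A", "Movie B", "Movie C"]})
scores = pd.Series({30: 0.9, 10: 0.7, 20: 0.4})  # already sorted, best first

top = scores.head(20)  # keeps at most 20 movies

ranked = (
    top.rename("score")
    .rename_axis("movieId")
    .reset_index()                            # columns: movieId, score (in ranked order)
    .merge(movies, on="movieId", how="left")  # a left merge keeps the left row order
)
print(ranked)
```

In the notebook, the same chain would be applied to recommendationList.head(20) and movies_df.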


### This is the list of 20 movies that can be recommended to the user, based on the user profile we generated from his or her ratings of a few movies.

### The recommendations should improve as the user rates more movies, since more ratings produce a better user profile, which in turn helps us recommend more relevant movies.

### Advantages and Disadvantages of Content-Based Filtering

#### Advantages

1) Learns the user's preferences from the user's own ratings; it does not need data from other users.

2) Can give highly personalized recommendations.


#### Disadvantages

1) Doesn't take into account what other users think of a movie, so low-quality movies can show up in the recommendations just because of their genre labels.

2) Extracting rich item features (such as cast or director) is not always feasible.

3) It is not always possible to determine which characteristics of a movie the user likes or dislikes.

4) Cannot recommend a genre the user has never watched and rated, because that genre has a weight of 0 in the user profile.
