# Practical 8 (Part I) - Recommender System (Popularity-based Recommendation)

Simple recommenders offer the same generalized recommendations to every user, based on movie popularity and/or genre. The basic idea behind this system is that movies that are more popular and critically acclaimed have a higher probability of being liked by the average audience. An example is the IMDB Top 250.

This practical shows you how to build a simple recommender system using the Movies Dataset, which is publicly available on Kaggle.

References:

The full dataset can be downloaded here: https://www.kaggle.com/rounakbanik/the-movies-dataset?select=movies_metadata.csv

Reference for this practical: https://www.datacamp.com/community/tutorials/recommender-systems-python


## Section 1 Data Preparation

"movies_metadata.csv" contains information on ~45,000 movies featured in the Full MovieLens dataset. Features include posters, backdrops, budget, genre, revenue, release dates, languages, production countries, and companies.

1. Let's load the movies metadata into a pandas DataFrame:

In [1]:
# Import Pandas
import pandas as pd                                       #complete this

# Load Movies Metadata
metadata = pd.read_csv('movies_metadata.csv', low_memory=False)    #complete this

# Print the first five rows (the default for head())
metadata.head()

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,release_date,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count
0,False,"{'id': 10194, 'name': 'Toy Story Collection', ...",30000000,"[{'id': 16, 'name': 'Animation'}, {'id': 35, '...",http://toystory.disney.com/toy-story,862,tt0114709,en,Toy Story,"Led by Woody, Andy's toys live happily in his ...",...,1995-10-30,373554033.0,81.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,,Toy Story,False,7.7,5415.0
1,False,,65000000,"[{'id': 12, 'name': 'Adventure'}, {'id': 14, '...",,8844,tt0113497,en,Jumanji,When siblings Judy and Peter discover an encha...,...,1995-12-15,262797249.0,104.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Roll the dice and unleash the excitement!,Jumanji,False,6.9,2413.0
2,False,"{'id': 119050, 'name': 'Grumpy Old Men Collect...",0,"[{'id': 10749, 'name': 'Romance'}, {'id': 35, ...",,15602,tt0113228,en,Grumpier Old Men,A family wedding reignites the ancient feud be...,...,1995-12-22,0.0,101.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Still Yelling. Still Fighting. Still Ready for...,Grumpier Old Men,False,6.5,92.0
3,False,,16000000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,31357,tt0114885,en,Waiting to Exhale,"Cheated on, mistreated and stepped on, the wom...",...,1995-12-22,81452156.0,127.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Friends are the people who let you be yourself...,Waiting to Exhale,False,6.1,34.0
4,False,"{'id': 96871, 'name': 'Father of the Bride Col...",0,"[{'id': 35, 'name': 'Comedy'}]",,11862,tt0113041,en,Father of the Bride Part II,Just when George Banks has recovered from his ...,...,1995-02-10,76578911.0,106.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Just When His World Is Back To Normal... He's ...,Father of the Bride Part II,False,5.7,173.0


In [2]:
metadata.shape

(45466, 24)

There are 45466 rows and 24 columns.

2. Let's calculate the mean rating across all movies, C, using the pandas .mean() function:

In [3]:
# Calculate mean of vote average column
C = metadata['vote_average'].mean() 
print(C) # get value C

5.618207215134185


From the above output, you can observe that the average rating of a movie is around 5.6 (on a scale of 10).
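
One detail worth knowing here: `.mean()` skips missing values by default, so movies without a `vote_average` do not distort C. The sketch below illustrates this on a made-up three-row frame; the values and the name `toy` are illustrative assumptions, not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values); one rating is missing
toy = pd.DataFrame({"vote_average": [6.0, 8.0, np.nan]})

# NaN is ignored by default, so the mean is taken over the two known ratings
toy_mean = toy["vote_average"].mean()
print(toy_mean)

# Count the missing ratings, a useful check before trusting the mean
n_missing = toy["vote_average"].isna().sum()
print(n_missing)
```

Running the same `isna().sum()` check on `metadata['vote_average']` would show how many movies were left out of C.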

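3. Next, let's calculate m, the minimum number of votes a movie needs to be listed in the chart. We use the 90th percentile of the vote counts as the cutoff, so a qualifying movie must have at least as many votes as 90% of the movies in the dataset.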

In [4]:
# Calculate the minimum number of votes required to be in the chart, m

m = metadata['vote_count'].quantile(0.90)              # 90th percentile of vote counts, used as the cutoff
print(m)

160.0
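
To see what `.quantile(0.90)` does, here is a minimal sketch on ten made-up vote counts (the series `votes` is a hypothetical example). With pandas' default linear interpolation, the cutoff can fall between two observed values.

```python
import pandas as pd

# Toy vote counts (hypothetical values)
votes = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# 90th percentile with pandas' default linear interpolation
cutoff = votes.quantile(0.90)

# Share of movies whose vote count meets the cutoff
share_kept = (votes >= cutoff).mean()
print(cutoff, share_kept)
```

Raising the quantile makes the chart stricter (fewer, better-known movies); lowering it lets in movies with fewer votes.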


4. Now we can use a greater-than-or-equal-to condition to keep only the movies with at least 160 votes.

In [5]:
q_movies = metadata.copy().loc[metadata['vote_count'] >= m]        #complete this
q_movies.shape

(4555, 24)

There are 4555 movies with at least 160 votes.
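
The filtering step can be sketched on a small made-up frame as follows. The names `toy`, `threshold`, and `qualified` are assumptions for illustration. This variant copies only the qualifying rows, rather than copying the whole frame before filtering as the cell above does.

```python
import pandas as pd

# Toy data (hypothetical titles and counts)
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "vote_count": [50, 160, 900],
})
threshold = 160

# Boolean mask: True where the row qualifies
mask = toy["vote_count"] >= threshold

# .copy() gives an independent frame, so adding columns later
# does not modify `toy`
qualified = toy.loc[mask].copy()
qualified["flag"] = True
print(qualified)
```

Working on a copy matters in this practical because a new column is added to `q_movies` in a later step.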

## Section 2 Calculate the Weighted Rating

5. The next, and most important, step is to calculate the weighted rating for each qualified movie. To do this, you will:

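(i) Define a function, weighted_rating(). The formula is as follows:

$$WR = \frac{v}{v+m} \cdot R + \frac{m}{v+m} \cdot C$$

where v is the number of votes for the movie, m is the minimum number of votes required to be listed in the chart, R is the average rating of the movie, and C is the mean rating across all movies.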

(ii) Since we have already calculated m and C, we simply pass them to the function as default arguments;

(iii) Then we select the vote_count (v) and vote_average (R) values from each row of the q_movies DataFrame;

(iv) Finally, we will compute the weighted average and return the result.

In [6]:
# Function that computes the weighted rating of each movie
def weighted_rating(x, m=m, C=C):
    v = x['vote_count']
    R = x['vote_average']
    # Calculation based on the IMDB formula
    return (v/(v+m) * R) + (m/(m+v) * C)                                            #complete this
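
As a quick sanity check of the formula, the sketch below plugs in the values of m and C reported earlier, together with the vote counts and averages of two movies that appear in the final chart of this practical. The list name `examples` is an assumption for illustration.

```python
# Values reported earlier in this practical
m = 160.0
C = 5.618207215134185

# (title, vote_count, vote_average) as listed in the final chart
examples = [
    ("The Shawshank Redemption", 8358.0, 8.5),
    ("Dilwale Dulhania Le Jayenge", 661.0, 9.1),
]

scores = {}
for title, v, R in examples:
    # Weighted rating: blend of the movie's own average R and the global mean C
    scores[title] = (v / (v + m)) * R + (m / (v + m)) * C
    print(title, round(scores[title], 6))
```

The movie with fewer votes is pulled much further toward C: its raw average of 9.1 drops to about 8.42, while the heavily voted movie barely moves from 8.5.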

6. Next, we generate a new feature, "weighted_rating", to store the score of each movie.

In [7]:
# Define a new feature 'weighted_rating' and calculate its value with `weighted_rating()`
q_movies['weighted_rating'] = q_movies.apply(weighted_rating, axis=1)
q_movies['weighted_rating'].head()

0    7.640253
1    6.820293
4    5.660700
5    7.537201
8    5.556626
Name: weighted_rating, dtype: float64
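
Because the formula uses only arithmetic on two columns, it can also be computed on whole columns at once instead of row by row with `.apply()`. This is an alternative sketch, not the approach used in this practical; the frame `toy` and the values `m_toy` and `C_toy` are made up for illustration.

```python
import pandas as pd

# Toy data (hypothetical values); m_toy and C_toy are chosen for illustration only
toy = pd.DataFrame({
    "vote_count": [200.0, 1000.0],
    "vote_average": [9.0, 7.0],
})
m_toy, C_toy = 100.0, 6.0

v = toy["vote_count"]
R = toy["vote_average"]

# Same weighted rating formula, applied to entire columns
toy["weighted_rating"] = (v / (v + m_toy)) * R + (m_toy / (v + m_toy)) * C_toy
print(toy)
```

On a few thousand rows both approaches are fast enough; the column-wise form mainly pays off on larger tables.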

7. Finally, let's sort the DataFrame in descending order of the weighted_rating column and output the title, vote count, vote average, and weighted rating (score) of the top 20 movies.

In [8]:
#Sort movies based on score calculated above
q_movies = q_movies.sort_values('weighted_rating', ascending=False)                                       #complete this

#Print the top 20 movies
q_movies[['title', 'vote_count', 'vote_average', 'weighted_rating']].head(20)

Unnamed: 0,title,vote_count,vote_average,weighted_rating
314,The Shawshank Redemption,8358.0,8.5,8.445869
834,The Godfather,6024.0,8.5,8.425439
10309,Dilwale Dulhania Le Jayenge,661.0,9.1,8.421453
12481,The Dark Knight,12269.0,8.3,8.265477
2843,Fight Club,9678.0,8.3,8.256385
292,Pulp Fiction,8670.0,8.3,8.251406
522,Schindler's List,4436.0,8.3,8.206639
23673,Whiplash,4376.0,8.3,8.205404
5481,Spirited Away,3968.0,8.3,8.196055
2211,Life Is Beautiful,3643.0,8.3,8.187171


This chart ranks the most popular, highly rated movies. A simple recommender is suitable for new users who have no interaction history in the system.
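
The steps of this practical can be gathered into one reusable function. The sketch below is one possible way to do it: the helper `popularity_chart` and its parameters `quantile` and `top_n` are hypothetical names, and the four-row frame is made-up data used only to show the call.

```python
import pandas as pd

def popularity_chart(df, quantile=0.90, top_n=20):
    """Hypothetical helper: rank movies by the weighted rating."""
    C = df["vote_average"].mean()              # mean rating across all movies
    m = df["vote_count"].quantile(quantile)    # minimum votes to qualify
    q = df.loc[df["vote_count"] >= m].copy()   # qualifying movies only
    v, R = q["vote_count"], q["vote_average"]
    q["weighted_rating"] = (v / (v + m)) * R + (m / (v + m)) * C
    return q.sort_values("weighted_rating", ascending=False).head(top_n)

# Toy data (hypothetical values)
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "vote_count": [10.0, 500.0, 20.0, 1000.0],
    "vote_average": [9.5, 8.0, 5.0, 7.0],
})
chart = popularity_chart(toy, quantile=0.5, top_n=2)
print(chart[["title", "weighted_rating"]])
```

Movie A has the highest raw average but too few votes to qualify, which is exactly the behaviour the vote-count cutoff is meant to produce.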

Next, let's explore content-based filtering.