# Building a Simple Recommender (IMDB Top 250 Clone)

In this part we build a clone of the 'Top Rated Movies' chart from the IMDB website. This chart is considered the simplest kind of recommender because it takes into account neither the tastes of a particular user nor the similarities between different movies.
We will simply calculate a score for each movie based on a predefined metric and output a list of movies sorted by that score.

Building on the "Exploring Full MovieLens dataset" chapter, we will use the Full MovieLens dataset to create the IMDB Top 250 Clone.

In order to build the simple recommender, the steps are as follows:
- Choose a metric / score to rate the movies on
- Decide prerequisites for the movie to be featured on the chart
- Calculate the score for every movie that satisfies the conditions
- Output the list of movies in decreasing order of their scores

# The Metric

The metric is the numeric quantity by which we rank movies. A movie is considered better than another if it has a higher metric score. Choosing a reliable metric is very important, because the quality of the recommendations depends on it.

The choice of a metric is arbitrary. One of the simplest metrics we can use is the movie's rating. However, this has several disadvantages. One of them is that it does not take the popularity of the movie into account.

Suppose one movie is rated 9 by 100,000 users while another is rated 9.5 by only 100 users. Ranked by rating alone, the movie rated 9 would be placed below the other, which is not a fair comparison. To put it another way, ratings based on very few votes are not very reliable. A movie rated 10/10 by five users is not necessarily a good movie.
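
As a minimal sketch of this problem, we can sort a toy DataFrame by raw rating alone. The titles and numbers below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: titles and values are invented for illustration.
toy = pd.DataFrame({
    'title': ['Widely Seen', 'Barely Seen'],
    'vote_average': [9.0, 9.5],
    'vote_count': [100000, 100],
})

# Ranking by the raw average ignores how many votes back each rating.
by_rating = toy.sort_values('vote_average', ascending=False)
print(by_rating[['title', 'vote_average', 'vote_count']])
```

The movie with only 100 votes ends up first, even though its rating rests on far less evidence.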

To avoid this issue, we take the rating and the number of votes for each movie and apply the weighted rating formula provided by IMDB:

Weighted Rating (WR) = (v / (v + m)) * R + (m / (v + m)) * C

The following applies:
- v is the number of votes gathered by the movie
- m is the minimum number of votes required for the movie to be in the chart
- R is the mean rating of the movie
- C is the mean rating of all the movies in the dataset
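
To make the formula concrete, here is a small sketch that applies it to the two movies from the example above. The values chosen for m and C are assumptions picked only for illustration, and the function name `imdb_weighted_rating` is hypothetical.

```python
def imdb_weighted_rating(v, R, m, C):
    """Blend the movie's mean rating R with the global mean C, weighted by votes."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# Assumed values for illustration only; not computed from the dataset.
m_example = 1000
C_example = 6.0

popular = imdb_weighted_rating(v=100000, R=9.0, m=m_example, C=C_example)
obscure = imdb_weighted_rating(v=100, R=9.5, m=m_example, C=C_example)
print(round(popular, 2), round(obscure, 2))
```

Under these assumptions the heavily voted movie keeps a score close to its own rating (about 8.97), while the movie with 100 votes is pulled strongly toward the global mean (about 6.32).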

Since the dataset provides vote_count and vote_average, we already have v and R for every movie and can start building the weighted rating formula.

The IMDB weighted formula also has a variable m, which is required to compute the score. This variable is in place to make sure that only movies that meet a certain criterion, in this case popularity, are considered for the rankings.
Therefore, the value of m determines which movies qualify for the chart and also affects the final value of the score.
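
The influence of m on the score can be sketched by holding one movie fixed and varying m. The value of C and the list of m values below are assumptions for illustration, and `weighted_rating_for` is a hypothetical helper name.

```python
def weighted_rating_for(v, R, m, C):
    """IMDB-style weighted rating for a single movie."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# One movie with 661 votes and a mean rating of 9.1.
v_example, R_example = 661, 9.1
C_example = 5.6  # assumed global mean, for illustration only

# Compute the score of the same movie under increasingly strict values of m.
scores = {
    m_value: weighted_rating_for(v_example, R_example, m_value, C_example)
    for m_value in (0, 50, 500, 5000)
}
for m_value, score in scores.items():
    print(m_value, round(score, 3))
```

With m = 0 the score equals the movie's own rating. As m grows, the score moves steadily toward C, so movies with few votes lose ground to movies with many.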

# The Prerequisites

The only thing we need to keep in mind is that the higher the value of m, the greater the emphasis on the popularity of a movie, and therefore the more selective the chart becomes.
For our recommender, we'll use as our value of m the number of votes garnered by the 80th percentile movie. Simply put, for a movie to be considered in the rankings, it must have garnered at least as many votes as 80% of the movies in our dataset.

In [3]:
# Import required libraries
import pandas as pd
import numpy as np

In [4]:
# Read movies_metadata.csv into a pandas DataFrame
df = pd.read_csv('../datasets/movies_metadata.csv')

In [6]:
# Calculate the number of votes garnered by the 80th percentile
m = df['vote_count'].quantile(0.80)

In [7]:
m

50.0

We can see that the value of m is 50. In other words, only about 20% of the movies have received 50 votes or more.
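
To see what the quantile call does, here is a small sketch on a made-up Series of vote counts. The numbers are hypothetical. It relies on the default linear interpolation of `Series.quantile`.

```python
import pandas as pd

# Hypothetical vote counts for ten movies (made-up numbers).
votes = pd.Series([1, 2, 3, 5, 8, 13, 21, 34, 55, 89])

# Value below which 80% of the vote counts fall.
cutoff = votes.quantile(0.80)

# Share of movies whose vote count reaches the cutoff.
share_at_or_above = (votes >= cutoff).mean()
print(cutoff, share_at_or_above)
```

Only the two most voted movies out of ten reach the cutoff, which mirrors the roughly 20% we expect in the real dataset.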

Another prerequisite is the runtime. We only consider movies that are at least 45 minutes and at most 300 minutes long. To do this, we use the 'runtime' feature from our dataset and apply this condition.

In [8]:
# Only consider movies between 45 and 300 minutes long (inclusive)
qualified_movies = df[(df['runtime'] >= 45) & (df['runtime'] <= 300)]

In [9]:
# Only consider movies that have garnered at least m votes.
# .copy() avoids a SettingWithCopyWarning when a column is added later.
qualified_movies = qualified_movies[qualified_movies['vote_count'] >= m].copy()

In [10]:
# Inspect the number of movies that made the cut
qualified_movies.shape

(8963, 24)

We can see that, from our dataset of over 45,000 movies, approximately 9,000 (about 20%) made the cut.
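
The two filters can also be combined into a single boolean mask. The sketch below uses a made-up DataFrame and an assumed cutoff named `min_votes` standing in for m. It also shows that movies with a missing runtime are dropped, because comparisons against NaN evaluate to False.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; titles and values are invented for illustration.
toy = pd.DataFrame({
    'title': ['Short', 'Feature', 'Unknown Length', 'Obscure Feature'],
    'runtime': [20.0, 120.0, np.nan, 95.0],
    'vote_count': [500, 800, 900, 3],
})
min_votes = 50  # assumed cutoff, standing in for m

# Series.between is inclusive on both ends by default, matching >= and <=.
mask = toy['runtime'].between(45, 300) & (toy['vote_count'] >= min_votes)

# Take a copy so that columns can be added later without warnings.
qualified = toy[mask].copy()
print(qualified['title'].tolist())
```

Only 'Feature' survives: 'Short' fails the runtime condition, 'Unknown Length' has no runtime, and 'Obscure Feature' has too few votes.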

# Calculating the Score

Now that we have the prerequisites in place, we need to calculate C, the mean rating of all the movies in the dataset:

In [11]:
C = df['vote_average'].mean()

In [12]:
C

5.618207215134185

We can see that the average rating of a movie is approximately 5.6/10. Next, we define a function that computes the rating for a movie given its features and the values of m and C.

In [13]:
# Function to compute the IMDB weighted rating for each movie
def weighted_rating(x, m=m, C=C):
    v = x['vote_count']
    R = x['vote_average']
    # Compute the weighted score
    return (v/(v+m) * R) + (m/(v+m) * C)

The last step is to generate the score for each movie. We pass axis=1 so that the function is applied to each row of the dataset.

In [14]:
# Compute the score using the weighted_rating function 
qualified_movies['score'] = qualified_movies.apply(weighted_rating, axis=1)
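
As an alternative to the row-wise apply, the same score can be computed directly on whole columns, which is usually faster on large DataFrames. The following is a sketch on made-up data. The values of `m_toy` and `C_toy` are assumptions, and `weighted_rating_row` is a hypothetical stand-in for the function above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration.
toy = pd.DataFrame({
    'vote_count': [661.0, 8358.0, 50.0],
    'vote_average': [9.1, 8.5, 7.0],
})
m_toy, C_toy = 50.0, 5.6  # assumed values, for illustration only

def weighted_rating_row(x, m=m_toy, C=C_toy):
    """Row-wise version, intended for DataFrame.apply with axis=1."""
    v = x['vote_count']
    R = x['vote_average']
    return (v / (v + m)) * R + (m / (v + m)) * C

row_wise = toy.apply(weighted_rating_row, axis=1)

# Column-wise version: the same arithmetic applied to whole Series at once.
v = toy['vote_count']
R = toy['vote_average']
vectorized = (v / (v + m_toy)) * R + (m_toy / (v + m_toy)) * C_toy

print(np.allclose(row_wise, vectorized))
```

Both approaches give the same scores, so the choice between them is mainly a matter of readability and speed.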

# Generating the IMDB Top 250 Clone

In [19]:
qualified_movies.sort_values('score', ascending=False)[:25]

Unnamed: 0,adult,belongs_to_collection,budget,genres,homepage,id,imdb_id,original_language,original_title,overview,...,revenue,runtime,spoken_languages,status,tagline,title,video,vote_average,vote_count,score
10309,False,,13200000,"[{'id': 35, 'name': 'Comedy'}, {'id': 18, 'nam...",,19404,tt0112870,hi,Dilwale Dulhania Le Jayenge,"Raj is a rich, carefree, happy-go-lucky second...",...,100000000.0,190.0,"[{'iso_639_1': 'hi', 'name': 'हिन्दी'}]",Released,Come... Fall In Love,Dilwale Dulhania Le Jayenge,False,9.1,661.0,8.855148
314,False,,25000000,"[{'id': 18, 'name': 'Drama'}, {'id': 80, 'name...",,278,tt0111161,en,The Shawshank Redemption,Framed in the 1940s for the double murder of h...,...,28341470.0,142.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Fear can hold you prisoner. Hope can set you f...,The Shawshank Redemption,False,8.5,8358.0,8.482863
834,False,"{'id': 230, 'name': 'The Godfather Collection'...",6000000,"[{'id': 18, 'name': 'Drama'}, {'id': 80, 'name...",http://www.thegodfather.com/,238,tt0068646,en,The Godfather,"Spanning the years 1945 to 1955, a chronicle o...",...,245066400.0,175.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,An offer you can't refuse.,The Godfather,False,8.5,6024.0,8.476278
40251,False,,0,"[{'id': 10749, 'name': 'Romance'}, {'id': 16, ...",https://www.funimationfilms.com/movie/yourname/,372058,tt5311514,ja,君の名は。,High schoolers Mitsuha and Taki are complete s...,...,355298300.0,106.0,"[{'iso_639_1': 'ja', 'name': '日本語'}]",Released,,Your Name.,False,8.5,1030.0,8.366584
12481,False,"{'id': 263, 'name': 'The Dark Knight Collectio...",185000000,"[{'id': 18, 'name': 'Drama'}, {'id': 28, 'name...",http://thedarkknight.warnerbros.com/dvdsite/,155,tt0468569,en,The Dark Knight,Batman raises the stakes in his war on crime. ...,...,1004558000.0,152.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Why So Serious?,The Dark Knight,False,8.3,12269.0,8.289115
2843,False,,63000000,"[{'id': 18, 'name': 'Drama'}]",http://www.foxmovies.com/movies/fight-club,550,tt0137523,en,Fight Club,A ticking-time-bomb insomniac and a slippery s...,...,100853800.0,139.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,Mischief. Mayhem. Soap.,Fight Club,False,8.3,9678.0,8.286216
292,False,,8000000,"[{'id': 53, 'name': 'Thriller'}, {'id': 80, 'n...",,680,tt0110912,en,Pulp Fiction,"A burger-loving hit man, his philosophical par...",...,213928800.0,154.0,"[{'iso_639_1': 'en', 'name': 'English'}, {'iso...",Released,Just because you are a character doesn't mean ...,Pulp Fiction,False,8.3,8670.0,8.284623
522,False,,22000000,"[{'id': 18, 'name': 'Drama'}, {'id': 36, 'name...",http://www.schindlerslist.com/,424,tt0108052,en,Schindler's List,The true story of how businessman Oskar Schind...,...,321365600.0,195.0,"[{'iso_639_1': 'de', 'name': 'Deutsch'}, {'iso...",Released,"Whoever saves one life, saves the world entire.",Schindler's List,False,8.3,4436.0,8.270109
23673,False,,3300000,"[{'id': 18, 'name': 'Drama'}]",http://sonyclassics.com/whiplash/,244786,tt2582802,en,Whiplash,"Under the direction of a ruthless instructor, ...",...,13092000.0,105.0,"[{'iso_639_1': 'en', 'name': 'English'}]",Released,The road to greatness can take you to the edge.,Whiplash,False,8.3,4376.0,8.269704
5481,False,,15000000,"[{'id': 14, 'name': 'Fantasy'}, {'id': 12, 'na...",http://movies.disney.com/spirited-away,129,tt0245429,ja,千と千尋の神隠し,A ten year old girl who wanders away from her ...,...,274925100.0,125.0,"[{'iso_639_1': 'ja', 'name': '日本語'}]",Released,The tunnel led Chihiro to a mysterious town...,Spirited Away,False,8.3,3968.0,8.266628


We can see that 'Dilwale Dulhania Le Jayenge' is at the top of the list. We can also see that it has a noticeably smaller number of votes than the other movies shown.
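
As a sanity check, the scores in the table can be reproduced by hand from the vote_count and vote_average values shown, together with the values of m and C computed earlier. The helper name `weighted_rating_values` is hypothetical.

```python
# Values of m and C as computed earlier in this chapter.
m = 50.0
C = 5.618207215134185

def weighted_rating_values(v, R, m=m, C=C):
    """Weighted rating from a vote count v and a mean rating R."""
    return (v / (v + m)) * R + (m / (v + m)) * C

# vote_count and vote_average taken from the first two rows of the table.
ddlj = weighted_rating_values(v=661.0, R=9.1)
shawshank = weighted_rating_values(v=8358.0, R=8.5)
print(round(ddlj, 6), round(shawshank, 6))
```

With only 661 votes, 'Dilwale Dulhania Le Jayenge' is pulled down from 9.1 to about 8.855, but its high raw rating still keeps it ahead of 'The Shawshank Redemption' at about 8.483.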