In this notebook I apply demographic filtering to recommend the top-rated, most-rated movies to a user's feed, irrespective of each movie's content (genres, cast, etc.).

Raw average ratings should not be used directly, because one movie may have been rated by only a few users while another has been rated by many. So we use weighted ratings instead of raw ratings for our dataset and then recommend the top few.

The weighted rating for a movie, according to the IMDB standard, is given by

    W = (R*v + C*m) / (v + m)

where

R = average rating of that movie

v = number of ratings for that movie

C = average rating over the entire dataset

m = minimum number of ratings required for a movie to be considered
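
To see how the formula behaves, here is a minimal sketch with made-up numbers. The function name `weighted_rating_example` and the values of `C_demo` and `m_demo` are assumptions for illustration only; they are not the values computed from the dataset later in the notebook.

```python
def weighted_rating_example(R, v, C, m):
    """Weighted rating W = (R*v + C*m) / (v + m)."""
    return (R * v + C * m) / (v + m)

# Illustrative values only (assumed, not taken from the dataset)
C_demo = 3.5   # average rating over all movies
m_demo = 600   # minimum number of ratings

# A movie with a perfect average but only 5 ratings
few_votes = weighted_rating_example(R=5.0, v=5, C=C_demo, m=m_demo)
# A movie with a slightly lower average but 5000 ratings
many_votes = weighted_rating_example(R=4.5, v=5000, C=C_demo, m=m_demo)

print(round(few_votes, 3))   # pulled towards C_demo
print(round(many_votes, 3))  # stays near its own average
```

With few ratings the score is dominated by the dataset average, so a movie cannot top the chart on a handful of perfect ratings.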

In [17]:
# Let's start by loading the required libraries
import numpy as np 
import pandas as pd

In [18]:
df = pd.read_csv('rating.csv')
names = pd.read_csv('movie.csv')

In [19]:
names

Unnamed: 0,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
27273,131254,Kein Bund für's Leben (2007),Comedy
27274,131256,"Feuer, Eis & Dosenbier (2002)",Comedy
27275,131258,The Pirates (2014),Adventure
27276,131260,Rentun Ruusu (2001),(no genres listed)


In [20]:
# Let's calculate the average rating over the entire dataset
C = df['rating'].mean()
print(C)

3.5255285642993797


In [21]:
#Creating vote_count column to record number of votes received by each movie
name = names.copy()
l = [0]*len(name)
name['vote_count'] = l
p = list(name['movieId'])

In [22]:
def votecount(train):
    mode = dict(train['movieId'].value_counts())
    #print(len(mode))
    q = list(mode.keys())
    #print(q)
    for i in p:
        if(i in q):
            name.loc[name['movieId'] ==i,'vote_count'] = mode[i]
    
    return name
name = votecount(df)

In [23]:
name

Unnamed: 0,movieId,title,genres,vote_count
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,49695
1,2,Jumanji (1995),Adventure|Children|Fantasy,22243
2,3,Grumpier Old Men (1995),Comedy|Romance,12735
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance,2756
4,5,Father of the Bride Part II (1995),Comedy,12161
...,...,...,...,...
27273,131254,Kein Bund für's Leben (2007),Comedy,1
27274,131256,"Feuer, Eis & Dosenbier (2002)",Comedy,1
27275,131258,The Pirates (2014),Adventure,1
27276,131260,Rentun Ruusu (2001),(no genres listed),1
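
The loop above visits every movie one at a time, which can be slow on a large ratings file. A possible alternative, sketched here on made-up toy data (the frames `toy_ratings` and `toy_names` are stand-ins, not the real files), is to count ratings per `movieId` once and map the counts onto the movie table.

```python
import pandas as pd

# Toy stand-ins for rating.csv and movie.csv (made-up rows)
toy_ratings = pd.DataFrame({
    'movieId': [1, 1, 2, 1, 3, 3],
    'rating': [4.0, 5.0, 3.0, 4.5, 2.0, 3.5],
})
toy_names = pd.DataFrame({
    'movieId': [1, 2, 3, 4],
    'title': ['A', 'B', 'C', 'D'],
})

# Number of ratings per movieId, indexed by movieId
counts = toy_ratings['movieId'].value_counts()

# Look up each movie's count; movies without ratings get 0
toy_names['vote_count'] = toy_names['movieId'].map(counts).fillna(0).astype(int)
print(toy_names)
```

This should give the same `vote_count` column as the loop, including 0 for movies that never appear in the ratings.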


Next, determine an appropriate value of m, the minimum number of ratings a movie must have in order to be listed in the chart. We use the 85th percentile of vote_count as the cutoff. In other words, a movie must have at least as many votes as 85% of the movies in the dataset.

In [24]:
m = name['vote_count'].quantile(0.85)
print(m)
# Copy the filtered rows so later column assignments do not touch `name`
movies = name.loc[name['vote_count'] >= m].copy()
movies

609.0


Unnamed: 0,movieId,title,genres,vote_count
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy,49695
1,2,Jumanji (1995),Adventure|Children|Fantasy,22243
2,3,Grumpier Old Men (1995),Comedy|Romance,12735
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance,2756
4,5,Father of the Bride Part II (1995),Comedy,12161
...,...,...,...,...
23665,112556,Gone Girl (2014),Drama|Thriller,1479
23680,112623,Dawn of the Planet of the Apes (2014),Sci-Fi,759
23736,112852,Guardians of the Galaxy (2014),Action|Adventure|Sci-Fi,2049
24387,115569,Nightcrawler (2014),Crime|Drama|Thriller,609
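
To make the percentile cutoff concrete, here is a small sketch on a made-up series of vote counts (`toy_counts` is an assumption for illustration). It relies on the default linear interpolation of `Series.quantile`.

```python
import pandas as pd

# Made-up vote counts for ten movies
toy_counts = pd.Series([1, 1, 2, 3, 5, 8, 13, 21, 34, 55])

# 85th percentile of the vote counts
cutoff = toy_counts.quantile(0.85)

# Keep only the movies at or above the cutoff
kept = toy_counts[toy_counts >= cutoff]

print(cutoff)
print(len(kept), "of", len(toy_counts), "movies kept")
```

Only the movies in roughly the top 15% by vote count survive the filter, which is why the chart above shrinks to movies with at least 609 ratings.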


In [25]:
p = list(movies['movieId'])
# Initialise as float so the per-movie means can be stored without a dtype change
movies['avg_rating'] = 0.0

In [26]:
#To calculate average ratings obtained for each movie
def avgcount(df):
    for i in p:
        movies.loc[movies['movieId'] ==i , 'avg_rating'] = np.nanmean(np.array(df.loc[df['movieId'] ==i, 'rating' ])) 
    return movies
    
movies = avgcount(df)
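
The per-movie average can also be computed in one pass with a group-by, which may be noticeably faster than filtering the ratings once per movie. This is a sketch on made-up data (`toy_ratings` and `toy_movies` are stand-ins for `df` and `movies`).

```python
import pandas as pd

# Toy stand-ins for the ratings table and the filtered movie table
toy_ratings = pd.DataFrame({
    'movieId': [1, 1, 2, 1, 3, 3],
    'rating': [4.0, 5.0, 3.0, 4.5, 2.0, 3.5],
})
toy_movies = pd.DataFrame({'movieId': [1, 3], 'title': ['A', 'C']})

# Mean rating per movieId, indexed by movieId
mean_by_movie = toy_ratings.groupby('movieId')['rating'].mean()

# Attach the mean to each movie that passed the vote-count filter
toy_movies['avg_rating'] = toy_movies['movieId'].map(mean_by_movie)
print(toy_movies)
```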


In [27]:
def weighted_rating(movies, m=m, C=C):
    v = movies['vote_count']
    R = movies['avg_rating']
    # Calculation based on the IMDB formula
    return (v/(v+m) * R) + (m/(m+v) * C)


In [28]:
# Define a new feature 'score' and calculate its value with `weighted_rating()`
movies['score'] = movies.apply(weighted_rating, axis=1)
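
Because the formula only uses column arithmetic, the score can also be computed on whole columns instead of row by row. The sketch below uses made-up data and assumed values `C_demo` and `m_demo`. It uses the form (v*R + m*C) / (v + m), which is algebraically the same as the expression in `weighted_rating()`.

```python
import pandas as pd

# Made-up movies with vote counts and average ratings
toy = pd.DataFrame({
    'title': ['A', 'B', 'C'],
    'vote_count': [5, 5000, 700],
    'avg_rating': [5.0, 4.5, 4.0],
})

# Illustrative values only (assumed, not taken from the dataset)
C_demo = 3.5
m_demo = 600

v = toy['vote_count']
R = toy['avg_rating']

# Column-wise weighted rating
toy['score'] = (v * R + m_demo * C_demo) / (v + m_demo)

# Highest score first
toy = toy.sort_values('score', ascending=False)
print(toy)
```

Movie A has the highest raw average but the fewest votes, so it ends up last after weighting.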

In [33]:
#Sort movies based on score calculated above
movies = movies.sort_values('score', ascending=False)

#Print the top 10 movies
movies[['title', 'vote_count', 'avg_rating', 'genres', 'score']].head(10)

Unnamed: 0,title,vote_count,avg_rating,genres,score
315,"Shawshank Redemption, The (1994)",63366,4.44699,Crime|Drama,4.438219
843,"Godfather, The (1972)",41355,4.364732,Crime|Drama,4.352553
49,"Usual Suspects, The (1995)",47006,4.334372,Crime|Mystery|Thriller,4.324027
523,Schindler's List (1993),50054,4.310175,Drama|War,4.300743
1195,"Godfather: Part II, The (1974)",27398,4.275641,Crime|Drama,4.25933
887,Rear Window (1954),17449,4.271334,Mystery|Thriller,4.246182
895,Casablanca (1942),24349,4.258327,Drama|Romance,4.240446
1935,Seven Samurai (Shichinin no samurai) (1954),11611,4.27418,Action|Adventure|Drama,4.23687
1169,One Flew Over the Cuckoo's Nest (1975),29932,4.248079,Drama,4.233671
737,Dr. Strangelove or: How I Learned to Stop Worr...,23220,4.247287,Comedy|War,4.228841


So these are the movies that would appear in the trending tab of a system that uses demographic filtering, sorted by their score.