# Movie Recommendation

GroupLens Research has collected and made available rating data sets from the MovieLens web site (http://movielens.org). The data sets were collected over various periods of time, depending on the size of the set.

## Types of recommendation system
 - Popularity Based 

It keeps track of the view count for each movie/video and lists movies in descending order of views.

 - Content Based
 
This type of recommendation system takes a movie that the user currently likes as input. It analyzes the contents of that movie to find other movies with similar content, ranks them by their similarity scores, and recommends the most relevant ones to the user.

 - Collaborative filtering

This type of system recommends movies to a user based on the preferences of similar users. In other words, the recommendations are filtered based on the collaboration between similar users' preferences.
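
As a rough illustration of the collaborative idea (it is not part of the content-based approach used in this notebook), the sketch below scores a movie one user has not rated by weighting other users' ratings with user-to-user cosine similarity. The toy ratings and the names `toy`, `user_sim`, and `predict_scores` are assumptions made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ratings (not MovieLens data): rows are users, columns are movies.
# NaN means the user has not rated that movie.
toy = pd.DataFrame(
    {"A": [5.0, 5.0, 1.0], "B": [1.0, 1.0, 5.0], "C": [np.nan, 5.0, 1.0]},
    index=["u1", "u2", "u3"],
)

def user_sim(ratings):
    """Cosine similarity between users, treating missing ratings as 0."""
    m = ratings.fillna(0).to_numpy()
    # Assumes every user has at least one rating, so no norm is zero.
    unit = m / np.linalg.norm(m, axis=1, keepdims=True)
    return pd.DataFrame(unit @ unit.T, index=ratings.index, columns=ratings.index)

def predict_scores(ratings, user):
    """Similarity-weighted average rating for each movie `user` has not rated."""
    sims = user_sim(ratings)[user].drop(user)
    others = ratings.drop(index=user)
    unseen = ratings.columns[ratings.loc[user].isna().to_numpy()]
    scores = {}
    for movie in unseen:
        rated = others[movie].dropna()   # ratings from users who saw the movie
        w = sims[rated.index]            # how similar those users are to `user`
        scores[movie] = float((w * rated).sum() / w.sum())
    return pd.Series(scores).sort_values(ascending=False)

pred = predict_scores(toy, "u1")
print(pred)
```

Here `u1` agrees with `u2` on movies A and B, so the predicted score for C leans toward `u2`'s rating rather than `u3`'s.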

In this notebook we are going to implement a content-based recommendation system.

## Importing libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from wordcloud import WordCloud
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

## Loading data files

### The data consists of 105339 ratings applied over 10329 movies.

The movies.csv dataset contains three columns:

 - movieId: the ID of the movie
 - title: the movie's title
 - genres: the movie's genres
 
The ratings.csv dataset contains four columns:

 - userId: the ID of the user who rated the movie.
 - movieId: the ID of the movie
 - rating: the rating given by the user (from 0.5 to 5)
 - timestamp: the time the movie was rated
 

In [None]:
movies=pd.read_csv('../input/movies.csv')
ratings=pd.read_csv('../input/ratings.csv')

In [None]:
movies.info()

In [None]:
ratings.info()

In [None]:
movies.shape

In [None]:
ratings.shape

In [None]:
movies.describe()

In [None]:
ratings.describe()

From the above table we can conclude that:

 - The average rating is 3.5, and the minimum and maximum ratings are 0.5 and 5 respectively.
 - There are 668 users who have given ratings. The value 149532 is the largest movieId, not the number of movies (the data covers 10329 movies).
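
`describe()` treats the ID columns as ordinary numbers, so its `max` row shows the largest ID rather than a count. A minimal sketch on a hypothetical toy frame (the values are made up, only the column names follow ratings.csv) shows how `nunique()` gives the number of distinct users and movies instead:

```python
import pandas as pd

# Hypothetical ratings with the same column names as ratings.csv.
toy_ratings = pd.DataFrame({
    "userId": [1, 1, 2, 7],
    "movieId": [10, 500, 10, 9000],
    "rating": [4.0, 3.5, 5.0, 2.0],
})

max_ids = toy_ratings[["userId", "movieId"]].max()       # largest ID values
n_unique = toy_ratings[["userId", "movieId"]].nunique()  # number of distinct IDs

print(max_ids)
print(n_unique)
```

On the real data, `ratings['userId'].nunique()` and `ratings['movieId'].nunique()` would give the corresponding counts.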


In [None]:
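# Collect the unique genre names; each movie lists its genres separated by '|'.
genres=[]
for genre in movies.genres:
    for g in genre.split('|'):
        if g not in genres:
            genres.append(g)
# WordCloud.generate() expects a single string, so join the names with spaces.
genres=' '.join(genres)

# Drop the last 7 characters of each title, assuming it ends with ' (YYYY)',
# e.g. 'Toy Story (1995)' -> 'Toy Story'.
movie_title=[]
for title in movies.title:
    movie_title.append(title[0:-7])
movie_title=' '.join(movie_title)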
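
The loop above keeps each genre only once, so the text passed to the word cloud says nothing about how common a genre is. If genre frequency is of interest, one possible alternative (an assumption, not what this notebook does) is to split and explode the `genres` column and count the occurrences. The `movies_toy` frame below is hypothetical.

```python
import pandas as pd

# Hypothetical movies with the same column names as movies.csv.
movies_toy = pd.DataFrame({
    "title": ["Movie A (2000)", "Movie B (2001)", "Movie C (2002)"],
    "genres": [
        "Adventure|Animation|Children",
        "Adventure|Children|Fantasy",
        "Action|Crime|Thriller",
    ],
})

genre_counts = (
    movies_toy["genres"]
    .str.split("|")   # one list of genres per movie
    .explode()        # one row per (movie, genre) pair
    .value_counts()   # how many movies carry each genre
)

print(genre_counts)
```

A frequency dictionary such as `genre_counts.to_dict()` could then be passed to `WordCloud.generate_from_frequencies` so that word size reflects how often a genre occurs.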

## Data Visualization

In [None]:
wordcloud_genre=WordCloud(width=1500,height=800,background_color='black',min_font_size=2
                    ,min_word_length=3).generate(genres)
wordcloud_title=WordCloud(width=1500,height=800,background_color='cyan',min_font_size=2
                    ,min_word_length=3).generate(movie_title)

In [None]:
plt.figure(figsize=(30,10))
plt.axis('off')
plt.title('WORDCLOUD for Movies Genre',fontsize=30)
plt.imshow(wordcloud_genre)


In [None]:
plt.figure(figsize=(30,10))
plt.axis('off')
plt.title('WORDCLOUD for Movies title',fontsize=30)
plt.imshow(wordcloud_title)

In [None]:
df=pd.merge(ratings,movies, how='left',on='movieId')
df.head()

In [None]:
df1=df.groupby(['title'])[['rating']].sum()
high_rated=df1.nlargest(20,'rating')
high_rated.head()
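
Summing ratings favours movies that were rated many times, because every additional rating adds to the total. As a hedged alternative (not the method used in this notebook), one could rank by mean rating while requiring a minimum number of ratings. The threshold `min_count` and the toy data below are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical ratings: X is rated often but modestly, Z only once but highly.
toy = pd.DataFrame({
    "title": ["X", "X", "X", "X", "Y", "Y", "Z"],
    "rating": [3.0, 3.0, 3.0, 3.0, 5.0, 4.0, 5.0],
})

stats = toy.groupby("title")["rating"].agg(total="sum", avg="mean", n="count")

min_count = 2  # hypothetical threshold: ignore movies with fewer ratings
by_total = stats["total"].idxmax()
by_mean = stats.loc[stats["n"] >= min_count, "avg"].idxmax()

print(stats)
print(by_total, by_mean)
```

In this toy data the summed rating picks `X`, while the thresholded mean picks `Y`.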

In [None]:
plt.figure(figsize=(30,10))
plt.title('Top 20 movies by total (summed) rating',fontsize=40)
colors=['red','yellow','orange','green','magenta','cyan','blue','lightgreen','skyblue','purple']
plt.ylabel('sum of ratings',fontsize=30)
plt.xticks(fontsize=25,rotation=90)
plt.xlabel('movies title',fontsize=30)
plt.yticks(fontsize=25)
plt.bar(high_rated.index,high_rated['rating'],linewidth=3,edgecolor='red',color=colors)


In [None]:
df2=df.groupby('title')[['rating']].count()
rating_count_20=df2.nlargest(20,'rating')
rating_count_20.head()

In [None]:
plt.figure(figsize=(30,10))
plt.title('Top 20 movies with highest number of ratings',fontsize=30)
plt.xticks(fontsize=25,rotation=90)
plt.yticks(fontsize=25)
plt.xlabel('movies title',fontsize=30)
plt.ylabel('number of ratings',fontsize=30)

plt.bar(rating_count_20.index,rating_count_20.rating,color='red')

In [None]:
cv=TfidfVectorizer()
tfidf_matrix=cv.fit_transform(movies['genres'])
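
With its default settings, `TfidfVectorizer` lowercases the text and keeps tokens of two or more word characters, so the `|` separators are dropped and a hyphenated genre such as `Sci-Fi` becomes two tokens. The sketch below illustrates this on a hypothetical toy list, and also shows one possible alternative (an assumption, not the notebook's choice) that keeps each `|`-separated genre as a single token.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical genre strings in the same '|'-separated format as movies.csv.
toy_genres = ["Action|Sci-Fi", "Comedy|Romance", "Action|Comedy"]

# Default behaviour: the token pattern splits on '|' and on '-'.
default_vec = TfidfVectorizer()
default_vec.fit(toy_genres)
default_vocab = sorted(default_vec.vocabulary_)

# Alternative: split on '|' ourselves so each genre stays whole.
# token_pattern=None avoids the warning that the pattern is unused.
genre_vec = TfidfVectorizer(tokenizer=lambda s: s.split("|"), token_pattern=None)
genre_vec.fit(toy_genres)
genre_vocab = sorted(genre_vec.vocabulary_)

print(default_vocab)
print(genre_vocab)
```

Whether the split tokens matter depends on the data; for genre strings the default mostly affects hyphenated or multi-word genre names.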

In [None]:
movie_user = df.pivot_table(index='userId',columns='title',values='rating')
movie_user.head()

Suppose a user wants to watch a movie similar to Toy Story (1995). We can make recommendations by calculating the cosine similarity between Toy Story and the other movies, so we first have to find the cosine similarity between the movies' TF-IDF genre vectors.

In [None]:

cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
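
`linear_kernel` computes plain dot products. Because `TfidfVectorizer` L2-normalises each row by default (`norm='l2'`), those dot products coincide with cosine similarities. A small check on hypothetical toy genre strings:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Hypothetical genre strings; rows 0 and 2 share no genre.
toy_genres = ["Action|Thriller", "Action|Comedy", "Comedy|Romance"]
toy_matrix = TfidfVectorizer().fit_transform(toy_genres)

dot = linear_kernel(toy_matrix, toy_matrix)      # raw dot products
cos = cosine_similarity(toy_matrix, toy_matrix)  # explicitly normalised

same = np.allclose(dot, cos)
print(same)
print(dot.round(3))
```

The diagonal is 1 because each movie is identical to itself, and movies with no genre in common score 0.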

In [None]:
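# Map each title to its row position in `movies` (and in `cosine_sim`).
indices=pd.Series(movies.index,index=movies['title'])
titles=movies['title']
def recommendations(title):
    idx = indices[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Remove the movie itself explicitly: movies with identical genres also
    # score 1.0, so it is not guaranteed to be first after sorting.
    sim_scores = [s for s in sim_scores if s[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    # Keep the 20 most similar movies.
    sim_scores = sim_scores[:20]
    movie_indices = [i[0] for i in sim_scores]
    return titles.iloc[movie_indices]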

In [None]:
recommendations('Toy Story (1995)')