<h1>Item-Based Collaborative Filtering</h1>
![Recommendation Systems Approaches](https://raw.githubusercontent.com/ziababar/recommender/master/images/Recommendation%20Systems.jpg)
This notebook builds a recommender based on the item-based collaborative filtering approach.

The recommender system will perform the following steps:
1. Retrieve item and activity data
2. Determine similar items
3. Generate recommendations
![Collaborative Filtering](https://raw.githubusercontent.com/ziababar/recommender/master/images/collaborative-filtering.png)

<h2>Step 1 - Retrieve Data</h2>
The first step is to gather the data and load it into the programming environment.
For our use case, we download the MovieLens dataset, which contains three sets of data:

- Movie data containing a certain movie's information, such as movieID, release date, URL, genre details, and so on
- User data containing the user information, such as userID, age, gender, occupation, ZIP code, and so on
- Ratings data containing userID, itemID, rating, timestamp

In [None]:
# Import the libraries that are going to be used here
import pandas as pd
import numpy as np
import scipy
import sklearn

In [None]:
# User activity data: name all four columns, but load only the first three (the timestamp is not needed)
data_cols = ['user id','movie id','rating','timestamp']
df_u_data = pd.read_csv('/home/nbuser/library/dataset/u.data', header=None, sep='\t', names=data_cols, usecols=range(3), encoding='latin-1')
df_u_data = df_u_data.sort_values('user id', ascending=True)
df_u_data.head(5)

In [None]:
# List of movie items: name every column, but load only the first two (movie id and movie title)
item_cols = ['movie id','movie title','release date', 'video release date','IMDb URL','unknown','Action', 'Adventure','Animation','Childrens','Comedy','Crime', 'Documentary','Drama','Fantasy','Film-Noir','Horror', 'Musical','Mystery','Romance','Sci-Fi','Thriller', 'War' ,'Western']
df_u_item = pd.read_csv('/home/nbuser/library/dataset/u.item', header=None, sep='|', names=item_cols, usecols=range(2), encoding='latin-1')
df_u_item = df_u_item.sort_values('movie id', ascending=True)
df_u_item.head(5)

<h2>Step 2 - Determine Similar Items</h2>

![Item-Based Collaborative Filtering](https://raw.githubusercontent.com/ziababar/recommender/master/images/item-based.png)

There are multiple ways to determine similarity between users or items. Some common approaches used in recommendation systems include:
 - Neighbourhood-based techniques
   - Euclidean distance
   - Cosine similarity
   - Jaccard similarity
   - Pearson correlation coefficient
 - Clustering techniques
   - K-means clustering
   
In this example, we'll use the Pearson correlation coefficient to determine similar items.
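As a rough illustration of what the Pearson correlation coefficient measures, the sketch below computes it by hand for two items, using only the users who rated both, and compares the result with pandas' built-in `Series.corr`. The helper name `pearson_similarity` and the ratings are hypothetical and are not part of the notebook's pipeline.

```python
import numpy as np
import pandas as pd

def pearson_similarity(a: pd.Series, b: pd.Series) -> float:
    """Pearson correlation over the users who rated both items (hypothetical helper)."""
    both = a.notna() & b.notna()            # keep co-rated users only
    x = a[both].to_numpy(dtype=float)
    y = b[both].to_numpy(dtype=float)
    if len(x) < 2:
        return float("nan")                 # undefined with fewer than two points
    xc, yc = x - x.mean(), y - y.mean()     # centre each rating vector on its mean
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else float("nan")

# Made-up ratings from five users for two items (NaN = not rated)
item_a = pd.Series([5, 4, np.nan, 2, 1], dtype=float)
item_b = pd.Series([4, 5, 3, 1, np.nan], dtype=float)

manual = pearson_similarity(item_a, item_b)
builtin = item_a.corr(item_b)               # pandas also ignores pairs with a missing value
print(round(manual, 4), round(builtin, 4))
```

Because each vector is centred on its own mean, two items can correlate strongly even when one is rated consistently higher than the other.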

Merge the two dataframes into a single dataframe, so that every rating row carries the movie title alongside the user ID and the rating. Having all the transactional activity in one place simplifies the analysis that follows.

In [None]:
# Join on the shared 'movie id' column (the same key pandas would infer by default)
ratings = pd.merge(df_u_item, df_u_data, on='movie id')
ratings.head()
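The effect of the merge is easier to see on a tiny example. The sketch below uses made-up frames (`items` and `activity` are hypothetical names) and an inner join on `movie id`, which is the default join type for `pd.merge`.

```python
import pandas as pd

# Hypothetical item catalogue and rating activity
items = pd.DataFrame({"movie id": [1, 2, 3],
                      "movie title": ["A", "B", "C"]})
activity = pd.DataFrame({"user id": [10, 10, 11],
                         "movie id": [1, 2, 1],
                         "rating": [5, 3, 4]})

# Inner join: one output row per rating, with the title attached
merged = pd.merge(items, activity, on="movie id", how="inner")
print(merged)
```

Movie `C` has no ratings, so it does not appear in the merged result. An inner join keeps only keys present in both frames.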

Now we'll pivot this table to construct a matrix with one row per user and one column per movie. NaN indicates missing data, that is, movies that a given user did not rate:

In [None]:
movieRatings = ratings.pivot_table(index=['user id'], columns=['movie title'], values='rating')
movieRatings.head()
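A small sketch of the same pivot on made-up data shows how the NaN entries arise. The `toy` frame is hypothetical. Note that `pivot_table` averages duplicate (user, title) pairs by default.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) rating
toy = pd.DataFrame({
    "user id": [1, 1, 2, 3],
    "movie title": ["A", "B", "A", "B"],
    "rating": [5, 3, 4, 2],
})

# Rows become users, columns become movies, cells hold the rating
matrix = toy.pivot_table(index="user id", columns="movie title", values="rating")
print(matrix)
```

User 2 never rated movie `B` and user 3 never rated movie `A`, so those two cells are NaN.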

<h2>Step 3 - Generate Recommendations</h2>

Let's extract the Series of user ratings for Star Wars:

In [None]:
starWarsRatings = movieRatings['Star Wars (1977)']
starWarsRatings.head()

Compute the correlation between the Star Wars rating vector and the rating vector of every other movie. Once done, drop any results that have no data, and construct a new DataFrame of movies and their correlation score (similarity) to Star Wars.

In [None]:
# Use pandas' corrwith to correlate every movie column with the Star Wars ratings
similarMovies = movieRatings.corrwith(starWarsRatings)
similarMovies = similarMovies.dropna()
df = pd.DataFrame(similarMovies)
df.head(10)
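The behaviour of `corrwith` followed by `dropna` can be checked on a toy matrix. In this sketch the column names and ratings are invented. `Unrated` shares no users with `Target`, so its correlation is undefined and gets dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical user x movie matrix (NaN = not rated)
matrix = pd.DataFrame({
    "Target":  [5, 4, 2, 1, np.nan],
    "Agrees":  [5, 5, 1, 2, 4],
    "Opposes": [1, 2, 4, 5, 3],
    "Unrated": [np.nan, np.nan, np.nan, np.nan, 2],
}, index=pd.Index([1, 2, 3, 4, 5], name="user id"))

# Correlate every column with the target column, using co-rated users only
scores = matrix.corrwith(matrix["Target"]).dropna()
print(scores.sort_values(ascending=False))
```

The target correlates perfectly with itself, so in practice it appears at the top of the ranking and can be ignored or removed.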

Sort the results by similarity score, which should surface the movies most similar to Star Wars.

In [None]:
similarMovies.sort_values(ascending=False)

These results are misleading: the top of the list is dominated by movies that were rated by only a handful of people who also happened to like Star Wars. With so few shared ratings, the correlation can be high purely by chance. We need to get rid of movies that were rated by only a few people, since they produce spurious results.

In [None]:
# Construct a new DataFrame that computes the number of ratings and the average rating per movie
# (the string names produce the same 'size' and 'mean' columns as np.size and np.mean)
movieStats = ratings.groupby('movie title').agg({'rating': ['size', 'mean']})
movieStats.head()

Let's get rid of any movies rated by fewer than 100 people, and check the top-rated ones that are left:

In [None]:
popularMovies = movieStats['rating']['size'] >= 100
movieStats[popularMovies].sort_values([('rating', 'mean')], ascending=False)[:15]
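To see why a minimum rating count matters, consider two movies with only two raters in common: two points always lie on a straight line, so the correlation is exactly +1 or -1 whenever it is defined. The sketch below uses made-up ratings. The `min_periods` argument of `Series.corr` is shown as an alternative guard based on the number of co-raters, not as the method this notebook uses (which filters on each movie's total rating count).

```python
import numpy as np
import pandas as pd

# Hypothetical ratings: only users 1 and 2 rated both movies
target = pd.Series([5, 4, 3, 5, 2], index=[1, 2, 3, 4, 5], dtype=float)
obscure = pd.Series([4, 1, np.nan, np.nan, np.nan], index=[1, 2, 3, 4, 5])

overlap = int((target.notna() & obscure.notna()).sum())  # number of co-raters
r = target.corr(obscure)                        # perfect score from just two points
guarded = target.corr(obscure, min_periods=3)   # NaN: fewer than 3 co-raters
print(overlap, r, guarded)
```

A perfect score from two data points says very little about how similar the movies really are, which is what the popularity filter is meant to address.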

Join this data with our original set of movies similar to Star Wars:

In [None]:
# Select the 'rating' sub-columns (size, mean) so that both sides of the join have single-level columns
df = movieStats[popularMovies]['rating'].join(similarMovies.to_frame('similarity'))
df.head()

Sort these new results by similarity score.

In [None]:
df.sort_values(['similarity'], ascending=False)[:15]
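The steps in this section can be wrapped into one function, sketched below under some assumptions. The name `similar_to` and its parameters are hypothetical. It counts ratings per movie directly from the user x movie matrix, which can differ slightly from the `groupby` count if a user rated the same title more than once.

```python
import numpy as np
import pandas as pd

def similar_to(matrix: pd.DataFrame, title: str, min_ratings: int = 100) -> pd.DataFrame:
    """Rank the columns of a user x movie matrix by Pearson correlation with `title`."""
    counts = matrix.notna().sum()                # number of users who rated each movie
    scores = matrix.corrwith(matrix[title])      # similarity to the target movie
    result = pd.DataFrame({"size": counts, "similarity": scores})
    result = result[result["size"] >= min_ratings].dropna(subset=["similarity"])
    # Remove the target itself, then rank the remaining movies
    return result.drop(index=title, errors="ignore").sort_values("similarity", ascending=False)

# Made-up matrix: "Rare" has only two ratings and is filtered out
toy_matrix = pd.DataFrame({
    "Target": [5, 4, 2, 1, 3, np.nan],
    "Close":  [5, 4, 1, 2, np.nan, 3],
    "Far":    [1, 2, 5, 4, 3, np.nan],
    "Rare":   [5, np.nan, np.nan, 1, np.nan, np.nan],
})
top = similar_to(toy_matrix, "Target", min_ratings=3)
print(top)
```

With the notebook's data, a call such as `similar_to(movieRatings, 'Star Wars (1977)')` would be expected to produce a ranking comparable to the table above.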