# The Science Behind Netflix Recommendations: Code-Along

One of the easiest libraries to use for recommendation systems is Surprise, which stands for **Simple Python RecommendatIon System Engine**. In this notebook, we'll build a recommendation system with Surprise's SVD algorithm, a matrix factorization method named after Singular Value Decomposition.

In [None]:
# !pip install numpy
# !pip install pandas
# !pip install matplotlib
# !pip install scikit-surprise  # the Surprise library is published on PyPI as scikit-surprise

In [None]:
# Import libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from collections import Counter

from surprise import Dataset, Reader
from surprise import SVD
from surprise import accuracy
from surprise.model_selection import cross_validate, train_test_split

## 1. Reading in the data and a simple exploratory data analysis

In [None]:
df = pd.read_csv('ratings.csv') # pandas' read_csv function
print(df.shape) # how many rows, columns are in the dataframe
df.head(10) # previewing the first 10 rows 

### a) Ratings

In [None]:
# value_counts() shows us how many times each value appears in a column
ratings = df['rating'].value_counts()
ratings

In [None]:
# plot the distribution in matplotlib, ordered by rating value
ratings_sorted = ratings.sort_index()
plt.bar(ratings_sorted.index, ratings_sorted.values, color='maroon')
plt.xlabel("Rating")
plt.ylabel("# of Ratings")
plt.title("Distribution of Ratings")
plt.show()

### b) Users

In [None]:
print("Number of users: ", df.userId.nunique()) 
print("Average Number of Reviews per User: ", df.shape[0]/df.userId.nunique())

In [None]:
ratings_per_user = df['userId'].value_counts()
ratings_per_user = sorted(list(zip(ratings_per_user.index, ratings_per_user)))
plt.bar([r[0] for r in ratings_per_user], [r[1] for r in ratings_per_user], color='orange')
plt.xlabel("User IDs")
plt.ylabel("# of Reviews")
plt.title("Number of Reviews per User")
plt.show()

In [None]:
# count how many users share each review count: (reviews per user, number of users)
user_ratings = [r[1] for r in ratings_per_user]
sorted_user_ratings = sorted(Counter(user_ratings).items())
# [:-1] leaves out the entry with the largest review count
plt.bar([r[0] for r in sorted_user_ratings][:-1], [r[1] for r in sorted_user_ratings][:-1], color='green')
plt.xlabel("# of Reviews per User")
plt.ylabel("Number of Users")
plt.title("Distribution of Review Frequency per User")
plt.show()

### c) Movies

In [None]:
print("Number of movies: ", df.movieId.nunique())
print("Average Number of Reviews per Movie: ", df.shape[0]/df.movieId.nunique())

In [None]:
df['movieId'].value_counts().iloc[:10]

## 2. Implementing Surprise's SVD
To read more about SVD and its hyperparameters:
https://surprise.readthedocs.io/en/stable/matrix_factorization.html#surprise.prediction_algorithms.matrix_factorization.SVD
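
For intuition about what those hyperparameters control, here is a minimal pure-Python sketch of the model the linked page describes: a predicted rating is the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors, and training nudges those values with stochastic gradient descent. The function names and toy numbers are made up for illustration, and this is not Surprise's actual implementation; `lr` and `reg` play the roles of the `lr_all` and `reg_all` hyperparameters.

```python
def predict_rating(mu, b_u, b_i, p_u, q_i):
    """Estimate a rating as global mean + biases + factor dot product."""
    return mu + b_u + b_i + sum(p * q for p, q in zip(p_u, q_i))


def sgd_step(r_ui, mu, b_u, b_i, p_u, q_i, lr=0.005, reg=0.02):
    """Apply one regularized SGD update for a single known rating r_ui."""
    err = r_ui - predict_rating(mu, b_u, b_i, p_u, q_i)
    new_b_u = b_u + lr * (err - reg * b_u)
    new_b_i = b_i + lr * (err - reg * b_i)
    # both factor updates use the old values of p_u and q_i
    new_p_u = [p + lr * (err * q - reg * p) for p, q in zip(p_u, q_i)]
    new_q_i = [q + lr * (err * p - reg * q) for p, q in zip(p_u, q_i)]
    return new_b_u, new_b_i, new_p_u, new_q_i


# toy values (hypothetical): one user, one item, 2 latent factors
mu = 3.5
b_u, b_i = 0.1, -0.2
p_u, q_i = [0.5, -0.3], [0.4, 0.2]

before = predict_rating(mu, b_u, b_i, p_u, q_i)
b_u2, b_i2, p_u2, q_i2 = sgd_step(5.0, mu, b_u, b_i, p_u, q_i)
after = predict_rating(mu, b_u2, b_i2, p_u2, q_i2)
print(before, after)  # the estimate moves toward the true rating of 5.0
```

The number of latent factors (2 here) corresponds to `n_factors`, and repeating the update over every rating in the training set for several passes corresponds to `n_epochs`.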

In [None]:
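# for Surprise, we only need three columns from the dataset, in this order
data = df[['userId', 'movieId', 'rating']]
# load_from_df only uses the Reader's rating_scale, so set it from the data
# (the default scale is 1 to 5, which would not cover ratings below 1)
reader = Reader(rating_scale=(df['rating'].min(), df['rating'].max()))
data = Dataset.load_from_df(data, reader=reader)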

# train-test-split
trainset, testset = train_test_split(data, test_size=.2)

In [None]:
# instantiate SVD and fit the trainset
svd = SVD()
svd.fit(trainset)

In [None]:
predictions = svd.test(testset)
accuracy.rmse(predictions)
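
To make the metric concrete, here is a small sketch of what root mean squared error measures: the square root of the average squared gap between true and estimated ratings. The `rmse` helper and the toy pairs below are hypothetical and written only for illustration; they are not part of Surprise.

```python
import math


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    squared_errors = [(true - est) ** 2 for true, est in pairs]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# hypothetical (true, estimated) ratings, made up for illustration
toy_pairs = [(4.0, 3.5), (3.0, 3.5), (5.0, 4.0), (2.0, 3.0)]
toy_rmse = rmse(toy_pairs)
print(toy_rmse)
```

With the real `predictions` list, the same pairs could be built as `[(p.r_ui, p.est) for p in predictions]`, since each prediction carries the true rating `r_ui` and the estimate `est`. Lower is better: an RMSE of 0.9 means estimates are off by roughly 0.9 stars, with large misses weighted more heavily.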

## 3. Making Predictions

In [None]:
# taking a look at the first 10 predictions made on our test set
predictions[:10]

In [None]:
print("Number of users: ", df.userId.nunique()) 
print("Number of movies: ", df.movieId.nunique()) 

In [None]:
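# predict the rating a given user would give a given movie
# (ids are the raw values as they appear in ratings.csv)
user = 5
item = 100
svd.predict(user, item)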
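
A single prediction becomes a recommendation once estimates for many movies are ranked per user. The sketch below shows that ranking step on made-up data. `Pred` is a hypothetical stand-in that mimics only the `uid`, `iid`, and `est` fields of Surprise's prediction objects, and `top_n_per_user` is an illustrative helper, not a Surprise function.

```python
from collections import defaultdict, namedtuple

# hypothetical stand-in for Surprise's prediction objects (only the fields used here)
Pred = namedtuple("Pred", ["uid", "iid", "est"])


def top_n_per_user(predictions, n=3):
    """Group predictions by user and keep the n items with the highest estimates."""
    by_user = defaultdict(list)
    for p in predictions:
        by_user[p.uid].append((p.iid, p.est))
    return {
        uid: sorted(items, key=lambda pair: pair[1], reverse=True)[:n]
        for uid, items in by_user.items()
    }


# made-up estimates for two users
toy_predictions = [
    Pred(5, 100, 3.2), Pred(5, 200, 4.6), Pred(5, 300, 4.1),
    Pred(7, 100, 2.9), Pred(7, 300, 3.8),
]
top = top_n_per_user(toy_predictions, n=2)
print(top)
```

In the notebook, the same helper could be given `svd.predict(user, item)` results for the movies a user has not rated yet, assuming those candidate movie ids are collected first.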