# Codebuster STAT 535 Statistical Computing Project
## Movie recommendation recommendation pipeline

#### Goal
Build a small real world deployment pipeline like it can be used in netflix / amazon

## Approach

Deploy three different classifiers / recommenders, based on quality of user profile:

1) Case 0 rated movies: Supervised prediction with just user age, gender, and year of the movie

 - In case of cold-start: No user information available

2) Case < 20 rated movies: Content-based recommender system

 - Content-based recommendation information about users and their taste. As we can see in the preprocessing, most of the users only rated one to five movies, implying that we have incomplete user-profiles. I think content-based recommendation makes sense here, because we can recommend similar movies, but not other categories that a user might like because we can not identify similar users with an incomplete user profile.

3) Case >=20 rated movies:  Collaborative recommender system

 - Collaborative filtering makes sense if you have a good user profile, which we assume we have if a user rated more or equal than 10 movies. With a good user profile we can identify similar users and make more sophisticated recommendations, e.g. movies from other genres.

#### Dataset
We use the movie recommendation lab dataset, which is a subset of the MovieLens dataset (https://grouplens.org/datasets/movielens/).




## Literature

- https://users.ece.cmu.edu/~dbatra/publications/assets/goel_batra_netflix.pdf
- http://delivery.acm.org/10.1145/1460000/1454012/p11-park.pdf?ip=72.19.68.210&id=1454012&acc=ACTIVE%20SERVICE&key=73B3886B1AEFC4BB%2EB478147E31829731%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&__acm__=1543416754_7f92e0642e26e7ea732886879096c704
- https://www.kaggle.com/prajitdatta/movielens-100k-dataset/kernels
- https://medium.com/@james_aka_yale/the-4-recommendation-engines-that-can-predict-your-movie-tastes-bbec857b8223
- https://www.kaggle.com/c/predict-movie-ratings
- https://cseweb.ucsd.edu/classes/wi17/cse258-a/reports/a048.pdf
- https://github.com/neilsummers/predict_movie_ratings/blob/master/movieratings.py
- https://medium.com/@connectwithghosh/recommender-system-on-the-movielens-using-an-autoencoder-using-tensorflow-in-python-f13d3e8d600d
### A few more
- https://sci2s.ugr.es/keel/pdf/specific/congreso/xia_dong_06.pdf (Uses SMV for classification, then MF for recommendation)
- https://www.kaggle.com/rounakbanik/movie-recommender-systems (Employs at least three Modules for recommendation)
- http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.703.4954&rep=rep1&type=pdf (Close to what we need, but a little too involving)
- https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0165868 (Uses SVM and correlation matrices...I have already tried the correlation approach, looks quite good, but how to quantify accuracy?)
- https://www.quora.com/How-do-we-use-SVMs-in-a-collaborative-recommendation (A good thread on SVM)
-http://www.quuxlabs.com/blog/2010/09/matrix-factorization-a-simple-tutorial-and-implementation-in-python/ (A good tutorial on matrix factorizasion)

In [29]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline

from scipy import interp

from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

from sklearn.model_selection import cross_val_predict, cross_val_score, cross_validate, StratifiedKFold
from sklearn.metrics import classification_report,confusion_matrix, roc_curve, auc

## Data import and preprocessing

In [24]:
# all data
df = pd.read_csv("allData.tsv", sep='\t')
print(f"Shape: {df.shape}")
df.head(1)

Shape: (31620, 10)


Unnamed: 0,userID,age,gender,movieID,name,year,genre1,genre2,genre3,rating
0,747,1,F,1193,One Flew Over the Cuckoo's Nest,1975,Drama,,,5


In [25]:
# movie data
movies = pd.read_csv("movies.tsv", sep='\t')
print(f"Shape: {df.shape}")
movies.head(1)

Shape: (31620, 10)


Unnamed: 0,movieID,name,year,genre1,genre2,genre3
0,1,Toy Story,1995,Animation,Children's,Comedy


In [26]:
df_ratings = pd.read_csv('ratings.csv')
df_ratings.head(1)

Unnamed: 0,userID,movieID,rating
0,747,1193,5


##### Transform numerical rating to binary
- 1, if user rates movie 4 or 5
- 0, if user rates movie less than 4

In [27]:
df['rating'].mask(df['rating'] < 4, 0, inplace=True)
df['rating'].mask(df['rating'] > 3, 1, inplace=True)

## 1) Classifier without specific user-information
Scott's part with additions from Tobi

In [None]:
def recommendation_without_user_info(df, age, gender, number_recommendations):
    '''
    function returns 5 random movies which have been recommended by a gradient boosting classifier
    without user information
    
    @param df: movie dataset 'allData'
    @param age: user age
    @param gender: user gender
    @param number_recommendations: number of recommendations returned
    '''
    # fit
    # ---------------------------------------------------
    # User information before any movie ratings
    X = df[['age', 'gender', 'year', 'genre1', 'genre2', 'genre3']]
    y = df['rating'].as_matrix()

    # Preprocessing
    # One hot encoding
    dummyvars = pd.get_dummies(X[['gender', 'genre1', 'genre2', 'genre3']])
    # append the dummy variables to df
    X = pd.concat([X[['age', 'year']], dummyvars], axis = 1).as_matrix()

    print("GradientBoostingClassifier")
    gbclf = GradientBoostingClassifier(n_estimators=100)
    gbclf.fit(X=X, y=y)
    
    # predict
    # ---------------------------------------------------
    # concat user age and gender with movie information, and make predictions
    # e.g. user age 25 and male
    X = df[['age', 'gender', 'year', 'genre1', 'genre2', 'genre3']]
    # set age
    X['age'] = age
    dummyvars = pd.get_dummies(X[['gender', 'genre1', 'genre2', 'genre3']])
    # set gender
    dummyvars['gender_F'] = 0
    dummyvars['gender_M'] = 0
    if gender=='M':
        dummyvars['gender_M'] = 1
    elif gender=='F':
        dummyvars['gender_F'] = 1
    # append the dummy variables to df
    X = pd.concat([X[['age', 'year']], dummyvars], axis = 1).as_matrix()

    # make predictions
    y_pred = gbclf.predict(X=X)
    
    # concat predictions to movie information
    df_pred = pd.concat([df[['movieID', 'name']], pd.DataFrame(y_pred, index=df.index, columns=['pred_rating'])], axis = 1)
    # shuffle 5 random movies with rating 1
    df_pred = df_pred[df_pred.pred_rating==1]
    recommendation = df_pred.drop('pred_rating', axis=1).sample(number_recommendations, random_state=10).set_index('movieID')
    
    return recommendation

# test function
print(recommendation_without_user_info(df=df, age=25, gender='F', number_recommendations=5))
    

## 2) Content-based recommender
Tobias' part

In [28]:
# find out from user input a movie that he rated
def get_user_movie(df, user_ID):
    '''
    returns a rated movie from userID
    
    @param df: movie dataset 'allData'
    @param user_id: target user_ID
    '''
    # return data from random sampled row of user
    df_liked = df[df.rating==1]
    movie = df[df['userID']==747].sample(1).name
    
    # strip space at the end before return
    return movie.item().rstrip()

# Function that get movie recommendations based on the cosine similarity score of movie genres
def content_based_recommendation(movies, name, number_recommendations):
    '''
    Recommends number of similar movie based on movie title and similarity to movies in movie database
    
    @param movies: pandas dataframe with movie dataset with columns (movieID, name, genres_concat)
    @param name: movie title as string
    @param number_recommendations: number of recommendations returned as integer
    '''
    # fit
    # ---------------------------------------------------
    # Preprocessing for tf-idf vectorization
    # Strip space at the end of string
    movies['name'] = movies['name'].str.rstrip()
    # Concat genres into one string
    movies['genres_concat'] = movies[['genre1', 'genre2', 'genre3']].astype(str).apply(' '.join, axis=1)
    # Remove nans in string and strip spaces at the end
    movies['genres_concat'] = movies['genres_concat'].str.replace('nan','').str.rstrip()

    # Create tf_idf matrix sklearn TfidfVectorizer
    tf = TfidfVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')
    tfidf_matrix = tf.fit_transform(movies['genres_concat'])
    
    # calculate similarity matrix with cosine distance of tf_idf values
    cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)
    
    # Build a 1-dimensional array with movie titles
    indices = pd.Series(movies.index, index=movies['name'])
    
    # predict
    # ---------------------------------------------------
    # Ranks movies according to similarity to requested movie
    idx = indices[name]
    sim_scores = list(enumerate(cosine_sim[idx]))
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
    sim_scores = sim_scores[1:(number_recommendations+1)]
    movie_indices = [i[0] for i in sim_scores]
    return movies.name.iloc[movie_indices]

# get a movie that a user liked from user_ID provided
movie_name = get_user_movie(df=df, user_ID=802)
# make recommendation based on previous liked movie
content_based_recommendation(movies=movies, name=movie_name, number_recommendations=5)

185                   Pinocchio
689                       Mulan
755              101 Dalmatians
759    Rescuers Down Under, The
760               Rescuers, The
Name: name, dtype: object

## 3) Collaborative recommender
Sixtus' part

##### There are two approaches

##### 1. Memory based approach: 
This can be divided into two, (i) user-item filtering and (ii) item-item filtering.

In this, we basically calculate the closest the user or item using the Cosine similarity or Pearson correlation coefficients

The cosine similarity for two users is computed as:

$sim(u, u^{'}) = cos(\theta) = \frac{r_u.r_{u^{'}}}{||r_u||||r_{u^{'}}} = \sum_i \frac{r_{ui}r_{u^{'}i}}{\sqrt{\sum_ir_{ui}^2}\sqrt{\sum_ir_{u^{'}i}^2}}$

Here $u$ and $u^{'}$ represents two different users.

We can predict user-u’s rating for movie-i by taking weighted sum of movie-i ratings from all other users (u′s) where weighting is similarity number between each user and user-u.

We now predict a users rating as follows:

#### $\hat{r_{ui}} = \frac{\sum_{u^{'}} sim(u, u^{'})r_{u^{'}i}}{\sum_{u^{'}} |sim(u, u^{'})|}$

Drawback:  its performance decreases when we have sparse data which hinders scalability of this approach for most of the real-world problems.

In [None]:
def fast_similarity(m_ratings, kind='user', epsilon=1e-9):
    '''
    compute the similarity
    '''
    # epsilon -> small number for handling dived-by-zero errors
    if kind == 'user':
        sim = m_ratings.dot(m_ratings.T) + epsilon
    elif kind == 'item':
        sim = m_ratings.T.dot(m_ratings) + epsilon
    norms = np.array([np.sqrt(np.diagonal(sim))])
    return (sim / norms / norms.T)

def top_k_movies(similarity, movie_idx, k=6):
    return [np.argsort(similarity[movie_idx,:])[:-k-1:-1]]

def collaborative_recommendation(all_data, df_ratings, user_id, number_recommendations):
    '''
    Recommends number of similar movies based on user item similarity
    
    @param df_ratings: rating file from MovieLens dataset
    @param userID: userID
    @param number_recommendations: number of recommendations returned as integer
    @param number_recommendations: number of recommendations returned
    
    '''
    # fit
    # ---------------------------------------------------
    # Below code creates two new columns for user id and movie id to facilitate the creation of the user item matrix
    from itertools import cycle
    n_users = df_ratings.userID.unique().shape[0]
    n_movies = df_ratings.movieID.unique().shape[0]

    l_users = cycle(list(range(n_users)))
    l_movies = list(range(n_movies))
    df_ratings['user_id'] = df_ratings['userID'].astype("int")
    df_ratings['movie_id'] = df_ratings['movieID'].astype("int")
    df_ratings['movieID'] = df_ratings['movieID'].astype("int")
    #df_ratings['movie_id2'] = df_ratings['movie_id'].astype("str")
    current_idm = 1
    current_idu = 747
    indm = 1
    indu = 1
    listMID = list(df_ratings["movieID"])
    for idx, row in df_ratings.iterrows():
        new_idm = int(df_ratings.loc[idx, 'movieID'])
        #intialize the  foudn movie id in list
        foundm = False
        for k in range(1465):
            if new_idm in listMID:
                #get the index
                lind = listMID.index(new_idm)
                #update the movie_id
                df_ratings.loc[lind, 'movie_id'] = indm
                #now set that list item to zero
                listMID[lind]=0
                foundm = True
            else:
                #break and fetch a new row
                break
        #increment the indicator
        if foundm:
            indm+=1
        #current_idm = new_idm

        #there is a bit of logic problem here...
        new_idu = int(df_ratings.loc[idx, 'userID'])
        if new_idu==current_idu:
            df_ratings.loc[idx, 'user_id'] = indu
        else:
            indu+=1
            current_idu = new_idu
            df_ratings.loc[idx, 'user_id'] = indu

    ## construct a user item matrix
    m_ratings = np.zeros((n_users, n_movies))
    for row in df_ratings.itertuples():
        #row[3] will be user rating row[4] user_id and row[5] movie_id  
        m_ratings[row[4]-1, row[5]-1] = row[3]
        
    # get item similarity matrix
    item_similarity = fast_similarity(m_ratings, kind='item')
    
    # predict
    # ---------------------------------------------------
    movies  = top_k_movies(item_similarity, user_id, number_recommendations)
    return all_data.loc[movies[0]-1, 'name']
    

In [None]:
# make recommendation with collaborative recommender
collaborative_recommendation(all_data=df, df_ratings=df_ratings, user_id=747, number_recommendations=5)

# Combined Recommender: CodebustersRecommender
Fits all three recommendation methods in fit method. When predicting, predict method assesses quality of user profile and predicts with respective method.

In [30]:
class CodebustersRecommender():
    """ 
    Implements a three stage recommendation system with a gradient boosting classifier, 
    a content-based recommender, and collaborative filter
    """
    def __init__(self):
        return None
        
    def fit(self, df, movies, df_ratings):
        """
        fits all three steps
        """
        # 0) init
        self.df = df
        self.movies = movies
        self.df_ratings = df_ratings
        
        # 1) fit gradient boosting classifier
        print("fit step 1: random forest classifier")
        self.recommendation_without_user_info(self.df)
        
        # 2) fit content-based recommender
        print("fit step 2: content-based filter")
        self.content_based_recommendation(self.movies)
        
        # 3) fit collaborative recommender
        print("fit step 3: collaborative filter")
        self.collaborative_recommendation(self.df, self.df_ratings)
        
        print("Fitting done")
        return self

    def predict(self, userID, age, gender, number_recommendations):
        """
        predicts depending on quality of user profile
        """
        
        # get userID, movie name, user age, and user gender, and user profile quality
        user_id = userID
        number_recommendations = number_recommendations
        name = self.get_user_movie(self.df, user_id)
        age = age
        gender = gender
        
        # user profile quality
        user_profile_quality = self.df[self.df.userID==userID].nunique().movieID
        
        # 1) predict with gradient boosting classifier
        # if no user ratings: needs  age, gender, number_recommendations
        if np.isnan(user_profile_quality):
            print("Predict with classifier without user info :")
            # concat user age and gender with movie information, and make predictions
            # e.g. user age 25 and male
            X = df[['age', 'gender', 'year', 'genre1', 'genre2', 'genre3']]
            # set age
            X['age'] = age
            dummyvars = pd.get_dummies(X[['gender', 'genre1', 'genre2', 'genre3']])
            # set gender
            dummyvars['gender_F'] = 0
            dummyvars['gender_M'] = 0
            if gender=='M':
                dummyvars['gender_M'] = 1
            elif gender=='F':
                dummyvars['gender_F'] = 1
            # append the dummy variables to df
            X = pd.concat([X[['age', 'year']], dummyvars], axis = 1).as_matrix()

            # make predictions
            y_pred = self.rfclf.predict(X=X)

            # concat predictions to movie information
            df_pred = pd.concat([df[['movieID', 'name']], pd.DataFrame(y_pred, index=df.index, columns=['pred_rating'])], axis = 1)
            # shuffle 5 random movies with rating 1
            df_pred = df_pred[df_pred.pred_rating==1]
            recommendation = df_pred.drop('pred_rating', axis=1).sample(number_recommendations, random_state=10).set_index('movieID')

            return recommendation
        
        # 2) predict with content-based recommender
        # if <=20 user ratings: needs userID, name, number_recommendations
        if user_profile_quality < 20:
            print("Predict with content-based filter:")
            # get name of movie user already rated
            name = self.get_user_movie(df=df, user_ID=user_id)
            # Build a 1-dimensional array with movie titles
            indices = pd.Series(self.movies.index, index=self.movies['name'])
            # Ranks movies according to similarity to requested movie
            idx = indices[name]
            sim_scores = list(enumerate(self.cosine_sim[idx]))
            sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)
            sim_scores = sim_scores[1:(number_recommendations+1)]
            movie_indices = [i[0] for i in sim_scores]
            return self.movies.name.iloc[movie_indices]
        
        # 3) predict with collaborative recommender
        # if >20 user ratings: needs user_id, number_recommendations
        if user_profile_quality >= 20:
            print("Predict with collaborative filter:")
            movies  = self.top_k_movies(self.item_similarity, user_id, number_recommendations)
            return df.loc[movies[0]-1, 'name']

    
    def recommendation_without_user_info(self, df):
        '''
        function returns 5 random movies which have been recommended by a gradient boosting classifier
        without user information

        @param df: movie dataset 'allData'
        '''
        # fit
        # ---------------------------------------------------
        # User information before any movie ratings
        X = df[['age', 'gender', 'year', 'genre1', 'genre2', 'genre3']]
        y = df['rating'].as_matrix()

        # Preprocessing
        # One hot encoding
        dummyvars = pd.get_dummies(X[['gender', 'genre1', 'genre2', 'genre3']])
        # append the dummy variables to df
        X = pd.concat([X[['age', 'year']], dummyvars], axis = 1).as_matrix()

        print("GradientBoostingClassifier")
        self.rfclf = RandomForestClassifier()
        self.rfclf.fit(X=X, y=y)
        
        return self
    
    
    # find out from user input a movie that he rated
    def get_user_movie(self, df, user_ID):
        '''
        returns a rated movie from userID

        @param df: movie dataset 'allData'
        @param user_id: target user_ID
        '''
        # return data from random sampled row of user
        df_liked = df[df.rating==1]
        movie = df[df['userID']==747].sample(1).name

        # strip space at the end before return
        return movie.item().rstrip()

    # Function that get movie recommendations based on the cosine similarity score of movie genres
    def content_based_recommendation(self, movies):
        '''
        Recommends number of similar movie based on movie title and similarity to movies in movie database

        @param movies: pandas dataframe with movie dataset with columns (movieID, name, genres_concat)
        @param name: movie title as string
        @param number_recommendations: number of recommendations returned as integer
        '''
        # fit
        # ---------------------------------------------------
        # Preprocessing for tf-idf vectorization
        # Strip space at the end of string
        movies['name'] = movies['name'].str.rstrip()
        # Concat genres into one string
        movies['genres_concat'] = movies[['genre1', 'genre2', 'genre3']].astype(str).apply(' '.join, axis=1)
        # Remove nans in string and strip spaces at the end
        movies['genres_concat'] = movies['genres_concat'].str.replace('nan','').str.rstrip()

        # Create tf_idf matrix sklearn TfidfVectorizer
        tf = TfidfVectorizer(analyzer='word',ngram_range=(1, 2),min_df=0, stop_words='english')
        tfidf_matrix = tf.fit_transform(movies['genres_concat'])

        # calculate similarity matrix with cosine distance of tf_idf values
        self.cosine_sim = linear_kernel(tfidf_matrix, tfidf_matrix)


    def fast_similarity(self, m_ratings, kind='user', epsilon=1e-9):
        '''
        compute the similarity
        '''
        # epsilon -> small number for handling dived-by-zero errors
        if kind == 'user':
            sim = m_ratings.dot(m_ratings.T) + epsilon
        elif kind == 'item':
            sim = m_ratings.T.dot(m_ratings) + epsilon
        norms = np.array([np.sqrt(np.diagonal(sim))])
        return (sim / norms / norms.T)

    def top_k_movies(self, similarity, movie_idx, k=6):
        return [np.argsort(similarity[movie_idx,:])[:-k-1:-1]]

    def collaborative_recommendation(self, all_data, df_ratings):
        '''
        Recommends number of similar movies based on user item similarity

        @param df_ratings: rating file from MovieLens dataset
        @param userID: userID
        @param number_recommendations: number of recommendations returned as integer
        @param number_recommendations: number of recommendations returned

        '''
        # fit
        # ---------------------------------------------------
        # Below code creates two new columns for user id and movie id to facilitate the creation of the user item matrix
        from itertools import cycle
        n_users = df_ratings.userID.unique().shape[0]
        n_movies = df_ratings.movieID.unique().shape[0]

        l_users = cycle(list(range(n_users)))
        l_movies = list(range(n_movies))
        df_ratings['user_id'] = df_ratings['userID'].astype("int")
        df_ratings['movie_id'] = df_ratings['movieID'].astype("int")
        df_ratings['movieID'] = df_ratings['movieID'].astype("int")
        #df_ratings['movie_id2'] = df_ratings['movie_id'].astype("str")
        current_idm = 1
        current_idu = 747
        indm = 1
        indu = 1
        listMID = list(df_ratings["movieID"])
        for idx, row in df_ratings.iterrows():
            new_idm = int(df_ratings.loc[idx, 'movieID'])
            #intialize the  foudn movie id in list
            foundm = False
            for k in range(1465):
                if new_idm in listMID:
                    #get the index
                    lind = listMID.index(new_idm)
                    #update the movie_id
                    df_ratings.loc[lind, 'movie_id'] = indm
                    #now set that list item to zero
                    listMID[lind]=0
                    foundm = True
                else:
                    #break and fetch a new row
                    break
            #increment the indicator
            if foundm:
                indm+=1
            #current_idm = new_idm

            #there is a bit of logic problem here...
            new_idu = int(df_ratings.loc[idx, 'userID'])
            if new_idu==current_idu:
                df_ratings.loc[idx, 'user_id'] = indu
            else:
                indu+=1
                current_idu = new_idu
                df_ratings.loc[idx, 'user_id'] = indu

        ## construct a user item matrix
        m_ratings = np.zeros((n_users, n_movies))
        for row in df_ratings.itertuples():
            #row[3] will be user rating row[4] user_id and row[5] movie_id  
            m_ratings[row[4]-1, row[5]-1] = row[3]

        # get item similarity matrix
        self.item_similarity = self.fast_similarity(m_ratings, kind='item')


##### Init object and fit recommender systems

In [31]:
# import data
df = pd.read_csv("allData.tsv", sep='\t')
movies = pd.read_csv("movies.tsv", sep='\t')
df_ratings = pd.read_csv('ratings.csv')

# init class object
CbR = CodebustersRecommender()
# fit all recommender systems
CbR.fit(df, movies, df_ratings)

fit step 1: random forest classifier
GradientBoostingClassifier
fit step 2: content-based filter
fit step 3: collaborative filter
Fitting done


<__main__.CodebustersRecommender at 0x10eaa09b0>

##### Predict for different user cases

In [33]:
# predict
# case no profile
user_id = 1 
age=25
gender='F'
#user_id = 747 # case profile < 20
#user_id = 802 # case profile >= 20
#age = df[df.userID==user_id].sample(1).age.item()
#gender = df[df.userID==user_id].sample(1).age.item()

# predict for different user ids
movies = CbR.predict(userID=user_id, age=age, gender=gender, number_recommendations=5)
movies

Predict with classifier without user info :


A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy


Unnamed: 0_level_0,name
movieID,Unnamed: 1_level_1
2642,Superman III
2287,Them!
230,Dolores Claiborne
762,Striptease
762,Striptease


## Evaluation

##### Problems with evaluation
- Evaluation with a metric for content-based and collaborative approach is hardly possible, because a lot of recommended movies haven't been rated by the initial user yet!
    - 1) Simple Classifier without user info
        - Precision / Recall evaluation
    - 2) Content-based recommendation
        - Precision / Recall hardly possible, instead just recommend movies
    - 3) Collaborative recommendation
        - Precision / Recall hardly possible, MSE possible, instead just recommend movies
        
##### Possible solutions
- Way larger datasets which are not as sparse

- Evaluate user's reaction after recommendation is given (e.g. in Netflix, if he likes the movie)
