# Intro to Machine Learning: Recommender - Item-Based Collaborative Filtering

## Outline

* [Loading Data](#Loading-Data)
* [Creating a Pivot Table](#Creating-a-Pivot-Table)
* [Measuring Similarity](#Measuring-Similarity)
* [Implementing a Recommender](#Implementing-a-Recommender)

## Loading Data

In [None]:
import pandas as pd

In [None]:
wine_ratings = pd.read_csv('data/wine-reviews.csv')

In [None]:
wine_ratings.head()

---

## Creating a Pivot Table

In [None]:
wine_ratings_pivoted = wine_ratings.pivot(index='username', columns='wine', values='rating').fillna(0)
wine_ratings_pivoted
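A toy version of the reshaping step may make the pivot easier to picture. The sketch below uses a small hypothetical long-format table with the same column names the notebook assumes (`username`, `wine`, `rating`), not the real `wine-reviews.csv`. It also shows one pitfall: `pivot()` raises a `ValueError` if the same user rated the same wine twice, whereas `pivot_table()` with `aggfunc="mean"` averages the duplicates.

```python
import pandas as pd

# Hypothetical long-format ratings (not taken from wine-reviews.csv)
toy = pd.DataFrame({
    "username": ["ann", "ann", "bob", "cat", "cat"],
    "wine": ["A", "B", "A", "B", "C"],
    "rating": [5, 3, 4, 2, 5],
})

# One row per user, one column per wine; unrated cells become 0
toy_pivoted = toy.pivot(index="username", columns="wine", values="rating").fillna(0)
print(toy_pivoted)

# A duplicate (ann, A) rating would make pivot() fail; pivot_table() averages it
dupes = pd.concat([toy, pd.DataFrame({"username": ["ann"], "wine": ["A"], "rating": [3]})])
toy_pivoted_dedup = dupes.pivot_table(
    index="username", columns="wine", values="rating", aggfunc="mean"
).fillna(0)
print(toy_pivoted_dedup)
```

Filling gaps with 0 treats "not rated" as a rating of zero, which is only safe if the real rating scale never uses 0.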

---

## Measuring Similarity

In [None]:
from sklearn.metrics.pairwise import cosine_similarity

In [None]:
cosine_similarity([wine_ratings_pivoted['La Bota de Amontillado 1']], [wine_ratings_pivoted['Manzanilla La Gitana']])
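Cosine similarity is the dot product of two vectors divided by the product of their lengths, $\cos(\mathbf{a}, \mathbf{b}) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}$. The short sketch below, using two hypothetical rating columns, computes it by hand with NumPy and checks it against scikit-learn. It also shows why the cell above wraps each column in a list: `cosine_similarity` expects 2-D inputs of shape `(n_samples, n_features)` and returns a matrix, so the single score sits at `[0][0]`.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Two hypothetical wine columns: one entry per user, 0 = not rated
a = np.array([5.0, 4.0, 0.0])
b = np.array([3.0, 0.0, 2.0])

# Dot product divided by the product of the vector norms
manual = a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

# scikit-learn wants 2-D inputs, hence the extra brackets and the [0][0]
via_sklearn = cosine_similarity([a], [b])[0][0]

print(manual, via_sklearn)
```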

In [None]:
wine_ratings_pivoted.columns

Build a similarity vector for "Le Grappin Bagnum Rose 2013": its cosine similarity to every wine in the table, including itself.

In [None]:
# Cosine similarity between each wine's rating column and the target wine's column
similarity_vector = {}
for wine in wine_ratings_pivoted.columns:
    similarity = cosine_similarity(
        [wine_ratings_pivoted[wine]],
        [wine_ratings_pivoted['Le Grappin Bagnum Rose 2013']]
    )[0][0]
    similarity_vector[wine] = similarity

In [None]:
similarity_vector
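The loop computes one column at a time. As an alternative sketch, passing the transposed pivot table to `cosine_similarity` yields the whole wine-by-wine similarity matrix in one call, because the rows of the transpose are wines. The small user-by-wine matrix here is hypothetical; on the real data the same idea would be applied to `wine_ratings_pivoted`.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical user x wine matrix (0 = not rated)
pivoted = pd.DataFrame(
    {"A": [5, 4, 0], "B": [3, 0, 2], "C": [0, 0, 5]},
    index=["ann", "bob", "cat"],
)

# Rows of pivoted.T are wines, so the result is a wine x wine matrix
item_similarity = pd.DataFrame(
    cosine_similarity(pivoted.T),
    index=pivoted.columns,
    columns=pivoted.columns,
)

# A single column plays the role of a similarity vector for that wine
similarity_to_a = item_similarity["A"]
print(item_similarity.round(3))
```

Computing the full matrix once is convenient when many wines will be queried; for a single target wine the loop is enough.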

Sort the wines by their similarity to "Le Grappin Bagnum Rose 2013", most similar first.

In [None]:
# Wine names ordered from most to least similar to 'Le Grappin Bagnum Rose 2013'
sorted(similarity_vector, key=similarity_vector.get, reverse=True)
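`sorted()` returns only the wine names. A pandas `Series` keeps names and scores together, and makes it easy to drop the target wine (which always scores 1.0 against itself) and keep the top few. The scores below are hypothetical, with wine `"A"` standing in for the target.

```python
import pandas as pd

# Hypothetical similarity scores to a target wine "A"
similarity_vector = {"A": 1.0, "B": 0.65, "C": 0.0, "D": 0.3}

# Drop the target itself, sort by score, and keep the k = 2 most similar wines
scores = pd.Series(similarity_vector)
top_k = scores.drop("A").sort_values(ascending=False).head(2)
print(top_k)
```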

---

## Implementing a Recommender

Pick the user Mari and check whether we should recommend "Le Grappin Bagnum Rose 2013" to her.

In [None]:
wine_ratings_pivoted.loc['mari', :]

In [None]:
ratings = wine_ratings_pivoted.loc[
    'mari', [
        'Manzanilla La Gitana', 
        'Molino Real 2002', 
        'Pol Roger Rose 1998', 
        'Raveneu Le Clos 1996'
    ]
]

In [None]:
ratings
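The cell above lists Mari's wines by hand. A minimal sketch of selecting them automatically, assuming (as the `fillna(0)` step implies) that a 0 in the pivot table means "not rated", could filter the user's row for non-zero entries. The matrix and the user `"cat"` are hypothetical.

```python
import pandas as pd

# Hypothetical user x wine matrix (0 = not rated)
pivoted = pd.DataFrame(
    {"A": [5, 4, 0], "B": [3, 0, 2], "C": [0, 0, 5]},
    index=["ann", "bob", "cat"],
)

# Keep only the wines this user actually rated (non-zero entries in her row)
user_row = pivoted.loc["cat"]
rated = user_row[user_row > 0]
print(rated)
```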

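Predict Mari's rating for "Le Grappin Bagnum Rose 2013" as the weighted mean of her ratings

$\{x_1, x_2, \dots , x_n\}$

for the wines she has rated, where each non-negative weight $w_i$ is that wine's similarity to "Le Grappin Bagnum Rose 2013":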

$\bar{x} = \frac{ \sum\limits_{i=1}^n w_i x_i}{\sum\limits_{i=1}^n w_i}$
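A quick numeric check of the formula, with made-up ratings $x_i$ and weights $w_i$ (not from the dataset), shows how the weights pull the result toward the ratings of the most similar wines. NumPy's `np.average` with its `weights` argument computes the same quantity.

```python
import numpy as np

# Hypothetical ratings x_i and similarity weights w_i
x = np.array([4.0, 5.0, 2.0])
w = np.array([0.8, 0.5, 0.1])

# sum(w_i * x_i) / sum(w_i) = (3.2 + 2.5 + 0.2) / 1.4, about 4.21
weighted_mean = (w * x).sum() / w.sum()
print(weighted_mean)

# Same result via numpy's built-in weighted average
print(np.average(x, weights=w))
```

The low-weight rating of 2.0 barely moves the result, which is the point of weighting by similarity.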

In [None]:
# Weighted sum of Mari's ratings, using each wine's similarity as its weight
numerator = sum(ratings[wine] * similarity_vector[wine] for wine in ratings.index)
# Sum of the weights
denominator = sum(similarity_vector[wine] for wine in ratings.index)

In [None]:
weighted_mean = numerator / denominator

In [None]:
weighted_mean
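The cells above predict one user's rating for one wine by hand. A possible generalisation is sketched below as a hypothetical helper, `predict_rating`, which is not part of the original notebook. It assumes 0 means "not rated", uses cosine similarity between wine columns as the weights, excludes the target wine from the user's rated wines, and returns NaN when none of the user's wines has any similarity to the target.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity


def predict_rating(pivoted: pd.DataFrame, user: str, target_wine: str) -> float:
    """Hypothetical helper: similarity-weighted mean of `user`'s ratings.

    Assumes 0 in `pivoted` means "not rated" (as produced by fillna(0)).
    """
    # Similarity of every wine column to the target wine column
    sims = pd.Series(
        cosine_similarity(pivoted.T, pivoted[[target_wine]].T)[:, 0],
        index=pivoted.columns,
    )
    # Wines the user rated, excluding the target itself
    row = pivoted.loc[user]
    rated = row[(row > 0) & (row.index != target_wine)]
    weights = sims[rated.index]
    if weights.sum() == 0:
        return float("nan")  # no similar rated wines: no evidence either way
    return float((rated * weights).sum() / weights.sum())


# Usage on a hypothetical user x wine matrix
pivoted = pd.DataFrame(
    {"A": [5, 4, 0], "B": [3, 0, 2], "C": [0, 0, 5]},
    index=["ann", "bob", "cat"],
)
print(predict_rating(pivoted, "bob", "B"))
```

A predicted rating near the top of the scale would suggest recommending the wine; where to set that threshold is a design choice the notebook leaves open.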