## Ranking and recommender systems

* Ranking: order items relative to each other
* Recommendation: suggest items a user is likely to want, often by estimating an approximate preference score for each item

**Learning to rank** or machine-learned ranking (MLR) is the application of machine learning, typically supervised, semi-supervised or reinforcement learning, in the construction of ranking models for information retrieval systems. 

Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. "relevant" or "not relevant") for each item.

The ranking model's purpose is to rank, i.e. produce a permutation of items in new, unseen lists in a way which is "similar" to rankings in the training data in some sense. 
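One way to make "similar" concrete is to compare the predicted permutation with a reference ordering. The sketch below sorts items by score and measures agreement with Kendall's tau (concordant minus discordant pairs, divided by all pairs). The document names and scores are made up for illustration, and Kendall's tau is only one of several possible agreement measures.

```python
from itertools import combinations

def rank_items(scores):
    """Return item names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)

def kendall_tau(order_a, order_b):
    """Kendall's tau between two orderings of the same items (no ties)."""
    pos_a = {item: i for i, item in enumerate(order_a)}
    pos_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):
        # A pair is concordant when both orderings place x and y the same way round
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical relevance labels (training signal) and model scores for one list
true_relevance = {"doc_a": 3, "doc_b": 2, "doc_c": 1, "doc_d": 0}
model_scores = {"doc_a": 0.9, "doc_b": 0.4, "doc_c": 0.7, "doc_d": 0.1}

true_order = rank_items(true_relevance)
predicted_order = rank_items(model_scores)
tau = kendall_tau(true_order, predicted_order)
print(predicted_order, round(tau, 3))
```

A tau of 1 means the two orderings agree on every pair, and -1 means they disagree on every pair. Here only one pair (`doc_b`, `doc_c`) is swapped.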

## Ranking is used in...
 * Machine translation
 * Computational biology
 * Recommender systems, to identify a ranked list of related items
 * Software engineering

## Similarity measures

* Euclidean distance
* Cosine similarity

In [1]:
import numpy as np

def cosine_similarity(x, y):
    # Dot product divided by the product of the two vector lengths (L2 norms)
    return x.dot(y) / (np.linalg.norm(x) * np.linalg.norm(y))
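To see how the two measures listed above differ, here is a small standard-library sketch. The helper names (`euclidean_distance`, `cosine_similarity_py`) and the vectors are chosen for illustration only. Two vectors that point in the same direction but differ in length have a non-zero Euclidean distance, while their cosine similarity is 1.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def cosine_similarity_py(x, y):
    """Cosine of the angle between x and y (pure-Python variant)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]  # same direction as x, twice the length

distance = euclidean_distance(x, y)      # sensitive to magnitude
similarity = cosine_similarity_py(x, y)  # only sensitive to direction
print(distance, similarity)
```

Cosine similarity is often preferred for count vectors, where the length mostly reflects how many items a user has rather than what they like.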

## Recommending books

Given a list of users, each of whom reads a number of books, let's create recommendations for those users.

In [2]:
users_interests = [
    ["Hadoop", "Big Data", "HBase", "Java", "Spark", "Storm", "Cassandra"],
    ["NoSQL", "MongoDB", "Cassandra", "HBase", "Postgres"],
    ["Python", "scikit-learn", "scipy", "numpy", "statsmodels", "pandas"],
    ["R", "Python", "statistics", "regression", "probability"],
    ["machine learning", "regression", "decision trees", "libsvm"],
    ["Python", "R", "Java", "C++", "Haskell", "programming languages"],
    ["statistics", "probability", "mathematics", "theory"],
    ["machine learning", "scikit-learn", "Mahout", "neural networks"],
    ["neural networks", "deep learning", "Big Data", "artificial intelligence"],
    ["Hadoop", "Java", "MapReduce", "Big Data"],
    ["statistics", "R", "statsmodels"],
    ["C++", "deep learning", "artificial intelligence", "probability"],
    ["pandas", "R", "Python"],
    ["databases", "HBase", "Postgres", "MySQL", "MongoDB"],
    ["libsvm", "regression", "support vector machines"]
]

## Task 1: Recommend popular books

In [6]:
%pylab inline

Populating the interactive namespace from numpy and matplotlib


In [49]:
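from collections import defaultdict
from itertools import chain

def flatten(list_of_lists):
    # Explicit helper, so the cell does not rely on the namespace set up by %pylab
    return list(chain.from_iterable(list_of_lists))

# Count how often each book appears across all users
counted = defaultdict(int)
for book in flatten(users_interests):
    counted[book] += 1

def recommend_book(user):
    # Popularity baseline: ignores the user and returns the most frequent (book, count) pair
    return sorted(counted.items(), key=lambda kv: kv[1])[-1]

recommend_book(users_interests[0])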

('R', 4)
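The popularity baseline returns the same item to everyone, including users who already list it. A possible refinement, sketched below, is to skip items the user already has. The data is a small made-up sample in the same shape as `users_interests`, and the function name `recommend_popular_new` is hypothetical.

```python
from collections import Counter

# Small made-up sample, same shape as users_interests
sample_interests = [
    ["Python", "R", "pandas"],
    ["Python", "statistics", "R"],
    ["R", "regression"],
    ["Python", "regression", "Java"],
]

# Popularity of each item across all users
popularity = Counter(item for interests in sample_interests for item in interests)

def recommend_popular_new(user_interests, max_results=2):
    """Most popular items that the user does not already list."""
    already_known = set(user_interests)
    candidates = [(item, count)
                  for item, count in popularity.most_common()
                  if item not in already_known]
    return candidates[:max_results]

suggestions = recommend_popular_new(["Python", "pandas"])
print(suggestions)
```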

## Task 2: Recommend similar users

In [108]:
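from sklearn.feature_extraction.text import CountVectorizer

# Learn a word-level vocabulary from every interest string of every user
all_interests = [interest for interests in users_interests for interest in interests]
model = CountVectorizer().fit(all_interests)
# One bag-of-words row per user; str(x) turns the list into a single string
users = model.transform([str(x) for x in users_interests])

# Note: 'statisticts' is misspelled, so it is out of vocabulary and ignored by transform
mock_user = ["Big Data", 'statisticts']

def recommend_similar(input_user):
    # Cosine similarity between the input user and every known user, in list order
    input_vector = model.transform([str(input_user)]).toarray()[0]
    return [cosine_similarity(input_vector, user.toarray()[0]) for user in users]

recommend_similar(mock_user)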

[0.4999999999999999,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.4999999999999999,
 0.6324555320336759,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0]
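The similarity scores can be turned into actual suggestions by weighting every other user's interests by that user's similarity to the input user. The sketch below is a minimal user-based collaborative filter under simplifying assumptions: it uses sets as binary vectors instead of `CountVectorizer` to stay dependency-free, and both the sample data and the names `set_cosine` and `suggest_from_similar_users` are made up for illustration.

```python
import math
from collections import defaultdict

# Small made-up sample, same shape as users_interests
sample_interests = [
    ["Hadoop", "Big Data", "Java"],
    ["Python", "statistics", "R"],
    ["Big Data", "Spark", "Java"],
]

def set_cosine(a, b):
    """Cosine similarity of two binary interest vectors, represented as sets."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def suggest_from_similar_users(input_user, all_users):
    """Score each unseen interest by the summed similarity of the users who list it."""
    scores = defaultdict(float)
    for other in all_users:
        similarity = set_cosine(input_user, other)
        for interest in other:
            if interest not in input_user:
                scores[interest] += similarity
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

suggestions = suggest_from_similar_users(["Big Data", "statistics"], sample_interests)
print(suggestions[0])
```

Here "Java" ranks first because it is listed by two users who each share an interest with the input user.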

## Exercise: Song recommendation

Given a list of songs and their features, we wish to recommend a song that is similar to the song currently playing.
The goal is to write a function that takes a song ID as its input and returns a suggested song as its output.

As input data, you can use either this preprocessed dataset (https://www.kaggle.com/geomack/spotifyclassification) *or* your own Spotify account data. See how here (https://medium.com/deep-learning-turkey/build-your-own-spotify-playlist-of-best-playlist-recommendations-fc9ebe92826a) and here (https://towardsdatascience.com/making-your-own-discover-weekly-f1ac7546fedb).
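As a possible starting point (not a reference solution), the sketch below does a nearest-neighbour lookup over standardized feature vectors. The song IDs, the feature choice (tempo, energy, danceability) and the values are hypothetical placeholders, so they would need to be replaced with the columns of whichever dataset you load. It requires Python 3.8+ for `math.dist`.

```python
import math

# Hypothetical songs: id -> [tempo in BPM, energy, danceability]
songs = {
    "song_1": [120.0, 0.80, 0.70],
    "song_2": [122.0, 0.75, 0.72],
    "song_3": [70.0, 0.20, 0.30],
    "song_4": [170.0, 0.95, 0.50],
}

def standardize(table):
    """Rescale every feature to zero mean and unit variance."""
    columns = list(zip(*table.values()))
    means = [sum(col) / len(col) for col in columns]
    # Fall back to 1.0 for constant features to avoid dividing by zero
    stds = [math.sqrt(sum((v - m) ** 2 for v in col) / len(col)) or 1.0
            for col, m in zip(columns, means)]
    return {song_id: [(v - m) / s for v, m, s in zip(vec, means, stds)]
            for song_id, vec in table.items()}

def recommend_song(song_id, table):
    """Return the id of the song closest to song_id (Euclidean, standardized features)."""
    scaled = standardize(table)
    target = scaled[song_id]
    candidates = (other for other in scaled if other != song_id)
    return min(candidates, key=lambda other: math.dist(target, scaled[other]))

suggestion = recommend_song("song_1", songs)
print(suggestion)
```

Standardizing first keeps a large-valued feature such as tempo from dominating the distance. Swapping `math.dist` for a cosine similarity (and `min` for `max`) is an equally reasonable variant to try.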
