# Movie Recommendation Engine using LightFM

View LightFM documentation <a href="https://making.lyst.com/lightfm/docs/home.html">here</a>

In [1]:
# Install LightFM (takes around 15 seconds)
!pip install lightfm

Collecting lightfm
  Downloading lightfm-1.16.tar.gz (310 kB)

### Get the dataset from <i>MovieLens</i>, which consists of 943 users, 1682 movies and 100,000 ratings. <br>
The ratings are on a scale from 1 to 5.

In [2]:
# Import libraries
import numpy as np
from lightfm.datasets import fetch_movielens
from lightfm import LightFM

In [3]:
# Fetch the dataset, keeping only ratings of 4.0 or higher
data = fetch_movielens(min_rating = 4.0)
data

{'item_feature_labels': array(['Toy Story (1995)', 'GoldenEye (1995)', 'Four Rooms (1995)', ...,
        'Sliding Doors (1998)', 'You So Crazy (1994)',
        'Scream of Stone (Schrei aus Stein) (1991)'], dtype=object),
 'item_features': <1682x1682 sparse matrix of type '<class 'numpy.float32'>'
 	with 1682 stored elements in Compressed Sparse Row format>,
 'item_labels': array(['Toy Story (1995)', 'GoldenEye (1995)', 'Four Rooms (1995)', ...,
        'Sliding Doors (1998)', 'You So Crazy (1994)',
        'Scream of Stone (Schrei aus Stein) (1991)'], dtype=object),
 'test': <943x1682 sparse matrix of type '<class 'numpy.int32'>'
 	with 5469 stored elements in COOrdinate format>,
 'train': <943x1682 sparse matrix of type '<class 'numpy.int32'>'
 	with 49906 stored elements in COOrdinate format>}

In [4]:
# Print each key with the type and shape of its value
for key, value in data.items():
    print(key, type(value), value.shape)

train <class 'scipy.sparse.coo.coo_matrix'> (943, 1682)
test <class 'scipy.sparse.coo.coo_matrix'> (943, 1682)
item_features <class 'scipy.sparse.csr.csr_matrix'> (1682, 1682)
item_feature_labels <class 'numpy.ndarray'> (1682,)
item_labels <class 'numpy.ndarray'> (1682,)


### Create and train the model

<u>WARP (Weighted Approximate-Rank Pairwise loss) model</u>

Maximises the rank of positive examples by repeatedly sampling negative examples until a rank-violating one is found. Useful when only positive interactions are present and optimising the top of the recommendation list (precision@k) is desired.
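
To make the sampling idea concrete, here is a minimal NumPy sketch of a single WARP sampling step for one user. It is a simplified illustration, not LightFM's actual implementation: the function name `warp_sample`, the `margin` parameter and the log-of-estimated-rank weight are assumptions chosen for clarity.

```python
import numpy as np

def warp_sample(scores, positive_item, rng, margin=1.0, max_trials=None):
    """Sample negatives until one violates the margin.

    Returns (negative_item, update_weight), or (None, 0.0) when no
    violating negative is found within max_trials draws.
    Hypothetical helper for illustration only.
    """
    n_items = scores.shape[0]
    if max_trials is None:
        max_trials = n_items - 1
    for trial in range(1, max_trials + 1):
        negative = int(rng.integers(n_items))
        if negative == positive_item:
            continue  # drew the positive itself; try again
        # Violation: the negative scores within `margin` of the positive.
        if scores[negative] > scores[positive_item] - margin:
            # Few draws needed means many violators exist, so the
            # positive is probably ranked low and gets a larger update.
            estimated_rank = (n_items - 1) // trial
            weight = float(np.log(max(estimated_rank, 1)))
            return negative, weight
    return None, 0.0

rng = np.random.default_rng(0)
scores = np.array([0.1, 5.0, 0.2, 0.3])  # toy scores for 4 items
print(warp_sample(scores, positive_item=1, rng=rng))  # positive already ranked well
print(warp_sample(scores, positive_item=0, rng=rng, max_trials=1000))
```

When the positive item already outscores every negative by the margin, no update happens, which is why the loss concentrates its effort on the top of the ranked list.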

In [5]:
model = LightFM(loss = 'warp')

In [6]:
# Extract train and test datasets
train = data['train']
test = data['test']

In [7]:
# Fitting the model over 10 epochs
model.fit(train, epochs=10)

<lightfm.lightfm.LightFM at 0x7f370e306810>

### Performance Evaluation

<b>precision at k</b>: the fraction of known positives in the first k positions of the ranked list of results. A perfect score is 1.0.

<b>AUC</b>: the probability that a randomly chosen positive example has a higher score than a randomly chosen negative example. A perfect score is 1.0.
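
Both definitions can be checked by hand on a toy example. The sketch below computes them for a single user with plain NumPy; the helper names `precision_at_k_single` and `auc_single` are hypothetical and are not part of LightFM, whose own functions operate on the whole interaction matrix.

```python
import numpy as np

def precision_at_k_single(scores, positives, k):
    """Fraction of the k highest-scored items that are known positives."""
    top_k = np.argsort(-scores)[:k]
    return float(np.isin(top_k, positives).mean())

def auc_single(scores, positives):
    """Probability that a random positive outscores a random negative."""
    is_pos = np.zeros(scores.shape[0], dtype=bool)
    is_pos[positives] = True
    pos_scores = scores[is_pos]
    neg_scores = scores[~is_pos]
    # Compare every positive with every negative; ties count as half.
    wins = (pos_scores[:, None] > neg_scores[None, :]).sum()
    ties = (pos_scores[:, None] == neg_scores[None, :]).sum()
    return float((wins + 0.5 * ties) / (pos_scores.size * neg_scores.size))

scores = np.array([0.9, 0.1, 0.8, 0.3, 0.7])  # toy scores for 5 items
positives = np.array([0, 3])                  # items the user liked

print(precision_at_k_single(scores, positives, k=2))  # 1 of the top 2 is a positive
print(auc_single(scores, positives))                  # 4 of 6 positive/negative pairs are ordered correctly
```

Precision only looks at the top of the list, while AUC rewards correct ordering anywhere in it, so the two metrics can differ widely for the same model.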



In [8]:
from lightfm.evaluation import precision_at_k
from lightfm.evaluation import auc_score

train_precision = precision_at_k(model, train, k=10).mean()
test_precision = precision_at_k(model, test, k=10).mean()

train_auc = auc_score(model, train).mean()
test_auc = auc_score(model, test).mean()

print('Precision: train %.2f, test %.2f.' % (train_precision, test_precision))
print('AUC: train %.2f, test %.2f.' % (train_auc, test_auc))

Precision: train 0.48, test 0.08.
AUC: train 0.94, test 0.91.


### Recommend movies for some users

<a href="https://towardsdatascience.com/how-to-build-a-movie-recommender-system-in-python-using-lightfm-8fa49d7cbe3b">sample_recommendation</a> function credit goes to Arun Mathew Kurian

In [9]:
def sample_recommendation(model, data, user_ids):
    '''Uses the model, the data and a list of user IDs to print the recommended movies along with known positives for each user.'''
    n_users, n_items = data['train'].shape
    for user_id in user_ids:
        known_positives = data['item_labels'][data['train'].tocsr()[user_id].indices]
        
        scores = model.predict(user_id, np.arange(n_items))

        top_items = data['item_labels'][np.argsort(-scores)]
      
        print("User %s" % user_id)
        print("Known positives:")
        
        # Print the first 3 known positives
        for x in known_positives[:3]:
            print("%s" % x)
        
        # Print the first 3 recommended movies
        print("Recommended:")
        for x in top_items[:3]:
            print("%s" % x)
        print("\n")

In [10]:
# Testing on users 3, 95 and 125
sample_recommendation(model, data, [3, 95, 125])

User 3
Known positives:
Seven (Se7en) (1995)
Contact (1997)
Starship Troopers (1997)
Recommended:
Contact (1997)
Scream (1996)
Air Force One (1997)


User 95
Known positives:
Toy Story (1995)
Twelve Monkeys (1995)
Taxi Driver (1976)
Recommended:
Star Wars (1977)
Raiders of the Lost Ark (1981)
Silence of the Lambs, The (1991)


User 125
Known positives:
Jungle2Jungle (1997)
Kull the Conqueror (1997)
Scream (1996)
Recommended:
Game, The (1997)
Conspiracy Theory (1997)
Air Force One (1997)
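
In the output above, Contact (1997) is both a known positive and the top recommendation for User 3, because `sample_recommendation` ranks every item, including those the user already rated in the training set. One possible way to recommend only unseen movies is to mask the known positives before sorting. The sketch below uses toy scores and a hypothetical helper `top_unseen`; in the notebook, the known positive indices would come from `data['train'].tocsr()[user_id].indices`.

```python
import numpy as np

def top_unseen(scores, known_positive_ids, n=3):
    """Indices of the n highest-scored items not already known as positives."""
    masked = scores.astype(float)  # astype returns a copy, so scores is untouched
    masked[known_positive_ids] = -np.inf  # push known items to the bottom
    return np.argsort(-masked)[:n]

scores = np.array([2.0, 0.5, 1.5, 1.0, 0.1])  # toy scores for 5 items
known = np.array([0, 3])                      # items the user already liked

print(top_unseen(scores, known, n=2))
```

Whether to filter this way depends on the use case: it suits discovery, while leaving known items in can be useful as a sanity check on the model.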


