MovieLens Rating Prediction Workshop Notebook

This notebook runs faster on a GPU runtime. To enable it, go to Edit > Notebook Settings > Hardware Accelerator > GPU.


## Setup

In [1]:
import torch

print(torch.__version__)

2.4.1


## Link Regression on the MovieLens Dataset

This notebook shows how to load a set of `*.csv` files into a `torch_geometric.data.HeteroData` object and how to train a [heterogeneous graph model](https://pytorch-geometric.readthedocs.io/en/latest/notes/heterogeneous.html#hgtutorial).

We are going to use the [Movielens dataset](https://grouplens.org/datasets/movielens/), which is collected by the GroupLens Research group. The toy dataset describes movies, users, and their ratings. We are going to predict the rating of a user for a movie.

## Data Ingestion

In [11]:
import pandas as pd

dataset_name = 'itstore'

orders_path = f'data/{dataset_name}/final_orders.csv'
prod_path = f'data/{dataset_name}/final_products.csv'
cus_path = f'data/{dataset_name}/final_customers.csv'

print(f'Loading orders from {orders_path}')
print(f'Loading customers from {cus_path}')
print(f'Loading products from {prod_path}')

Loading orders from data/itstore/final_orders.csv
Loading customers from data/itstore/final_customers.csv
Loading products from data/itstore/final_products.csv


In [12]:
import pandas as pd
# Load the entire ratings dataframe into memory:
#orders_df = pd.read_csv(orders_path)[["user_id", "item_id"]]

orders_df = pd.read_csv(orders_path, header=0, sep=' ')[["user_id", "item_id"]]
prod_df = pd.read_csv(prod_path, header=0, sep=',')
cus_df = pd.read_csv(cus_path, header=0, sep=',')[["user_id", "user_name", "user_cat", "sf_cat"]]

# Display the entire row in one line
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', None)

print('orders.csv:')
print('============')
print(orders_df[["user_id", "item_id"]].head())
print(f"Number of ratings: {len(orders_df)}")
print()
print('final_customers.csv:')
print('============')

print(cus_df.head())
print(f"Number of customers: {len(cus_df)}")
print()
print('final_products.csv:')
print('============')
print(prod_df.head())
print(f"Number of products: {len(prod_df)}")
print()

orders.csv:
   user_id         item_id
0  6032699  40101002000100
1  6032699  40101002000100
2  5769655  70100001001400
3  6508081  70100001001400
4  6508081  70100001001400
Number of ratings: 229832

final_customers.csv:
   user_id                                                 user_name  \
0       47  "Монгол улсын төлөөллийн байгууллагыг бэжүүлэх нь" төсөл   
1  9927654                       50-н ортой 3-н цэцэрлэг барих төсөл   
2      277              Beijing Oriental Shine International Trading   
3  2154565                                   Canon Singapore Pte Ltd   
4      111                                     Dell Asia Pacific Sdn   

   user_cat    sf_cat  
0  Entities  6. Төсөл  
1  Entities  6. Төсөл  
2  Entities    7. ОУБ  
3  Entities    7. ОУБ  
4  Entities    7. ОУБ  
Number of customers: 7453

final_products.csv:
       item_id  \
0            9   
1           14   
2           14   
3         3208   
4  80230000002   

                                             

Additionally, let's add our ratings to the dataset to get predictions for movies we haven't seen yet.

There are two ways to add ratings:
1. **Add ratings manually**
2. **Upload IMDB ratings**


### Add your ratings manually


We recommend adding at least 10 ratings. Let's first check out the most rated movies. Additional movies in the table are: *Avatar*, *The Dark Knight*, *Pretty Women*,
*Titanic*, *The Lion King*, *Jurassic Park*, *The Matrix*, *The Lord of the Rings* and *The Avengers*. Please note that the article in the movie title is often at the end of the title.

In [36]:
#from fuzzywuzzy import fuzz


# print('Most purchased item:')
# print('==================')
# most_purchased_items = orders_df['item_id'].value_counts().head(10)
# print(most_purchased_items)
# print(prod_df.loc[most_purchased_items.index][["item_name"]])

# # Initialize your rating list
# ratings = []

print('Most Purchased Items:')
print('=====================')

# Get the top 10 most purchased item IDs and their counts as a DataFrame
most_purchased_items = orders_df['item_id'].value_counts().head(10).reset_index()
most_purchased_items.columns = ['item_id', 'purchase_count']

# Merge with prod_df to get item names
most_purchased_items_df = most_purchased_items.merge(prod_df[['item_id', 'item_name']], on='item_id', how='left')

# Display the result
print(most_purchased_items_df[['item_name', 'purchase_count']])


Most Purchased Items:
                                            item_name  purchase_count
0                            itstore стандарт хүргэлт           13445
1   Service Software and Hardware  Оношлох 2520001...            4198
2   Software ESET NOD32 Antivirus 8 New 1yr, 1-uni...            3084
3                                                 NaN            3084
4   Mouse: Dell Optical Wireless Mouse WM126 - Bla...            1980
5                                                 NaN            1980
6   Printers Supply Ink Cartridge: Canon ink bottl...            1596
7   Mouse: Dell Optical Wired Mouse, MS116 USB, En...            1539
8   Service of Printer Ажиллагаа ихтэй гэмтэл засв...            1491
9    Printers Supply Ink Cartridge: Canon ink bott...            1458
10  Printers Supply Ink Cartridge: Canon ink bottl...            1436
11  Printers Supply Ink Cartridge: Canon ink bottl...            1425


In [37]:
print('Top Customers:')
print('=====================')

# Get the top 10 most purchased item IDs and their counts as a DataFrame
top_customers = orders_df['user_id'].value_counts().head(10).reset_index()
top_customers.columns = ['user_id', 'purchase_count']

# Merge with prod_df to get item names
top_customers_df = top_customers.merge(cus_df[['user_id', 'user_name']], on='user_id', how='left')

# Display the result
print(top_customers_df[['user_name', 'purchase_count']])

Top Customers:
                user_name  purchase_count
0                Хувь хүн           41544
1               Оюутолгой            9297
2           Дижитал повер            5733
3          Премиум нэксус            4755
4                 Хасбанк            3978
5            Могул сервис            3243
6                Киберком            2991
7       Нэкст-Электроникс            2809
8  Худалдаа хөгжлийн банк            1827
9          Могул экспресс            1670


## Data Preprocessing

We are going to use the genre as well as the title of the movie as node features. For the `title` features, we are going to use a pre-trained [sentence transformer](https://www.sbert.net/) model to encode the title into a vector.
For the `genre` features, we are going to use a one-hot encoding.

In [38]:
import numpy as np
import torch
from sentence_transformers import SentenceTransformer

# --------------------- user features ---------------------

# Create one-hot encoding for 'user_cat' and 'sf_cat'
u_cat_1h = pd.get_dummies(cus_df['user_cat'], prefix='user_cat')
u_sf_cat_1h = pd.get_dummies(cus_df['sf_cat'], prefix='sf_cat')

cus_df = pd.concat([cus_df, u_cat_1h, u_sf_cat_1h], axis=1)

# --------------------- item features ---------------------
# Create one-hot encoding for 'item_category' and 'item_subcategory'
category_one_hot = pd.get_dummies(prod_df['item_category'], prefix='category')
#subcategory_one_hot = pd.get_dummies(prod_df['item_subcategory'], prefix='subcategory')

# Concatenate the one-hot encoded DataFrames with the original prod_df
#prod_df = pd.concat([prod_df, category_one_hot, subcategory_one_hot], axis=1)
prod_df = pd.concat([prod_df, category_one_hot], axis=1)

# Drop the original columns if you no longer need them
prod_df = prod_df.drop(['item_category', 'item_subcategory'], axis=1)

# Display the result
#print(prod_df.head())

# # Load the pre-trained sentence transformer model and encode the movie titles:
model = SentenceTransformer('all-MiniLM-L6-v2')
with torch.no_grad():
     item_name = model.encode(prod_df['item_name'].tolist(), convert_to_tensor=True, show_progress_bar=True)
     item_name = item_name.cpu()

# # Concatenate the genres and title features:
prod_features = torch.cat([category_one_hot, item_name], dim=-1)

# # We don't have user features, which is why we use an identity matrix
# #user_features = torch.eye(len(ratings_df['userId'].unique()))


Batches: 100%|█████████▉| 622/625 [00:16<00:00, 37.50it/s]


TypeError: 'float' object is not subscriptable

The `ratings.csv` file contains the ratings of users for movies. From this
file we are extracting the `userId`. We create a mapping from the `userId`
to a unique consecutive value in the range `[0, num_users]`. This is needed as we want our final data representation to be as compact as possible, *e.g.*, the representation of a user in the first row should be accessible via `x[0]`.
The same we do for the `movieId`.
Afterwards, we obtain the final `edge_index` representation of shape `[2, num_ratings]` from `ratings.csv` by merging mapped user and movie indices with the raw indices given by the original data frame.


In [8]:
# Create a mapping from the userId to a unique consecutive value in the range [0, num_users]:
unique_user_id = ratings_df['userId'].unique()
unique_user_id = pd.DataFrame(data={
    'userId': unique_user_id,
    'mappedUserId': pd.RangeIndex(len(unique_user_id))
    })
print("Mapping of user IDs to consecutive values:")
print("==========================================")
print(unique_user_id.head())
print()

# Create a mapping from the movieId to a unique consecutive value in the range [0, num_movies]:
unique_movie_id = ratings_df['movieId'].unique()
unique_movie_id = pd.DataFrame(data={
    'movieId': unique_movie_id,
    'mappedMovieId': pd.RangeIndex(len(unique_movie_id))
    })
print("Mapping of movie IDs to consecutive values:")
print("===========================================")
print(unique_movie_id.head())
print()

# Merge the mappings with the original data frame:
ratings_df = ratings_df.merge(unique_user_id, on='userId')
ratings_df = ratings_df.merge(unique_movie_id, on='movieId')

# With this, we are ready to create the edge_index representation in COO format
# following the PyTorch Geometric semantics:
edge_index = torch.stack([
    torch.tensor(ratings_df['mappedUserId'].values),
    torch.tensor(ratings_df['mappedMovieId'].values)]
    , dim=0)

assert edge_index.shape == (2, len(ratings_df))

print("Final edge indices pointing from users to movies:")
print("================================================")
print(edge_index[:, :10])

Mapping of user IDs to consecutive values:
   userId  mappedUserId
0       1             0
1       2             1
2       3             2
3       4             3
4       5             4

Mapping of movie IDs to consecutive values:
   movieId  mappedMovieId
0        1              0
1        3              1
2        6              2
3       47              3
4       50              4

Final edge indices pointing from users to movies:
tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])


## Heterogeneous Graph Construction

With this we are ready to initialize our heterogeneous graph data object and pass the
necessary information to it.

We also take care of adding reverse edges to the `HeteroData` object. This allows our GNN
model to use both directions of the edges for the message passing.

In [9]:
import torch_geometric.transforms as T
from torch_geometric.data import HeteroData

# Create the heterogeneous graph data object:
data = HeteroData()

# Add the user nodes:
data['user'].x = user_features  # [num_users, num_features_users]

# Add the movie nodes:
data['movie'].x = movie_features  # [num_movies, num_features_movies]

# Add the rating edges:
data['user', 'rates', 'movie'].edge_index = edge_index  # [2, num_ratings]

# Add the rating labels:
rating = torch.from_numpy(ratings_df['rating'].values).to(torch.float)
data['user', 'rates', 'movie'].edge_label = rating  # [num_ratings]

# We also need to make sure to add the reverse edges from movies to users
# in order to let a GNN be able to pass messages in both directions.
# We can leverage the `T.ToUndirected()` transform for this from PyG:
data = T.ToUndirected()(data)

# With the above transformation we also got reversed labels for the edges.
# We are going to remove them:
del data['movie', 'rev_rates', 'user'].edge_label

assert data['user'].num_nodes == len(unique_user_id)
assert data['user', 'rates', 'movie'].num_edges == len(ratings_df)
assert data['movie'].num_features == 404

data

HeteroData(
  user={ x=[611, 611] },
  movie={ x=[9742, 404] },
  (user, rates, movie)={
    edge_index=[2, 100846],
    edge_label=[100846],
  },
  (movie, rev_rates, user)={ edge_index=[2, 100846] }
)

## Dataset Splitting

We can now split our data into a training, validation and test set. We are going to use
the `T.RandomLinkSplit` transform from PyG to do this. This transform will randomly
split the links with their label/rating into training, validation and test set.
We are going to use 80% of the edges for training, 10% for validation and 10% for testing.

In [10]:
train_data, val_data, test_data = T.RandomLinkSplit(
    num_val=0.1,
    num_test=0.1,
    neg_sampling_ratio=0.0,
    edge_types=[('user', 'rates', 'movie')],
    rev_edge_types=[('movie', 'rev_rates', 'user')],
)(data)
train_data, val_data

(HeteroData(
   user={ x=[611, 611] },
   movie={ x=[9742, 404] },
   (user, rates, movie)={
     edge_index=[2, 80678],
     edge_label=[80678],
     edge_label_index=[2, 80678],
   },
   (movie, rev_rates, user)={ edge_index=[2, 80678] }
 ),
 HeteroData(
   user={ x=[611, 611] },
   movie={ x=[9742, 404] },
   (user, rates, movie)={
     edge_index=[2, 80678],
     edge_label=[10084],
     edge_label_index=[2, 10084],
   },
   (movie, rev_rates, user)={ edge_index=[2, 80678] }
 ))

## Graph Neural Network

We are now ready to define our GNN model. We are going to use a simple GNN model with
two message passing layers for the encoding of the user and movie nodes.
Additionally, we are going to use a decoder to predict the rating for the encoded
user-movie combination.

In [11]:
from torch_geometric.nn import SAGEConv, to_hetero

class GNNEncoder(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index)
        return x


class EdgeDecoder(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        self.lin1 = torch.nn.Linear(2 * hidden_channels, hidden_channels)
        self.lin2 = torch.nn.Linear(hidden_channels, 1)

    def forward(self, z_dict, edge_label_index):
        row, col = edge_label_index
        z = torch.cat([z_dict['user'][row], z_dict['movie'][col]], dim=-1)

        z = self.lin1(z).relu()
        z = self.lin2(z)
        return z.view(-1)


class Model(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        self.encoder = GNNEncoder(hidden_channels, hidden_channels)
        self.encoder = to_hetero(self.encoder, data.metadata(), aggr='sum')
        self.decoder = EdgeDecoder(hidden_channels)

    def forward(self, x_dict, edge_index_dict, edge_label_index):
        z_dict = self.encoder(x_dict, edge_index_dict)
        return self.decoder(z_dict, edge_label_index)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model = Model(hidden_channels=32).to(device)

print(model)

Model(
  (encoder): GraphModule(
    (conv1): ModuleDict(
      (user__rates__movie): SAGEConv((-1, -1), 32, aggr=mean)
      (movie__rev_rates__user): SAGEConv((-1, -1), 32, aggr=mean)
    )
    (conv2): ModuleDict(
      (user__rates__movie): SAGEConv((-1, -1), 32, aggr=mean)
      (movie__rev_rates__user): SAGEConv((-1, -1), 32, aggr=mean)
    )
  )
  (decoder): EdgeDecoder(
    (lin1): Linear(in_features=64, out_features=32, bias=True)
    (lin2): Linear(in_features=32, out_features=1, bias=True)
  )
)


## Training a Heterogeneous GNN

Training our GNN is then similar to training any PyTorch model.
We move the model to the desired device, and initialize an optimizer that takes care of adjusting model parameters via stochastic gradient descent.

The training loop applies the forward computation of the model, computes the loss from ground-truth labels and obtained predictions, and adjusts model parameters via back-propagation and stochastic gradient descent.


In [12]:
import torch.nn.functional as F

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

def train():
    model.train()
    optimizer.zero_grad()
    pred = model(train_data.x_dict, train_data.edge_index_dict,
                 train_data['user', 'movie'].edge_label_index)
    target = train_data['user', 'movie'].edge_label
    loss = F.mse_loss(pred, target)
    loss.backward()
    optimizer.step()
    return float(loss)

@torch.no_grad()
def test(data):
    data = data.to(device)
    model.eval()
    pred = model(data.x_dict, data.edge_index_dict,
                 data['user', 'movie'].edge_label_index)
    pred = pred.clamp(min=0, max=5)
    target = data['user', 'movie'].edge_label.float()
    rmse = F.mse_loss(pred, target).sqrt()
    return float(rmse)


for epoch in range(1, 301):
    train_data = train_data.to(device)
    loss = train()
    train_rmse = test(train_data)
    val_rmse = test(val_data)
    print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}, Train: {train_rmse:.4f}, '
          f'Val: {val_rmse:.4f}')

Epoch: 001, Loss: 13.4003, Train: 3.4097, Val: 3.4051
Epoch: 002, Loss: 11.6261, Train: 2.9774, Val: 2.9779
Epoch: 003, Loss: 8.8647, Train: 2.1184, Val: 2.1300
Epoch: 004, Loss: 4.4877, Train: 1.0422, Val: 1.0615
Epoch: 005, Loss: 1.0862, Train: 1.8239, Val: 1.8111
Epoch: 006, Loss: 7.0819, Train: 1.7247, Val: 1.7123
Epoch: 007, Loss: 3.2202, Train: 1.0455, Val: 1.0626
Epoch: 008, Loss: 1.0931, Train: 1.3027, Val: 1.3270
Epoch: 009, Loss: 1.6970, Train: 1.6534, Val: 1.6728
Epoch: 010, Loss: 2.7336, Train: 1.8095, Val: 1.8267
Epoch: 011, Loss: 3.2744, Train: 1.7787, Val: 1.7971
Epoch: 012, Loss: 3.1636, Train: 1.6061, Val: 1.6277
Epoch: 013, Loss: 2.5794, Train: 1.3308, Val: 1.3574
Epoch: 014, Loss: 1.7709, Train: 1.0663, Val: 1.0955
Epoch: 015, Loss: 1.1370, Train: 1.0755, Val: 1.0921
Epoch: 016, Loss: 1.1567, Train: 1.3398, Val: 1.3397
Epoch: 017, Loss: 1.7952, Train: 1.4351, Val: 1.4311
Epoch: 018, Loss: 2.0607, Train: 1.2706, Val: 1.2743
Epoch: 019, Loss: 1.6143, Train: 1.0605, Val

## Evaluation

From the validation results, our model can generalize well to unseen data. The val RMSE is should be around 0.9, meaning that, on average our model is off by 0.9 stars. We can now evaluate our model on the test set and take a closer look into the predictions.

In [13]:
with torch.no_grad():
    test_data = test_data.to(device)
    pred = model(test_data.x_dict, test_data.edge_index_dict,
                 test_data['user', 'movie'].edge_label_index)
    pred = pred.clamp(min=0, max=5)
    target = test_data['user', 'movie'].edge_label.float()
    rmse = F.mse_loss(pred, target).sqrt()
    print(f'Test RMSE: {rmse:.4f}')

userId = test_data['user', 'movie'].edge_label_index[0].cpu().numpy()
movieId = test_data['user', 'movie'].edge_label_index[1].cpu().numpy()
pred = pred.cpu().numpy()
target = target.cpu().numpy()

print(pd.DataFrame({'userId': userId, 'movieId': movieId, 'rating': pred, 'target': target}))

Test RMSE: 0.9077
       userId  movieId    rating  target
0         413     1932  3.629247     4.0
1         596      133  3.744290     3.0
2         138     3072  2.064011     1.5
3         338      444  4.330466     4.5
4          94      769  3.648050     3.5
...       ...      ...       ...     ...
10079     317     1345  3.365408     3.0
10080      44      782  4.109610     3.5
10081     274        4  4.395584     4.0
10082     317     3344  3.144758     3.0
10083     413     4500  2.857868     3.0

[10084 rows x 4 columns]


## Movie recommendations

We can now use the model to generate ratings for a movie we haven't seen.


In [14]:
# Your mappedUserId
mapped_user_id = unique_user_id[unique_user_id['userId'] == our_user_id]['mappedUserId'].values[0]

# Select movies that you haven't seen before
movies_rated = ratings_df[ratings_df['mappedUserId'] == mapped_user_id]
movies_not_rated = movies_df[~movies_df.index.isin(movies_rated['movieId'])]
movies_not_rated = movies_not_rated.merge(unique_movie_id, on='movieId')
movie = movies_not_rated.sample(1)

print(f"The movie we want to predict a raiting for is:  {movie['title'].item()}")

The movie we want to predict a raiting for is:  World Trade Center (2006)


In [15]:
# Create new `edge_label_index` between the user and the movie
edge_label_index = torch.tensor([
    mapped_user_id,
    movie.mappedMovieId.item()])


with torch.no_grad():
    test_data.to(device)
    pred = model(test_data.x_dict, test_data.edge_index_dict, edge_label_index)
    pred = pred.clamp(min=0, max=5).detach().cpu().numpy()

In [16]:
pred.item()

2.7233190536499023

## Explaining the Predictions

PyTorch Geometric also provides a way to explain the predictions of a GNN. Let's check which movie ratings have influenced this prediction the most.

We will use the [captum](https://captum.ai/) library to explain the predictions.

In [17]:
from torch_geometric.explain import Explainer, CaptumExplainer

explainer = Explainer(
    model=model,
    algorithm=CaptumExplainer('IntegratedGradients'),
    explanation_type='model',
    model_config=dict(
        mode='regression',
        task_level='edge',
        return_type='raw',
    ),
    node_mask_type=None,
    edge_mask_type='object',
)

explanation = explainer(
    test_data.x_dict, test_data.edge_index_dict, index=0,
    edge_label_index=edge_label_index).cpu().detach()
explanation

HeteroExplanation(
  prediction=[1],
  target=[1],
  index=[1],
  edge_label_index=[2],
  user={ x=[611, 611] },
  movie={ x=[9742, 404] },
  (user, rates, movie)={
    edge_mask=[90762],
    edge_index=[2, 90762],
  },
  (movie, rev_rates, user)={
    edge_mask=[90762],
    edge_index=[2, 90762],
  }
)

In [18]:
# User to movie link + attribution
user_to_movie = explanation['user', 'movie'].edge_index.numpy().T
user_to_movie_attr = explanation['user', 'movie'].edge_mask.numpy().T
user_to_movie_df = pd.DataFrame(
    np.hstack([user_to_movie, user_to_movie_attr.reshape(-1,1)]),
    columns = ['mappedUserId', 'mappedMovieId', 'attr']
)

# Movie to user link + attribution
movie_to_user = explanation['movie', 'user'].edge_index.numpy().T
movie_to_user_attr = explanation[ 'movie', 'user'].edge_mask.numpy().T
movie_to_user_df = pd.DataFrame(
    np.hstack([movie_to_user, movie_to_user_attr.reshape(-1,1)]),
    columns = ['mappedMovieId', 'mappedUserId','attr']
)
explanation_df = pd.concat([user_to_movie_df, movie_to_user_df])
explanation_df[["mappedUserId", "mappedMovieId"]] = explanation_df[["mappedUserId", "mappedMovieId"]].astype(int)

print(f"Attribtion for all edges towards prediction of movie rating of movie:\n {movie['title'].item()}")
print("==========================================================================================")
print(explanation_df.sort_values(by='attr'))

Attribtion for all edges towards prediction of movie rating of movie:
 World Trade Center (2006)
       mappedUserId  mappedMovieId      attr
45606           447           5253 -0.024605
39417           273           5253 -0.019153
75311           248           5253 -0.010085
20443           610            926 -0.000082
72392           610            460 -0.000065
...             ...            ...       ...
70294           610             16  0.015985
76402           610             20  0.016374
21129           610             34  0.016384
11582           610              0  0.021239
19793           176           5253  0.023719

[181524 rows x 3 columns]


In [19]:
# Select links that connect to our user
explanation_df = explanation_df[explanation_df['mappedUserId'] == mapped_user_id]

# We group the attribution scores by movie
explanation_df = explanation_df.groupby('mappedMovieId').sum()

# Merge with movies_df to receive title
# But first, we need to add the original id
explanation_df = explanation_df.merge(unique_movie_id, on='mappedMovieId')
explanation_df = explanation_df.merge(movies_df, on='movieId')

pd.options.display.float_format = "{:,.9f}".format

print("Top movies that influenced the prediction:")
print("==============================================")
print(explanation_df.sort_values(by='attr', ascending=False, key= lambda x: abs(x))[['title', 'attr']].head())

Top movies that influenced the prediction:
                              title        attr
0                  Toy Story (1995) 0.021200064
4  Silence of the Lambs, The (1991) 0.016353807
2               Forrest Gump (1994) 0.016348168
1               Pulp Fiction (1994) 0.015957274
6  Shawshank Redemption, The (1994) 0.015772806
