<a href="https://colab.research.google.com/github/zory233/4222project/blob/main/main.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Graph-Learning-Based Recommender System on MovieLens

### Group 9

- AGARWAL, Sahil
- WEI, Yuanjing
- ZHANG, Yujun yzhanglo@connect.ust.hk

Group project for COMP4222 at HKUST, Fall 2022.

# 1 Environment Configuration

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
import torch

# Display the value of every top-level expression in a cell, not only the last one.
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

# make matplotlib figures appear inline in the notebook rather than in a new window.
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# Add some convenience functions to Pandas DataFrame.
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.2f}'.format
def mask(df, key, function):
  """Returns a filtered dataframe, by applying function to key"""
  return df[function(df[key])]

def flatten_cols(df):
  df.columns = [' '.join(col).strip() for col in df.columns.values]
  return df

# Note: this overrides pandas' built-in DataFrame.mask method with the helper above.
pd.DataFrame.mask = mask
pd.DataFrame.flatten_cols = flatten_cols

# http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

In [2]:
try:
    from google.colab import drive
    drive.mount('/content/drive')
    %cd '/content/drive/MyDrive/4222Group9'
    %pwd
    %ls
    %pip install recommenders[gpu] -f https://download.pytorch.org/whl/cu111/torch_stable.html
    %pip install pyspark
except ImportError:
    # Not running on Colab: skip the Drive mount and package installation.
    pass

Mounted at /content/drive
/content/drive/.shortcut-targets-by-id/1REWvZ0Y4cMhH-HSEkETnBRmIrKs6pHXz/4222Group9


'/content/drive/.shortcut-targets-by-id/1REWvZ0Y4cMhH-HSEkETnBRmIrKs6pHXz/4222Group9'

comp4222/  LICENSE  lightgcn_deep_dive.ipynb  main.ipynb  movielens.ipynb
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in links: https://download.pytorch.org/whl/cu111/torch_stable.html
Collecting recommenders[gpu]
  Downloading recommenders-1.1.1-py3-none-any.whl (339 kB)
Collecting category-encoders<2,>=1.3.0
  Downloading category_encoders-1.3.0-py2.py3-none-any.whl (61 kB)
Collecting bottleneck<2,>=1.2.1
  Downloading Bottleneck-1.3.5-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (355 kB)
Collecting cornac<2,>=1.1.2
  Downloading cornac-1.14.2-cp37-cp37m-manylinux1_x86_64.whl (12.4 MB)
Collecting sc

In [3]:
import comp4222

# 2 MovieLens


We use the `ml-latest-small` dataset from MovieLens. It contains 100836 ratings and 3683 tag applications across 9742 movies, created by 610 users between March 29, 1996 and September 24, 2018. The dataset was generated on September 26, 2018. The README is available [here](https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html).
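
As a rough sanity check on these figures, the sketch below counts distinct users and movies in a list of (user, movie) rating pairs and computes the density of the resulting user-movie matrix. The helper name `summarize_interactions` and the toy pairs are illustrative assumptions, not part of the project code.

```python
def summarize_interactions(pairs):
    """Summarize an iterable of (user_id, movie_id) rating pairs."""
    pairs = list(pairs)
    users = {user for user, _ in pairs}
    movies = {movie for _, movie in pairs}
    n_possible = len(users) * len(movies)  # size of the full user-movie matrix
    return {
        "users": len(users),
        "movies": len(movies),
        "ratings": len(pairs),
        # Fraction of user-movie cells that actually hold a rating.
        "density": len(pairs) / n_possible if n_possible else 0.0,
    }


# Toy data (hypothetical): 3 users, 3 movies, 4 ratings.
toy_pairs = [(1, 10), (1, 20), (2, 10), (3, 30)]
summary = summarize_interactions(toy_pairs)
print(summary)
```

Applied to the figures quoted above, 100836 ratings over a 610 × 9742 matrix give a density of about 1.7%, so the interaction graph is very sparse. Once `ratings.csv` has been loaded into a DataFrame named `ratings`, the same helper could be called as `summarize_interactions(zip(ratings["userId"], ratings["movieId"]))`; the movie count obtained that way covers only movies that received at least one rating.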

## Data Loading

In [9]:
# Download and extract the MovieLens data.
from urllib.request import urlretrieve
import zipfile

dataset_name = "ml-latest-small"
urlretrieve(f"https://files.grouplens.org/datasets/movielens/{dataset_name}.zip", "movielens.zip")
with zipfile.ZipFile("movielens.zip", "r") as archive:
    archive.extractall()

('movielens.zip', <http.client.HTTPMessage at 0x7f9cc309fe50>)

In [10]:
movies = pd.read_csv(f"{dataset_name}/movies.csv")
genre_cols = [
    "(no genres listed)", "Action", "Adventure", "Animation", "Children", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western"
]
movies

index,movieId,title,genres
0,1,Toy Story (1995),Adventure|Animation|Children|Comedy|Fantasy
1,2,Jumanji (1995),Adventure|Children|Fantasy
2,3,Grumpier Old Men (1995),Comedy|Romance
3,4,Waiting to Exhale (1995),Comedy|Drama|Romance
4,5,Father of the Bride Part II (1995),Comedy
...,...,...,...
9737,193581,Black Butler: Book of the Atlantic (2017),Action|Animation|Comedy|Fantasy
9738,193583,No Game No Life: Zero (2017),Animation|Comedy|Fantasy
9739,193585,Flint (2017),Drama
9740,193587,Bungo Stray Dogs: Dead Apple (2018),Action|Animation
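
The `genre_cols` list above is defined but not used in this cell. One plausible use, sketched here as an assumption rather than the project's actual preprocessing, is to turn each pipe-separated `genres` string into a multi-hot vector with one slot per genre. The function name `encode_genres` is hypothetical.

```python
# Same genre labels as the notebook's genre_cols list.
GENRE_COLS = [
    "(no genres listed)", "Action", "Adventure", "Animation", "Children", "Comedy",
    "Crime", "Documentary", "Drama", "Fantasy", "Film-Noir", "Horror",
    "Musical", "Mystery", "Romance", "Sci-Fi", "Thriller", "War", "Western",
]


def encode_genres(genres, genre_cols=GENRE_COLS):
    """Return a 0/1 list with one entry per genre in genre_cols."""
    present = set(genres.split("|"))
    # Labels that do not appear in genre_cols are ignored by this sketch.
    return [1 if genre in present else 0 for genre in genre_cols]


vec = encode_genres("Comedy|Romance")  # genres of "Grumpier Old Men (1995)"
print(dict(zip(GENRE_COLS, vec)))
```

In the notebook the same idea could be applied per row, for example with `movies["genres"].map(encode_genres)`; pandas also offers `Series.str.get_dummies(sep="|")` as a column-wise alternative.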


In [11]:
tags = pd.read_csv(f"{dataset_name}/tags.csv")
tags

index,userId,movieId,tag,timestamp
0,2,60756,funny,1445714994
1,2,60756,Highly quotable,1445714996
2,2,60756,will ferrell,1445714992
3,2,89774,Boxing story,1445715207
4,2,89774,MMA,1445715200
...,...,...,...,...
3678,606,7382,for katie,1171234019
3679,606,7936,austere,1173392334
3680,610,3265,gun fu,1493843984
3681,610,3265,heroic bloodshed,1493843978


In [12]:
ratings = pd.read_csv(f"{dataset_name}/ratings.csv")
ratings

index,userId,movieId,rating,timestamp
0,1,1,4.00,964982703
1,1,3,4.00,964981247
2,1,6,4.00,964982224
3,1,47,5.00,964983815
4,1,50,5.00,964982931
...,...,...,...,...
100831,610,166534,4.00,1493848402
100832,610,168248,5.00,1493850091
100833,610,168250,5.00,1494273047
100834,610,168252,5.00,1493846352


## Data Exploration

In [13]:
# %pip install altair
import altair as alt
alt.data_transformers.enable('default', max_rows=None)
alt.renderers.enable('colab')

DataTransformerRegistry.enable('default')

RendererRegistry.enable('colab')

# 3 Preliminaries

# 4 Model

# 5 Training and Inspecting

# 6 Testing

# 7 Credit and Reference