# Recommender - Proof of Concept

This notebook tests the functionality of the recommendation functions.

# Libraries

In [None]:
%matplotlib inline
%load_ext autoreload
# Mode 2 reloads all modules before each cell runs. To limit reloading to
# specific files, use %autoreload 1 together with %aimport.
%autoreload 2
#%aimport comic_recs

# Pyspark imports
import pyspark
from pyspark.sql import SparkSession

In [3]:
import sys

In [4]:
sys.path.append('..')

In [6]:
# Model functions
import comic_recs as cr

In [7]:
# spark config
spark = SparkSession \
    .builder \
    .appName("comic recommendation") \
    .config("spark.driver.maxResultSize", "1g") \
    .config("spark.driver.memory", "1g") \
    .config("spark.executor.memory", "4g") \
    .config("spark.master", "local[*]") \
    .getOrCreate()

## Data Prep

These data/modeling related tasks need to be prepared beforehand.

### 1. Retrieve comics list as PySpark dataframe
List of known comics titles.

In [8]:
comics_df = spark.read.json('support_data/comics.json')
comics_df.persist()
comics_df.show(5)

+--------+--------------------+
|comic_id|         comic_title|
+--------+--------------------+
|       1|0Secret Wars (Mar...|
|       2|100 Bullets Broth...|
|       3|100 Penny Press L...|
|       4|100 Penny Press S...|
|       5|100 Penny Press T...|
+--------+--------------------+
only showing top 5 rows



### 2. Retrieve training data
All comic titles that existing users have bought or subscribed to.

In [9]:
comics_sold = spark.read.json('raw_data/als_input_filtered.json')
comics_sold.persist()
comics_sold.show(5)

+----------+------+--------+
|account_id|bought|comic_id|
+----------+------+--------+
|      2247|     1|     995|
|       487|     1|    1102|
|        29|     1|    2680|
|      1260|     1|    4870|
|       172|     1|    6023|
+----------+------+--------+
only showing top 5 rows



### 3. Set up model parameters
Parameters previously found through grid search and cross-validation.

## Recommendations for New Users

The parameters below were determined in our fitting process (see NB7).

In [10]:
# Dictionary of model parameters chosen during fitting
model_params = {'maxIter': 10
                 ,'rank': 5
                 ,'regParam': 0.1
                 ,'alpha': 100
                 ,'seed': 1234
               }
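
For orientation: the parameter names above match those of Spark's `pyspark.ml.recommendation.ALS` estimator, where `alpha` only has an effect when the model is fit on implicit feedback (`implicitPrefs=True`). Assuming that is how `comic_recs` uses them (this notebook does not show it), the sketch below illustrates the implicit-feedback confidence weighting, c = 1 + alpha * r, applied to the `bought` values in the training data. The helper `confidence` is hypothetical and exists only for illustration.

```python
# Hypothetical helper (not part of comic_recs): confidence weight used in
# implicit-feedback ALS, c = 1 + alpha * r, where r is the observed
# interaction strength.
def confidence(rating, alpha):
    return 1.0 + alpha * rating

alpha = 100  # same value as model_params['alpha']

# bought == 1 is an observed purchase; 0 stands for an unobserved pair.
for bought in (0, 1):
    print(f"bought={bought} -> confidence={confidence(bought, alpha)}")
```

With `alpha = 100`, a single observed purchase is weighted 101 times as heavily as an unobserved user/comic pair.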

### Create list of existing user preferences

E.g. "I currently read or like these comic books." Entries do not have to match a title exactly, but the closer they are to an actual title, the better, since we only do simple wildcard matches on titles.

In [11]:
reading_list = ['Transformers', 'GI Joe', 'Y The Last Man', 'Saga', 'Avengers'
               ,'Paper Girls', 'Star Wars']

In [7]:
reading_list = ['Batman', 'Sherlock Holmes', 'Attack on Titan', 'Thor']

In [8]:
reading_list = ['AVengers', 'wolverine', 'phoenix', 'deadpool']

In [12]:
reading_list = ['Moon Knight']

In [9]:
reading_list = ['Paper Girls']
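
The wildcard matching described above happens inside `comic_recs` and is not shown in this notebook. As a rough illustration of the idea, the sketch below does a case-insensitive substring match in plain Python. The function `match_titles` and the sample titles are hypothetical, not taken from `comic_recs` or `comics.json`.

```python
# Made-up sample titles in the style of comics_df; illustrative only.
titles = {
    1: "Paper Girls (Image)",
    2: "Moon Knight (Marvel)",
    3: "Uncanny Avengers (Marvel)",
    4: "Saga (Image)",
}

def match_titles(reading_list, titles):
    """Return ids of titles containing any reading-list entry, ignoring case."""
    wanted = [entry.lower() for entry in reading_list]
    return sorted(
        comic_id
        for comic_id, title in titles.items()
        if any(entry in title.lower() for entry in wanted)
    )

# Mixed-case and lowercase entries both match.
print(match_titles(["AVengers", "paper girls"], titles))  # prints [1, 3]
```

Under substring matching like this, short entries such as 'Saga' or 'Thor' can match more titles than intended, which is one reason to keep entries close to the actual title.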

### Get Recommendations!

Use the inputs above and choose how many recommendations to return (`top_n`).

In [12]:
recommendations = cr.make_comic_recommendations(reading_list=reading_list
                                                ,top_n=10
                                                ,comics_df=comics_df
                                                ,train_data=comics_sold
                                                ,model_params=model_params
                                                ,spark_instance=spark
                                               )

Total Runtime: 15.97 seconds
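
`make_comic_recommendations` lives in `comic_recs` and is not shown here. Because it receives the training data and model parameters rather than a fitted model, one plausible reading is that it adds the new user's matched comics to the training set and refits before scoring. The sketch below illustrates only that first step under this assumption. `add_new_user`, the way the new account id is chosen, and the matched comic ids are all hypothetical.

```python
def add_new_user(train_rows, matched_comic_ids):
    """Append implicit 'bought' rows for a new user; return (rows, new_id)."""
    # Pick an account_id that cannot collide with an existing user.
    new_id = max(row["account_id"] for row in train_rows) + 1
    new_rows = [
        {"account_id": new_id, "bought": 1, "comic_id": comic_id}
        for comic_id in matched_comic_ids
    ]
    return train_rows + new_rows, new_id

# First two rows of the comics_sold preview above.
train_rows = [
    {"account_id": 2247, "bought": 1, "comic_id": 995},
    {"account_id": 487, "bought": 1, "comic_id": 1102},
]

# Hypothetical comic ids matched from a reading list.
rows, new_id = add_new_user(train_rows, matched_comic_ids=[1, 3])
print(new_id, len(rows))  # prints 2248 4
```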


In [13]:
recommendations

| | comic_title |
|---|---|
| 0 | Deathstroke (DC) |
| 1 | X-Force (Marvel) |
| 2 | Deadpool Vs Carnage (Marvel) |
| 3 | Deadpool Vs X-Force (Marvel) |
| 4 | Deadpool Draculas Gauntlet (Marvel) |
| 5 | Green Lanterns (DC) |
| 6 | Convergence Green Lantern Par (DC) |
| 7 | Sinestro (DC) |
| 8 | Green Lantern New Gods Godhea (DC) |
| 9 | Thanos Annual (Marvel) |


In [20]:
# What if we just want value of first row?
recommendations.head(1)['comic_title'].values[0]

'Saga (Image)'