# Comics Rx
## [A comic book recommendation system](https://github.com/MangrobanGit/comics_rx)
<img src="https://images.unsplash.com/photo-1514329926535-7f6dbfbfb114?ixlib=rb-1.2.1&ixid=eyJhcHBfaWQiOjEyMDd9&auto=format&fit=crop&w=2850&q=80" width="400" align='left'>

---

# 5 - ALS Model - 'Pseudo' Deployment

This notebook explores 'deploying' a previously saved ALS model, that is, loading it and generating recommendations from it.

# Libraries

In [1]:
%matplotlib inline
%load_ext autoreload
# Mode 2 reloads all modules; mode 1 reloads only modules named with %aimport
%autoreload 2
#%aimport data_fcns

import os
import time

import numpy as np
import pandas as pd # dataframes

# Data storage
from sqlalchemy import create_engine # SQL helper
import psycopg2 as psql # PostgreSQL DBs

# Spark
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.types import *
import pyspark.sql.functions as F
from pyspark.sql.functions import col, explode, lit, isnan, when, count
from pyspark.ml.recommendation import ALS, ALSModel
from pyspark.ml.evaluation import RegressionEvaluator, BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder, TrainValidationSplit

# Custom
import data_fcns as dfc
import keys  # Custom keys lib
import comic_recs as cr

In [2]:
# instantiate SparkSession object
spark = pyspark.sql.SparkSession.builder.master("local[*]").getOrCreate()
# spark = SparkSession.builder.master("local").getOrCreate()

## Retrieving Saved Model

In [3]:
comic_rec_model = ALSModel.load('als_filtered')

In [4]:
top_n_df = cr.get_top_n_recs_for_user(spark=spark, model=comic_rec_model, topn=50)
top_n_df

161


| | comic_title |
|---|---|
| 1 | Criminal (Image) |
| 2 | Bitch Planet (Image) |
| 3 | Royal City (Image) |
| 4 | Black Widow (Marvel) |
| 5 | All New Hawkeye (Marvel) |
| 6 | Shipwreck (Other) |
| 7 | Sex Criminals (Image) |
| 8 | Neil Gaiman American Gods Sha (Dark Horse) |
| 9 | Spider-Gwen (Marvel) |
| 10 | Sweet Tooth (Vertigo) |


I'm testing on myself, and I'm pretty sure I've bought a few of the titles above. This could be a failure in how I aggregated on series, and there is some evidence of that. One example is *Gideon Falls*: there should be only one volume of it. Maybe it's graphic novels? That shouldn't be an issue (no pun intended), though, because I believe the original dataset contains only individual comic books.

Let's test versus the original dataset!

#### Set aside some test series.

- Paper Girls (Image)
- Saga (Other)
- Fade Out (Image)

I know **for sure** that I've bought these, if not subscribed to them.

## Set up connection to AWS RDS

In [5]:
# Define path to secret
secret_path_aws = os.path.join(os.environ['HOME'], '.secret', 
                           'aws_ps_flatiron.json')
secret_path_aws

'/home/ubuntu/.secret/aws_ps_flatiron.json'

In [6]:
aws_keys = keys.get_keys(secret_path_aws)
user = aws_keys['user']
ps = aws_keys['password']
host = aws_keys['host']
db = aws_keys['db_name']

aws_ps_engine = ('postgresql://' + user + ':' + ps + '@' + host + '/' + db)
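
Plain string concatenation like the above breaks if the password contains characters such as `@` or `/`. A minimal sketch of a safer way to build the same URL, assuming the standard library's `quote_plus` for escaping; `build_postgres_url` and the credentials shown are hypothetical placeholders, not values from this project:

```python
from urllib.parse import quote_plus


def build_postgres_url(user, password, host, db_name, port=5432):
    """Assemble a PostgreSQL URL, percent-encoding the user name and password."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{db_name}"
    )


# Placeholder credentials for illustration only (not real values).
url = build_postgres_url("comics_user", "p@ss/word", "example-host", "comics")
print(url)
```

In the notebook, the arguments would come from `aws_keys` instead of literals.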

In [7]:
# Setup PSQL connection
conn = psql.connect(
    database=db,
    user=user,
    password=ps,
    host=host,
    port='5432'
)

In [8]:
# Instantiate cursor
cur = conn.cursor()

In [9]:
# Select all transactions for the test account.
query = """
    SELECT
       *
    FROM 
        comic_trans 
    WHERE
        account_num = '00161'
    ;
"""
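
The query above hard-codes the account number. To reuse it for other accounts, the value can be passed as a bound parameter rather than edited into the SQL text. The sketch below illustrates the pattern with an in-memory `sqlite3` database as a stand-in for the RDS PostgreSQL instance, so it runs without credentials. `titles_for_account` is a hypothetical helper, the table holds only two of the `comic_title` values from this notebook, and account `00999` is made up for contrast.

```python
import sqlite3

# Stand-in database: sqlite3 in memory instead of PostgreSQL on RDS.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE comic_trans (account_num TEXT, comic_title TEXT)")
cur.executemany(
    "INSERT INTO comic_trans VALUES (?, ?)",
    [
        ("00161", "Fade Out (Image)"),
        ("00161", "Afterlife With Archie (Archie)"),
        ("00999", "Saga (Other)"),  # hypothetical second account
    ],
)


def titles_for_account(cursor, account_num):
    """Return the comic titles bought by one account, using a bound parameter."""
    # sqlite3 uses "?" placeholders; psycopg2 uses "%s" in the same position.
    cursor.execute(
        "SELECT comic_title FROM comic_trans "
        "WHERE account_num = ? ORDER BY comic_title",
        (account_num,),
    )
    return [row[0] for row in cursor.fetchall()]


titles = titles_for_account(cur, "00161")
print(titles)
```

With psycopg2 the call would have the same shape, for example `cur.execute("... WHERE account_num = %s", (account_num,))`.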

In [10]:
# Clear any aborted transaction so the next query can run
conn.rollback()

In [11]:
# Execute the query
cur.execute(query)

In [12]:
# Load the results into a DataFrame, naming columns from the cursor metadata
temp_df = pd.DataFrame(cur.fetchall())
temp_df.columns = [desc.name for desc in cur.description]

In [13]:
temp_df.head()

| | index | publisher | item_id | title_and_num | qty_sold | date_sold | account_num | comic_title |
|---|---|---|---|---|---|---|---|---|
| 0 | 33 | Archie Comics | DCD617897 | Afterlife With Archie #1 Franc | 1 | 2013-10-30 15:14:23 | 161 | Afterlife With Archie (Archie) |
| 1 | 43 | Archie Comics | DCD617564 | Afterlife With Archie #1 Reg C | 1 | 2013-10-30 15:14:23 | 161 | Afterlife With Archie (Archie) |
| 2 | 54 | Archie Comics | DCDL012758 | Afterlife With Archie #10 Cvr | 1 | 2016-09-04 11:08:02 | 161 | Afterlife With Archie (Archie) |
| 3 | 104 | Archie Comics | DCD622673 | Afterlife With Archie #3 Reg F | 1 | 2014-01-24 11:42:27 | 161 | Afterlife With Archie (Archie) |
| 4 | 124 | Archie Comics | DCD625043 | Afterlife With Archie #4 Reg F | 1 | 2014-03-12 18:12:06 | 161 | Afterlife With Archie (Archie) |


In [22]:
# Make a list of test comic_title
already_bought = ['Paper Girls (Image)', 'Saga (Other)', 'Fade Out (Image)']

In [23]:
temp_df.loc[temp_df['comic_title'].isin(already_bought)]

| | index | publisher | item_id | title_and_num | qty_sold | date_sold | account_num | comic_title |
|---|---|---|---|---|---|---|---|---|
| 789 | 230912 | Image Comics | DCD647620 | Fade Out #1 Movie Magazine Var | 1 | 2014-08-22 17:37:24 | 161 | Fade Out (Image) |
| 790 | 230950 | Image Comics | DCD685936 | Fade Out #10 (Mr) | 1 | 2015-10-21 17:39:10 | 161 | Fade Out (Image) |
| 791 | 231004 | Image Comics | DCD688869 | Fade Out #11 (Mr) | 1 | 2015-11-27 13:21:36 | 161 | Fade Out (Image) |
| 792 | 231065 | Image Comics | DCD691723 | Fade Out #12 (Mr) | 1 | 2016-01-08 14:54:02 | 161 | Fade Out (Image) |
| 793 | 231126 | Image Comics | DCD650356 | Fade Out #2 (Mr) | 1 | 2014-10-04 13:12:21 | 161 | Fade Out (Image) |
| 794 | 231216 | Image Comics | DCD652805 | Fade Out #3 (Mr) | 1 | 2014-11-20 18:16:31 | 161 | Fade Out (Image) |
| 795 | 231272 | Image Comics | DCD655569 | Fade Out #4 (Mr) | 1 | 2015-01-08 17:56:58 | 161 | Fade Out (Image) |
| 796 | 231364 | Image Comics | DCD663742 | Fade Out #5 (Mr) | 1 | 2015-05-02 11:56:46 | 161 | Fade Out (Image) |
| 797 | 231417 | Image Comics | DCD666441 | Fade Out #6 (Mr) | 1 | 2015-05-27 18:34:57 | 161 | Fade Out (Image) |
| 798 | 231480 | Image Comics | DCD669339 | Fade Out #7 (Mr) | 1 | 2015-07-02 12:36:50 | 161 | Fade Out (Image) |


OK, I already knew this was the case; I just wanted to confirm.

Let's filter out comics already bought.
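
One way to do this, sketched with pandas under a few assumptions: the recommendations and the purchase history share a `comic_title` column, and exact string matches are enough to identify a series. `drop_already_bought` is a hypothetical helper, not part of `comic_recs`. The small `recs` frame reuses two recommended titles from this notebook plus one test series inserted only to show the filter working.

```python
import pandas as pd


def drop_already_bought(recs_df, bought_titles, title_col="comic_title"):
    """Return the recommendations whose title is not in the purchase history."""
    mask = ~recs_df[title_col].isin(set(bought_titles))
    return recs_df.loc[mask].reset_index(drop=True)


# Illustrative recommendations; 'Saga (Other)' is added here only for the demo.
recs = pd.DataFrame(
    {"comic_title": ["Criminal (Image)", "Saga (Other)", "Royal City (Image)"]}
)
already_bought = ["Paper Girls (Image)", "Saga (Other)", "Fade Out (Image)"]

filtered = drop_already_bought(recs, already_bought)
print(filtered["comic_title"].tolist())
```

In the notebook, the same call could take `top_n_df` as the recommendations and `temp_df['comic_title'].unique()` as the purchase history.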