# Interacting with Campaigns and Filters

In this notebook, we'll **use** our deployed recommendation models - querying them for recommendations and sending **feedback** to update the model state.

⚠️ You'll need to have already run the previous notebooks in this series to set up your environment and deploy **campaigns** (endpoints) in Amazon Personalize, including **waiting for your campaigns to become active**.

Before we start, we'll:

- Import the libraries this notebook will use
- Load the variables saved from previous steps
- Connect to the relevant AWS services as we have before for IAM and S3

In [2]:
!pip install tqdm

Collecting tqdm
  Downloading tqdm-4.59.0-py2.py3-none-any.whl (74 kB)
Installing collected packages: tqdm
Successfully installed tqdm-4.59.0


In [3]:
# Python Built-Ins:
from collections import defaultdict
from datetime import datetime
import os
import json
import time
import uuid  # For generating random IDs

# External Dependencies:
import boto3  # AWS SDK for Python
import pandas as pd  # DataFrame (table) manipulation tools
from tqdm import tqdm  # Progress bar

# Local Dependencies:
import util  # Small tool to print progress spinner

# import time
# from time import sleep
# import random

# Reload saved variables:
%store -r

# Connect to AWS services:
personalize = boto3.client("personalize")  # We've used these management APIs before
personalize_events = boto3.client("personalize-events")  # Note this new one!
personalize_runtime = boto3.client("personalize-runtime")  # And this one!

## Introduction

Once a *campaign* is deployed, we have a private, real-time API ready to serve recommendation requests.

Just like other similar AWS services:

- We'll typically use this API via the AWS **SDKs for whatever language our application uses** (e.g. [Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/personalize-runtime.html) for Python, the [AWS SDK for JavaScript](https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/PersonalizeRuntime.html) and so on)... not least because these simplify request signing/security for us.
- Access is controlled by [AWS IAM](https://aws.amazon.com/iam/) - so we assume you're running this notebook in an environment (e.g. a SageMaker notebook) with credentials (e.g. an execution role) authorized to interact with your Amazon Personalize resources (for example the IAM `AmazonPersonalizeFullAccess` policy).

We'll start by loading up our movie metadata, which will allow us to associate returned movie IDs to their titles, and make results later a bit more human-readable:

In [4]:
titles_df = pd.read_csv(dataset_dir + "/movies.csv", index_col="movieId")[["title"]].rename(
    columns={ "title": "TITLE" },
)
titles_df.head()

items_df = pd.read_csv(items_path, index_col="ITEM_ID", dtype={ "YEAR": "Int64" })

items_df = items_df.join(titles_df)
items_df.head()

ITEM_ID,GENRES,YEAR,TITLE
1,Adventure|Animation|Children|Comedy|Fantasy,1995,Toy Story (1995)
2,Adventure|Children|Fantasy,1995,Jumanji (1995)
3,Comedy|Romance,1995,Grumpier Old Men (1995)
4,Comedy|Drama|Romance,1995,Waiting to Exhale (1995)
5,Comedy,1995,Father of the Bride Part II (1995)


By setting `ITEM_ID` as the **index** of this dataframe, we've made it very simple to look up movies using the `loc[]` operator - either one-by-one or in bulk:

In [7]:
items_df.loc[[10,2,3]]

ITEM_ID,GENRES,YEAR,TITLE
10,Action|Adventure|Thriller,1995,GoldenEye (1995)
2,Adventure|Children|Fantasy,1995,Jumanji (1995)
3,Comedy|Romance,1995,Grumpier Old Men (1995)


We'll also define a simple utility function to retrieve the full ARN for a given campaign by name - to make it easy for you to update this code in case you chose different names for your deployments:

In [8]:
def get_campaign_arn_by_name(campaign_name):
    campaigns = personalize.list_campaigns()["campaigns"]
    try:
        return next(filter(
            lambda c: c["name"] == campaign_name,
            campaigns,
        ))["campaignArn"]
    except StopIteration:
        raise ValueError("Campaign '{}' not found! Got:\n- {}".format(
            campaign_name,
            "\n- ".join(map(lambda c: c["name"], campaigns))
        ))

# For example:
get_campaign_arn_by_name("personalize-movielens-sims")

'arn:aws:personalize:ap-southeast-1:090247010259:campaign/personalize-movielens-sims'
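
The helper above only inspects the first page of `list_campaigns` results, which is fine for a workshop account with a handful of campaigns. As a hedged sketch for busier accounts, the variant below follows the `nextToken` field across pages. The function name `find_campaign_arn` and the `page_size` default are our own illustrative choices, not part of the original notebooks:

```python
def find_campaign_arn(client, campaign_name, page_size=100):
    """Look up a campaign ARN by name, following `nextToken` across result pages.

    `client` is expected to behave like boto3.client("personalize").
    """
    kwargs = {"maxResults": page_size}
    while True:
        page = client.list_campaigns(**kwargs)
        for campaign in page.get("campaigns", []):
            if campaign["name"] == campaign_name:
                return campaign["campaignArn"]
        token = page.get("nextToken")
        if not token:
            # No more pages to check and no match found
            raise ValueError(f"Campaign '{campaign_name}' not found")
        kwargs["nextToken"] = token
```

With the clients created earlier, this could be called as `find_campaign_arn(personalize, "personalize-movielens-sims")`.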

## Fetching Recommendations: Similar Items

Querying recommendations uses the `personalize-runtime` service, rather than the standard `personalize` service we've used previously for management operations (such as training solutions, deploying campaigns, and so on).

Our `SIMS` campaign recommends "similar" items (in terms of the users who watch/click/buy them, not the item metadata) - so it takes an item ID as input.

Item ID 1 from before was "Toy Story (1995)": Let's see what other films Toy Story reviewers might like:

In [9]:
sims_campaign_arn = get_campaign_arn_by_name("personalize-movielens-sims")

sims_response = personalize_runtime.get_recommendations(
    campaignArn=sims_campaign_arn,
    itemId=str(1),
    numResults=10,
)

sims_response

{'ResponseMetadata': {'RequestId': 'c2b66412-4669-4c4e-9ea2-0afc5d33d327',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/json',
   'date': 'Tue, 23 Mar 2021 13:27:41 GMT',
   'x-amzn-requestid': 'c2b66412-4669-4c4e-9ea2-0afc5d33d327',
   'content-length': '566',
   'connection': 'keep-alive'},
  'RetryAttempts': 0},
 'itemList': [{'itemId': '3114'},
  {'itemId': '780'},
  {'itemId': '648'},
  {'itemId': '1073'},
  {'itemId': '736'},
  {'itemId': '150'},
  {'itemId': '588'},
  {'itemId': '34'},
  {'itemId': '95'},
  {'itemId': '364'}],
 'recommendationId': 'RID-27e3833b-8943-407d-9c3b-3503e02044ec'}

It works! But doesn't tell us much by itself. Let's map that raw result back to our items table:

In [10]:
def recs_to_dataframe(item_list):
    recs_df = pd.DataFrame(item_list).rename(columns={ "itemId": "ITEM_ID" })
    recs_df["ITEM_ID"] = pd.to_numeric(recs_df["ITEM_ID"]).astype("Int64")
    recs_df.set_index("ITEM_ID", inplace=True)
    return recs_df.join(items_df)

recs_to_dataframe(sims_response["itemList"])

ITEM_ID,GENRES,YEAR,TITLE
3114,Adventure|Animation|Children|Comedy|Fantasy,1999,Toy Story 2 (1999)
780,Action|Adventure|Sci-Fi|Thriller,1996,Independence Day (a.k.a. ID4) (1996)
648,Action|Adventure|Mystery|Thriller,1996,Mission: Impossible (1996)
1073,Children|Comedy|Fantasy|Musical,1971,Willy Wonka & the Chocolate Factory (1971)
736,Action|Adventure|Romance|Thriller,1996,Twister (1996)
150,Adventure|Drama|IMAX,1995,Apollo 13 (1995)
588,Adventure|Animation|Children|Comedy|Musical,1992,Aladdin (1992)
34,Children|Drama,1995,Babe (1995)
95,Action|Adventure|Thriller,1996,Broken Arrow (1996)
364,Adventure|Animation|Children|Drama|Musical|IMAX,1994,"Lion King, The (1994)"


Much more informative! In our test (your exact results may vary), Toy Story 2 came out top of the list - suggesting that users who reviewed (and liked) the original are very likely to enjoy the sequel... Seems to make sense!

...But what do we see if we query similar items for a few movies at random?

In [11]:
random_items = items_df.sample(5)

random_sims = {}
n_recs = 10

for item_id, item_meta in random_items.iterrows():
    item_sims = recs_to_dataframe(
        personalize_runtime.get_recommendations(
            campaignArn=sims_campaign_arn,
            itemId=str(item_id),
            numResults=n_recs,
        )["itemList"]
    )
    random_sims[item_meta["TITLE"]] = (
        # Need to pad the results to n_recs because some movies might return fewer:
        item_sims["TITLE"].to_list() + (n_recs * [None])
    )[:n_recs]

random_sims = pd.DataFrame(random_sims)
random_sims

,"Island, The (2005)",Bullets Over Broadway (1994),My Girl 2 (1994),Planet 51 (2009),Delivery Man (2013)
0,Hollow Man (2000),Beauty of the Day (Belle de jour) (1967),"Shawshank Redemption, The (1994)","Shawshank Redemption, The (1994)","Shawshank Redemption, The (1994)"
1,Gothika (2003),My Life in Pink (Ma vie en rose) (1997),Forrest Gump (1994),Forrest Gump (1994),Forrest Gump (1994)
2,War of the Worlds (2005),Dolores Claiborne (1995),Pulp Fiction (1994),Pulp Fiction (1994),Pulp Fiction (1994)
3,Cloverfield (2008),Once Were Warriors (1994),"Silence of the Lambs, The (1991)","Silence of the Lambs, The (1991)","Silence of the Lambs, The (1991)"
4,"Day After Tomorrow, The (2004)",Mighty Aphrodite (1995),"Matrix, The (1999)","Matrix, The (1999)","Matrix, The (1999)"
5,"I, Robot (2004)",Muriel's Wedding (1994),Braveheart (1995),Braveheart (1995),Braveheart (1995)
6,Aeon Flux (2005),"Brothers McMullen, The (1995)",Star Wars: Episode IV - A New Hope (1977),Star Wars: Episode IV - A New Hope (1977),Star Wars: Episode IV - A New Hope (1977)
7,"Beach, The (2000)",Dead Man Walking (1995),Jurassic Park (1993),Jurassic Park (1993),Jurassic Park (1993)
8,Déjà Vu (Deja Vu) (2006),"Madness of King George, The (1994)",Schindler's List (1993),Schindler's List (1993),Schindler's List (1993)
9,Shallow Hal (2001),Four Weddings and a Funeral (1994),Terminator 2: Judgment Day (1991),Terminator 2: Judgment Day (1991),Terminator 2: Judgment Day (1991)


You probably found that **a lot of the results look the same** (hopefully not all of them - this is more likely with a smaller number of interactions, which may be more common with the small MovieLens subset).

Why is this? Well, movies that have been **watched by a lot of users** will show up in the co-occurrence/"similar" sets for more movie IDs!
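
To make that intuition concrete, here is a small self-contained sketch on made-up data (the users and movie names are invented, and this is a toy co-occurrence count, not the actual SIMS algorithm). One blockbuster, "Hit", has been watched by everybody, so it wins the raw co-occurrence ranking for niche titles; dividing by the neighbor's popularity (an assumed, simplified form of discounting) lets the genre neighbors surface instead:

```python
from collections import Counter
from itertools import combinations

# Made-up toy data: each user's set of watched movies. "Hit" is watched by everyone.
watched = {
    "u1": {"Hit", "SciFi-A", "SciFi-B"},
    "u2": {"Hit", "SciFi-A", "SciFi-B"},
    "u3": {"Hit", "Drama-A", "Drama-B"},
    "u4": {"Hit", "Drama-A", "Drama-B"},
    "u5": {"Hit", "SciFi-A"},
    "u6": {"Hit", "SciFi-B"},
    "u7": {"Hit", "Drama-A"},
    "u8": {"Hit", "Drama-B"},
}

# Number of users who watched each movie:
popularity = Counter(item for items in watched.values() for item in items)

# Number of users who watched both movies of each (ordered) pair:
cooccurrence = Counter()
for items in watched.values():
    for a, b in combinations(sorted(items), 2):
        cooccurrence[(a, b)] += 1
        cooccurrence[(b, a)] += 1


def most_similar(item, discount=0.0):
    """Top neighbor by co-occurrence, divided by neighbor popularity ** discount."""
    scores = {
        other: count / popularity[other] ** discount
        for (source, other), count in cooccurrence.items()
        if source == item
    }
    return max(scores, key=scores.get)


for item in ["SciFi-A", "Drama-A"]:
    raw = most_similar(item)
    discounted = most_similar(item, discount=1.0)
    print(f"{item}: raw top neighbor = {raw}, discounted top neighbor = {discounted}")
```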

This goes to show that evaluation metrics should not be the only thing we rely on when evaluating our solution version. So what can we do about it?

This is a good time to revisit the **hyperparameters** of the Personalize recipes. The SIMS recipe has a `popularity_discount_factor` hyperparameter (see [documentation](https://docs.aws.amazon.com/personalize/latest/dg/native-recipe-sims.html)). Leveraging this hyperparameter allows you to control the nuance you see in the results. This parameter and its behavior will be unique to every dataset you encounter, and depends on the goals of the business. You can iterate on the value of this hyperparameter until you are satisfied with the results, or you can start by leveraging Personalize's hyperparameter optimization (HPO) feature. For more information on hyperparameters and HPO tuning, see the [documentation](https://docs.aws.amazon.com/personalize/latest/dg/customizing-solution-config-hpo.html).
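
As a rough sketch of what fixing this hyperparameter by hand might look like, the helper below builds the keyword arguments for `personalize.create_solution`. The helper name, the solution name in the usage note, and the assumed 0 to 1 range check are illustrative; please confirm the valid range and default against the SIMS documentation linked above:

```python
def sims_solution_kwargs(name, dataset_group_arn, popularity_discount_factor):
    """Build keyword arguments for `create_solution` with a fixed SIMS hyperparameter."""
    # Assumed valid range; verify against the SIMS recipe documentation:
    if not 0.0 <= popularity_discount_factor <= 1.0:
        raise ValueError("popularity_discount_factor is expected to lie between 0 and 1")
    return {
        "name": name,
        "datasetGroupArn": dataset_group_arn,
        "recipeArn": "arn:aws:personalize:::recipe/aws-sims",
        "solutionConfig": {
            "algorithmHyperParameters": {
                # Hyperparameter values are passed to the API as strings:
                "popularity_discount_factor": str(popularity_discount_factor),
            },
        },
    }
```

A new solution could then be requested with something like `personalize.create_solution(**sims_solution_kwargs("personalize-movielens-sims-discounted", dataset_group_arn, 0.8))`, followed by the usual solution version and campaign steps from the previous notebooks.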

## User Personalization Recommendations

Collaborative filtering-style similar item recommendations are useful, but let's explore something a bit more *personalized*!

For this recipe, our input is a **user** ID and the deployed campaign will recommend the movies it thinks are most relevant for that user - plus some extra recommendations to try and cold-start new items which don't have many interactions yet.

Just as we took film ID '1' for our first attempt with SIMS, let's look at the recommendations for user ID '1':

In [12]:
up_campaign_arn = get_campaign_arn_by_name("personalize-movielens-up")

up_response = personalize_runtime.get_recommendations(
    campaignArn=up_campaign_arn,
    userId=str(1),
    numResults=10,
)

up_response

{'ResponseMetadata': {'RequestId': '729e341f-4b3b-4175-bcd7-94a0293bac2d',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/json',
   'date': 'Tue, 23 Mar 2021 13:35:23 GMT',
   'x-amzn-requestid': '729e341f-4b3b-4175-bcd7-94a0293bac2d',
   'content-length': '624',
   'connection': 'keep-alive'},
  'RetryAttempts': 0},
 'itemList': [{'itemId': '2000', 'score': 0.0068093},
  {'itemId': '293', 'score': 0.006102},
  {'itemId': '3448', 'score': 0.0051529},
  {'itemId': '2028', 'score': 0.0048267},
  {'itemId': '2987', 'score': 0.0043085},
  {'itemId': '1393', 'score': 0.0040608},
  {'itemId': '1213', 'score': 0.0038899},
  {'itemId': '1394', 'score': 0.0037548},
  {'itemId': '1073', 'score': 0.0036348},
  {'itemId': '1610', 'score': 0.0035253}],
 'recommendationId': 'RID-cb5f2fd7-86f6-48f0-9544-a55d5f41756f'}

Note that one difference with this recipe type is that we also receive `score` values for each returned item - so we can get an idea of how confident (or not) the model is in its returned results!
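
These scores look tiny because, as we understand the Personalize documentation, they are relative values that sum to 1 across *all* items in the catalog rather than just the returned ones. Under that assumption, a quick sketch like the one below (the helper name is our own) shows how much of the total the returned items account for:

```python
def score_coverage(item_list):
    """Return the summed `score` of the returned items (items lacking a score count as 0)."""
    return sum(item.get("score", 0.0) for item in item_list)


# First three entries copied from the response above:
example_items = [
    {"itemId": "2000", "score": 0.0068093},
    {"itemId": "293", "score": 0.006102},
    {"itemId": "3448", "score": 0.0051529},
]
print(f"Top {len(example_items)} items carry {score_coverage(example_items):.2%} of the total score")
```

In the notebook, `score_coverage(up_response["itemList"])` would give the figure for the full response.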

Let's visualize how different users will receive different recommendations, and compare them to an *anonymous* user by inserting a made-up user ID, `Anonymous`:

In [13]:
user_ids = ["Anonymous"] + list(range(300, 305))

up_user_recs = {}
n_recs = 10

for user_id in user_ids:
    user_recs = recs_to_dataframe(
        personalize_runtime.get_recommendations(
            campaignArn=up_campaign_arn,
            userId=str(user_id),
            numResults=n_recs,
        )["itemList"]
    )
    up_user_recs[f"User {user_id}"] = (
        # Need to pad the results to n_recs because some users might return fewer:
        user_recs["TITLE"].to_list() + (n_recs * [None])
    )[:n_recs]

up_user_recs = pd.DataFrame(up_user_recs)
up_user_recs

,User Anonymous,User 300,User 301,User 302,User 303,User 304
0,Apollo 13 (1995),Catch Me If You Can (2002),"Departed, The (2006)",Fargo (1996),"Fifth Element, The (1997)",Heat (1995)
1,Dances with Wolves (1990),American Beauty (1999),No Country for Old Men (2007),Schindler's List (1993),Total Recall (1990),Léon: The Professional (a.k.a. The Professiona...
2,"Shawshank Redemption, The (1994)","Departed, The (2006)",Slumdog Millionaire (2008),Goodfellas (1990),"Abyss, The (1989)",Pulp Fiction (1994)
3,Batman (1989),Gran Torino (2008),In Bruges (2008),Pulp Fiction (1994),"Lost World: Jurassic Park, The (1997)","Crying Game, The (1992)"
4,Pulp Fiction (1994),Memento (2000),"Pan's Labyrinth (Laberinto del fauno, El) (2006)","Usual Suspects, The (1995)",Aliens (1986),Dead Man Walking (1995)
5,True Lies (1994),Requiem for a Dream (2000),"Pursuit of Happyness, The (2006)",Saving Private Ryan (1998),Back to the Future (1985),True Romance (1993)
6,"Matrix, The (1999)",Million Dollar Baby (2004),Inside Man (2006),Die Hard (1988),Star Trek: First Contact (1996),In the Line of Fire (1993)
7,"Sixth Sense, The (1999)","Pan's Labyrinth (Laberinto del fauno, El) (2006)",Juno (2007),"Godfather, The (1972)",Star Wars: Episode I - The Phantom Menace (1999),Schindler's List (1993)
8,Star Wars: Episode IV - A New Hope (1977),Snatch (2000),"Hangover, The (2009)","Godfather: Part II, The (1974)",RoboCop (1987),"Few Good Men, A (1992)"
9,Fight Club (1999),"Girl with the Dragon Tattoo, The (2011)",Transsiberian (2008),Raiders of the Lost Ark (Indiana Jones and the...,"Rocketeer, The (1991)",Hackers (1995)


We see that recommendations for the unknown user ID `Anonymous` come back strongly biased towards across-the-board popular films... Whereas our registered users see recommendations more tailored to their viewing (reviewing) history.

This is a good start, but what we'd really like is for these recommendations to **update in (near-) real time** as a user interacts with more items on our site!

## Using Filters: "Sci-Fi Season"

What if we have a special **Sci-Fi season** promotion and we'd like to tweak these same users' recommendations to emphasise our items in the `Sci-Fi` genre category?

Luckily, we created a `by-genre` **filter** in the previous notebook! This filter takes a `$GENRE` **parameter**, so we can request which genre we'd like to filter on at run-time.

First, we'll need to look up the filter ARN:

In [16]:
def get_filter_arn_by_name(filter_name):
    filters = personalize.list_filters(datasetGroupArn=dataset_group_arn)["Filters"]
    try:
        return next(filter(
            lambda f: f["name"] == filter_name,
            filters
        ))["filterArn"]
    except StopIteration:
        raise ValueError("Filter '{}' not found! Got:\n- {}".format(
            filter_name,
            "\n- ".join(map(lambda f: f["name"], filters))
        ))

genre_filter_arn = get_filter_arn_by_name("by-genre")
print(genre_filter_arn)

arn:aws:personalize:ap-southeast-1:090247010259:filter/by-genre


Now, we can generate recommendations as before... But this time specifying our **filter (by ARN)** and **filter variables** (the actual genre we want):

In [17]:
up_user_recs = {}

for user_id in user_ids:
    user_recs = recs_to_dataframe(
        personalize_runtime.get_recommendations(
            campaignArn=up_campaign_arn,
            userId=str(user_id),
            numResults=n_recs,
            # ADDED:
            filterArn=genre_filter_arn,
            filterValues={ "GENRE": json.dumps("Sci-Fi") }
        )["itemList"]
    )
    up_user_recs[f"User {user_id}"] = (
        # Need to pad the results to n_recs because some users might return fewer:
        user_recs["TITLE"].to_list() + (n_recs * [None])
    )[:n_recs]

up_user_recs = pd.DataFrame(up_user_recs)
up_user_recs

,User Anonymous,User 300,User 301,User 302,User 303,User 304
0,"Matrix, The (1999)","Prestige, The (2006)",Youth Without Youth (2007),"Matrix, The (1999)","Fifth Element, The (1997)",Terminator 2: Judgment Day (1991)
1,Star Wars: Episode IV - A New Hope (1977),Donnie Darko (2001),Eternal Sunshine of the Spotless Mind (2004),Back to the Future (1985),Total Recall (1990),Sneakers (1992)
2,Independence Day (a.k.a. ID4) (1996),"Clockwork Orange, A (1971)","Prestige, The (2006)",Twelve Monkeys (a.k.a. 12 Monkeys) (1995),"Abyss, The (1989)",Star Wars: Episode IV - A New Hope (1977)
3,Twelve Monkeys (a.k.a. 12 Monkeys) (1995),Inception (2010),Children of Men (2006),Star Wars: Episode IV - A New Hope (1977),"Lost World: Jurassic Park, The (1997)",Twelve Monkeys (a.k.a. 12 Monkeys) (1995)
4,Star Wars: Episode V - The Empire Strikes Back...,Eternal Sunshine of the Spotless Mind (2004),Moon (2009),Star Wars: Episode VI - Return of the Jedi (1983),Aliens (1986),Jurassic Park (1993)
5,Star Wars: Episode VI - Return of the Jedi (1983),V for Vendetta (2006),"Clockwork Orange, A (1971)",E.T. the Extra-Terrestrial (1982),Back to the Future (1985),Powder (1995)
6,Contact (1997),Children of Men (2006),Inception (2010),Terminator 2: Judgment Day (1991),Star Trek: First Contact (1996),Strange Days (1995)
7,Back to the Future (1985),"Matrix, The (1999)",The Butterfly Effect 3: Revelations (2009),Star Wars: Episode V - The Empire Strikes Back...,Star Wars: Episode I - The Phantom Menace (1999),Star Wars: Episode VI - Return of the Jedi (1983)
8,Jurassic Park (1993),"Truman Show, The (1998)",Watchmen (2009),Independence Day (a.k.a. ID4) (1996),RoboCop (1987),Men in Black (a.k.a. MIB) (1997)
9,Inception (2010),Aliens (1986),District 9 (2009),Blade Runner (1982),"Rocketeer, The (1991)",Judge Dredd (1995)


For some users (looking at you, `303`!) who were already pretty Sci-Fi keen, the recommendations haven't changed very much - but for others, we've made quite a difference!
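
The `filterValues` encoding used above can be wrapped in a small helper. This sketch assumes the `by-genre` filter expression uses an `IN ($GENRE)` clause (we have not reproduced the expression here), in which case several values can be supplied as a comma-separated list of quoted strings. The helper name is our own:

```python
import json


def genre_filter_values(genres):
    """Encode one or more genre names as the `filterValues` dict for a `$GENRE` placeholder."""
    if not genres:
        raise ValueError("At least one genre is required")
    # Each value must be a quoted string; multiple values are comma-separated:
    return {"GENRE": ",".join(json.dumps(genre) for genre in genres)}


print(genre_filter_values(["Sci-Fi"]))
print(genre_filter_values(["Sci-Fi", "Fantasy"]))
```

For a single genre this produces the same value as the `json.dumps("Sci-Fi")` call in the cell above.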

Filtered recommendations can serve many different use-cases such as:

- Applying eligibility rules like membership tier, availability, or ratings
- Easily creating *shelves* (also known as *rails* or *carousels*) of different item categories with personalized, per-user rankings
- Promotional events as suggested above
- Removing previously-interacted/purchased items from results (or creating lists specifically targeting repeat purchase)
- ...and many more

## Real-Time Event Feedback

Beyond generating static recommendations on-demand for each user, Personalize has the ability to **listen to events** from your application and **update recommendations** shown to users in near-real-time. In this example we'll focus on injecting **interaction events** (i.e. new clicks/reviews/purchases as a customer moves around the site) - but as detailed in the [incremental dataset import documentation](https://docs.aws.amazon.com/personalize/latest/dg/incremental-data-updates.html) there are also APIs available for updating user and item metadata, too.

We already created an **event tracker** in the last notebook and kept a record of the `tracking_id`.

With this tracking ID, we're able to set up a utility function below that will **log a new interaction** via the **Personalize Events API**.

Note that:

- In Personalize, user activity is grouped into **sessions**, so we'll just use a simple logic here which creates a new random session ID the first time each `user_id` is used. In real applications, the website's existing session ID system might be used instead.
- In an ideal world we will give **not just positive feedback** (interaction events), but also tell the model **what didn't work** by providing `impression` or `recommendation ID` feedback

You can find more information about feedback in the [Recording Events](https://docs.aws.amazon.com/personalize/latest/dg/recording-events.html) section of the documentation.

In [18]:
def generate_random_session_id():
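    # Note the call: str(uuid.uuid1) would stringify the function itself, giving every user the same ID
    return str(uuid.uuid1())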

# Mapping from user_id to session_id
session_dict = defaultdict(generate_random_session_id)

def log_new_review(
    user_id,
    item_id,
    rating=5,
    recommendation_id=None,
    impression=None,
    metadata=None,
):
    """Log a new event via Amazon Personalize's Event Tracker"""
    session_id = session_dict[user_id]
    event = {
        "eventType": "review",
        "eventValue": rating,
        "itemId": str(item_id),
        "sentAt": datetime.now(),
    }
    if impression:
        # Optionally pass in a list of item ID strings that were presented to the user before
        # selecting this one (for negative feedback!)
        event["impression"] = impression
    if recommendation_id:
        # Optionally pass in the recommendation ID of the list that was generated driving this
        # interaction (for implicit feedback!)
        event["recommendationId"] = recommendation_id
    
    personalize_events.put_events(
        trackingId=tracking_id,
        userId=str(user_id),
        sessionId=session_id,
        eventList=[event]
    )
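
To illustrate the impression and recommendation ID feedback mentioned above, here is a hedged sketch that derives both from a `get_recommendations` response. The helper name and the 25-item cap on `impression` are assumptions on our part (check the PutEvents API reference for the current limit):

```python
from datetime import datetime

# Assumed PutEvents limit on the `impression` list; verify in the API reference:
MAX_IMPRESSION_ITEMS = 25


def build_review_event(response, chosen_item_id, rating=5):
    """Build a PutEvents event dict that credits a GetRecommendations response."""
    # Everything shown to the user, in the order it was recommended:
    shown = [item["itemId"] for item in response["itemList"]][:MAX_IMPRESSION_ITEMS]
    return {
        "eventType": "review",
        "eventValue": rating,
        "itemId": str(chosen_item_id),
        "sentAt": datetime.now(),
        "impression": shown,
        "recommendationId": response["recommendationId"],
    }
```

The resulting dict could be sent in the `eventList` of `personalize_events.put_events`, or the same values passed to `log_new_review` through its `impression` and `recommendation_id` arguments.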

Let's imagine that one of our users who wasn't much of a fan before gets really into our *Sci-Fi Season* promotion - how might their regular (non-promotional) recommendations change afterwards?

In the next cell, we'll choose a particular user ID and:

- Generate a list of `n_steps` Sci-Fi films they might watch (using our filtered Sci-Fi recommendations from before)
- Fetch their *initial* list of **general** movie recommendations (without the Sci-Fi filter)
- Loop through logging new review events for the Sci-Fi films, and seeing how the user's **general** recommendations change after each one

We'll display the changing general recommendations as columns of a table:

In [19]:
user_id = 400
n_steps = 7

# Let's assume they start watching all their personally-recommended Sci-Fi movies, but starting after the Nth
# one down the list (since the ones at the very top are probably more generic):
n_watchstart = 5
# Create the list of movies they'll watch:
scifi_recs = recs_to_dataframe(
    personalize_runtime.get_recommendations(
        campaignArn=up_campaign_arn,
        userId=str(user_id),
        numResults=n_watchstart + n_steps,
        # ADDED:
        filterArn=genre_filter_arn,
        filterValues={ "GENRE": json.dumps("Sci-Fi") }
    )["itemList"]
).iloc[n_watchstart:]

# Now generate their personal recommendations before, and after each Sci-Fi movie watched (reviewed):
# Each column in our results table will list recommended titles; with most recent watched movie as the header
history = [
    recs_to_dataframe(
        personalize_runtime.get_recommendations(
            campaignArn=up_campaign_arn,
            userId=str(user_id),
            numResults=n_recs,
        )["itemList"]
    )["TITLE"].rename("Initial Recs").reset_index(drop=True)
]

def review_movie_and_update_history(watched_item):
    """Send movie feedback to Personalize and fetch updated user recommendations"""
    watched_item_id, watched_item_meta = watched_item
    watched_item_title = watched_item_meta["TITLE"]

    # Record the watched movie:
    log_new_review(user_id, watched_item_id)

    # Wait a little to help the near-real-time updates propagate:
    time.sleep(2.5)

    # Generate & record the new personal (non-Sci-Fi) recommendations:
    history.append(
        recs_to_dataframe(
            personalize_runtime.get_recommendations(
                campaignArn=up_campaign_arn,
                userId=str(user_id),
                numResults=n_recs,
            )["itemList"]
        )["TITLE"].rename(watched_item_title).reset_index(drop=True)
    )


# Now loop through the steps with a progress bar:
util.progress.notebook_safe_tqdm_loop(
    tqdm(scifi_recs.iterrows(), total=n_steps, unit="steps", desc="Simulating"),
    review_movie_and_update_history,
)

history = pd.concat(history, axis=1)
history

Simulating: 100%|██████████| 7/7 [00:19<00:00,  2.73s/steps]


,Initial Recs,Twelve Monkeys (a.k.a. 12 Monkeys) (1995),Star Wars: Episode V - The Empire Strikes Back (1980),"Truman Show, The (1998)",Donnie Darko (2001),Inception (2010),V for Vendetta (2006),District 9 (2009)
0,Fight Club (1999),Fight Club (1999),American Beauty (1999),American Beauty (1999),"Clockwork Orange, A (1971)",Fight Club (1999),"Prestige, The (2006)","Prestige, The (2006)"
1,Fargo (1996),American Beauty (1999),Fight Club (1999),Saving Private Ryan (1998),Back to the Future (1985),"Avengers, The (2012)",District 9 (2009),I Am Legend (2007)
2,Saving Private Ryan (1998),Fargo (1996),Fargo (1996),Fargo (1996),American Beauty (1999),District 9 (2009),V for Vendetta (2006),Inglourious Basterds (2009)
3,"Silence of the Lambs, The (1991)","Silence of the Lambs, The (1991)",Saving Private Ryan (1998),"Matrix, The (1999)",Gladiator (2000),Inglourious Basterds (2009),Inglourious Basterds (2009),"Bourne Ultimatum, The (2007)"
4,"Dark Knight, The (2008)",Saving Private Ryan (1998),"Matrix, The (1999)",Fight Club (1999),Fargo (1996),American Beauty (1999),Star Trek (2009),Children of Men (2006)
5,American Beauty (1999),Seven (a.k.a. Se7en) (1995),Aliens (1986),Seven (a.k.a. Se7en) (1995),Fight Club (1999),"Prestige, The (2006)","Dark Knight, The (2008)",District 9 (2009)
6,One Flew Over the Cuckoo's Nest (1975),L.A. Confidential (1997),"Silence of the Lambs, The (1991)","Dark Knight, The (2008)","Dark Knight, The (2008)",Forrest Gump (1994),Iron Man (2008),"Departed, The (2006)"
7,L.A. Confidential (1997),"Matrix, The (1999)",Die Hard (1988),American History X (1998),Seven (a.k.a. Se7en) (1995),Avatar (2009),"Bourne Ultimatum, The (2007)",Iron Man (2008)
8,Reservoir Dogs (1992),Memento (2000),Seven (a.k.a. Se7en) (1995),Léon: The Professional (a.k.a. The Professiona...,Star Wars: Episode I - The Phantom Menace (1999),Star Trek (2009),Children of Men (2006),Casino Royale (2006)
9,Seven (a.k.a. Se7en) (1995),Goodfellas (1990),Star Wars: Episode V - The Empire Strikes Back...,Back to the Future (1985),"Silence of the Lambs, The (1991)",Inception (2010),I Am Legend (2007),Star Trek (2009)


As we can see, the recommendations **dynamically adapt** as the user watches more movies (the headings) - probably surfacing a few more Sci-Fi films in this user's general recommendations than were present in the initial set.
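
One simple way to put a number on this drift is the Jaccard overlap between consecutive recommendation lists: 1.0 means the same set of titles, 0.0 means nothing in common. The sketch below is our own illustration (the function names are hypothetical) and ignores ranking order within each list:

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def step_overlaps(columns):
    """Overlap between each recommendation list and the one before it."""
    return [jaccard(prev, cur) for prev, cur in zip(columns, columns[1:])]


# Toy example with three successive lists:
print(step_overlaps([["A", "B", "C"], ["A", "B", "D"], ["E", "F", "G"]]))
```

Applied to the table above, something like `step_overlaps([history.iloc[:, i].tolist() for i in range(history.shape[1])])` would give one value per simulated review.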

> ⚠️ **Note:** The shifts may not be particularly significant or intuitive in this sample dataset due to the small data volume and limitations discussed when we evaluated our models, but hopefully you still see a few interesting changes!

This responsiveness is particularly useful in video-on-demand, e-retail, and many other settings where users may have different **intents** between sessions - depending, for example, on whether they're watching with children or shopping for something in particular.

As discussed in more detail in [this blog post](https://aws.amazon.com/blogs/machine-learning/amazon-personalize-can-now-create-up-to-50-better-recommendations-for-fast-changing-catalogs-of-new-products-and-fresh-content/), it's **still important to periodically re-train your model**: But this dynamic state provides additional ability to serve your users the right recommendations at the right time.

> ⚠️ **Note:** As discussed previously, our sample is a *review* dataset but typical video-on-demand applications would be more likely to deal in "view"/"watch" events. Just as we experimented with what rating threshold to consider for events, VoD applications may need to consider **how much of the video** a user must have watched before an event is recorded: Sending at 100% complete could miss a lot of people who skip the credits!
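
A minimal sketch of such a threshold rule is shown below. The function name and the 80% default are purely illustrative assumptions - the right cut-off depends on your content and would need experimentation:

```python
def should_log_watch(seconds_watched, duration_seconds, min_fraction=0.8):
    """Decide whether a playback session counts as a 'watch' event."""
    if duration_seconds <= 0:
        # Unknown or invalid duration: don't record an event
        return False
    return seconds_watched / duration_seconds >= min_fraction


print(should_log_watch(5400, 6000))  # Stopped at the credits
print(should_log_watch(600, 6000))   # Gave up after ten minutes
```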

## Personalized Ranking

What about use-cases where it's **too difficult to write filtering rules**, and we'd instead like to provide a **shortlist** of items for the model to re-rank in order of relevance for the user? That's the core use case for our *personalized ranking* models.

For example you may want to dynamically render a personalized shelf/rail/carousel based on some highly complex criteria such as:

- Information that isn't available in your Personalize item metadata (e.g. director, location, superhero franchise)
- Results from some complex upstream short-listing algorithm (like results of a search engine query, kNN or some other machine learning algorithm to generate a cluster of candidate items)
- Potentially diverse shortlists that need to be *manually curated* for some other reason.

Re-ranking campaigns use a slightly different [GetPersonalizedRanking API](https://docs.aws.amazon.com/personalize/latest/dg/API_RS_GetPersonalizedRanking.html) from the [GetRecommendations](https://docs.aws.amazon.com/personalize/latest/dg/API_RS_GetRecommendations.html) one we've been using so far - but essentially the main difference is just that we need to **supply a list of item IDs** in the request.

As an example, let's recommend **Christmas movies**.

Our dataset doesn't seem to have any appropriate tags for this in the `GENRES` field, so we can tackle the problem by creating a ranking shortlist by `TITLE` (remember we dropped the `TITLE` field of our item metadata before uploading to Personalize!)

Let's first build our Christmas movie shortlist:

In [20]:
# (Of course Die Hard is a Christmas movie!)
shortlist_movies_df = items_df[items_df["TITLE"].str.contains(r"(?:Christmas|Die Hard \(1988\))")]
print(f"Found {len(shortlist_movies_df)} matching movies. Sample:")
shortlist_movies_df.head()

Found 51 matching movies. Sample:


ITEM_ID,GENRES,YEAR,TITLE
551,Animation|Children|Fantasy|Musical,1993,"Nightmare Before Christmas, The (1993)"
1036,Action|Crime|Thriller,1988,Die Hard (1988)
1099,Children|Drama|Fantasy,1938,"Christmas Carol, A (1938)"
2083,Children|Comedy|Musical,1992,"Muppet Christmas Carol, The (1992)"
2339,Comedy|Romance,1998,I'll Be Home For Christmas (1998)


Now we're all set to generate our personalized seasonal carousels: Even applying additional filtering criteria, if we want.

Let's explore the holiday picks for a set of example users, **filtering out any they might have reviewed before**:

In [None]:
rerank_campaign_arn = get_campaign_arn_by_name("personalize-movielens-rerank")
unwatched_filter_arn = get_filter_arn_by_name("unwatched")

# Convert the movie shortlist to just a list of (up to 500) ITEM_ID strings:
shortlist_item_ids = shortlist_movies_df.index.astype(str).to_list()[:500]

rerank_user_recs = {}

for user_id in user_ids:
    user_recs = recs_to_dataframe(
        personalize_runtime.get_personalized_ranking(
            campaignArn=rerank_campaign_arn,
            userId=str(user_id),
            inputList=shortlist_item_ids,
            filterArn=unwatched_filter_arn,
        )["personalizedRanking"]
    )
    rerank_user_recs[f"User {user_id}"] = (
        # Need to pad the results to n_recs in case some users might return fewer:
        user_recs["TITLE"].to_list() + (n_recs * [None])
    )[:n_recs]

rerank_user_recs = pd.DataFrame(rerank_user_recs)
rerank_user_recs

...and so we can rank arbitrary collections of items - even if there's no nice way to express those collections as filter rules!
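
The padding step in the loop above matters because a filtered ranking can return fewer than `n_recs` items for some users, while building a DataFrame from a dict of lists needs every column to be the same length. Here is a minimal standalone sketch of that idea. Note that `pad_to_length` is a hypothetical helper name (it isn't part of this notebook's `util` module) and the titles are placeholders:

```python
def pad_to_length(values, length, fill=None):
    """Return exactly `length` entries: truncate long lists, pad short ones with `fill`."""
    return (list(values) + [fill] * length)[:length]


# Placeholder titles: one user has more results than needed, the other fewer
n_recs_example = 3
example_recs = {
    "User A": pad_to_length(["Title 1", "Title 2", "Title 3", "Title 4"], n_recs_example),
    "User B": pad_to_length(["Title 5"], n_recs_example),
}

# Every column now has the same length, so pd.DataFrame(example_recs) would accept it
for user, titles in example_recs.items():
    print(user, titles)
```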

## Batch Recommendations

Although not the starting point for most projects, there are many cases where you may want to build a bulk dataset of exported recommendations.

Here we'll give a quick walkthrough of the process for the User-Personalization recipe via the Python SDK. It's also possible to run batch jobs:

- Through the Amazon Personalize console UI (see the *Batch inference jobs* tab of the sidebar within your dataset group)
- For other recipe types (although the input and output formats will vary a little)

You can find more information in the [Getting Batch Recommendations](https://docs.aws.amazon.com/personalize/latest/dg/recommendations-batch.html) section of the developer guide.

### Building the input file

To use the batch inference feature, you specify the inputs that you'd like to generate recommendations for up-front. Since the input fields differ between different **recipe types**, the exact format of the input file will be different too.
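
To make the difference between recipe types concrete, here's a rough sketch of how one input line might be built for each. The `recipe_type` labels and the `batch_input_line` function are hypothetical names for illustration only, and the `itemId` / `itemList` keys reflect our reading of the developer guide linked above, so please check the guide for the recipe you're actually using:

```python
import json


def batch_input_line(recipe_type, user_id=None, item_id=None, item_list=None):
    """Build one JSON-Lines record for a batch inference input file.

    The recipe_type labels here are illustrative, not Personalize identifiers.
    """
    if recipe_type == "user-personalization":
        record = {"userId": str(user_id)}
    elif recipe_type == "related-items":
        # Assumed key name, per the developer guide: the item to find similar items for
        record = {"itemId": str(item_id)}
    elif recipe_type == "personalized-ranking":
        # Assumed key names, per the developer guide: a user plus the items to re-rank
        record = {"userId": str(user_id), "itemList": [str(i) for i in item_list]}
    else:
        raise ValueError(f"Unknown recipe type: {recipe_type}")
    return json.dumps(record)


print(batch_input_line("user-personalization", user_id=4638))
print(batch_input_line("related-items", item_id=1036))
print(batch_input_line("personalized-ranking", user_id=4638, item_list=[551, 1036, 1099]))
```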

For our standard user personalization use-case, we'll need a [JSON-Lines](https://jsonlines.org/) file specifying just the `USER_ID` for each request, something like this:

```json
{"userId": "4638"}
{"userId": "663"}
{"userId": "3384"}
```

The cell below creates an input file in the notebook's local storage, requesting recommendations for user IDs 1 to 49:

In [23]:
batch_input_filename = "batch_up_input.json"
batch_input_path = f"{data_dir}/{batch_input_filename}"

with open(batch_input_path, "w") as f:
    for user_id in range(1, 50):
        f.write(json.dumps({ "userId": str(user_id) }) + "\n")

print(f"Written input to {batch_input_path}")

Written input to poc_data/batch_up_input.json


(You can open the above file in the notebook to check the format is as expected)
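
If you'd rather check the format programmatically than by eye, something like the sketch below could work. `check_batch_input_lines` is a hypothetical helper written for this example, and it only checks the User-Personalization format described above (one JSON object per line, with a string `userId`):

```python
import json


def check_batch_input_lines(lines):
    """Validate User-Personalization batch input lines; return the record count."""
    count = 0
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # Ignore blank lines (e.g. a trailing newline)
        record = json.loads(line)  # Raises a ValueError subclass on malformed JSON
        if not isinstance(record, dict) or not isinstance(record.get("userId"), str):
            raise ValueError(f"Line {line_no}: expected a string 'userId', got {line!r}")
        count += 1
    return count


# In-memory lines in the same format the cell above writes:
sample_lines = [json.dumps({"userId": str(user_id)}) + "\n" for user_id in range(1, 50)]
print(f"{check_batch_input_lines(sample_lines)} valid records")
```

To check the real file, you could pass an open file handle instead of the sample list, for example `with open(batch_input_path) as f: check_batch_input_lines(f)`.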

As usual when working with Personalize, we'll need to upload that input to an Amazon S3 bucket - we'll use the same one as created earlier:

In [24]:
# Upload files to S3
boto3.resource("s3").Bucket(bucket_name).Object(batch_input_path).upload_file(batch_input_path)
batch_input_s3uri = f"s3://{bucket_name}/{batch_input_path}"
print(f"Uploaded:\n{batch_input_s3uri}")

Uploaded:
s3://090247010259-ap-southeast-1-personalizepocvod1/poc_data/batch_up_input.json


### Running the Job

With our input file prepared, the other parameters required to create a batch inference job are similar to what we've used already. One major difference is that we supply a **solution version, not a campaign**, because we don't need to deploy our model to a real-time endpoint to create batch recommendations with it:

In [25]:
# We'll need a unique job name:
batch_job_name = f"personalize-movielens-up-batch-{str(round(time.time()*1000))}"

create_batch_job_resp = personalize.create_batch_inference_job(
    # Point to our trained solution version (model):
    solutionVersionArn=up_solution_version_arn,
    jobName=batch_job_name,
    # An IAM role authorizing Personalize to access the S3 source & target:
    roleArn=personalize_role_arn,
    # Input and output data locations in S3:
    jobInput={"s3DataSource": {"path": batch_input_s3uri}},
    jobOutput={
        "s3DataDestination": {
            "path": f"s3://{export_bucket_name}/batch-results/{batch_job_name}/",
        },
    },
)

batch_job_arn = create_batch_job_resp["batchInferenceJobArn"]
%store batch_job_arn
create_batch_job_resp

Stored 'batch_job_arn' (str)


{'batchInferenceJobArn': 'arn:aws:personalize:ap-southeast-1:090247010259:batch-inference-job/personalize-movielens-up-batch-1616507531727',
 'ResponseMetadata': {'RequestId': '47b5d1a2-81a8-4387-931b-36f4e3f68812',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/x-amz-json-1.1',
   'date': 'Tue, 23 Mar 2021 13:52:12 GMT',
   'x-amzn-requestid': '47b5d1a2-81a8-4387-931b-36f4e3f68812',
   'content-length': '139',
   'connection': 'keep-alive'},
  'RetryAttempts': 0}}

...And just as you might expect from our experience with dataset import jobs, this process is kicked off **in the background**.

> ⏰ This batch inference job can take around 30 minutes to complete. As we previously saw with dataset import jobs, that time is typically dominated by infrastructure provisioning and setup overheads for this small sample dataset. More typical bulk processing use-cases will be much more efficient!

The cell below will set up a polling loop to wait for the batch job to complete:

In [26]:
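def is_batch_job_finished(desc):
    """Return True once the job is ACTIVE, False while in progress; raise if it failed."""
    status = desc["batchInferenceJob"]["status"]
    if status == "ACTIVE":
        return True
    elif "FAILED" in status:
        raise ValueError(f"Batch job failed!\n{desc}")
    return False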

util.progress.polling_spinner(
    fn_poll_result=lambda: personalize.describe_batch_inference_job(
        batchInferenceJobArn=batch_job_arn,
    ),
    fn_is_finished=is_batch_job_finished,
    fn_stringify_result=lambda d: d["batchInferenceJob"]["status"],
    poll_secs=30,
    timeout_secs=60*60,  # Max 1 hour
)
print("Batch inference job complete")

Initial status: CREATE PENDING
| Status: CREATE PENDING [Since: relativedelta(seconds=+29)]
| Status: CREATE IN_PROGRESS [Since: relativedelta(minutes=+24, seconds=+33)]
/ Status: ACTIVE [Since: relativedelta()]                                    
Batch inference job complete
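
If you're working outside this repository and don't have the local `util` helper, the same wait-for-status pattern can be sketched in plain Python. This is only an illustration: `wait_for_status` and its parameters are hypothetical names, and the demo below feeds it a fake sequence of statuses instead of calling AWS:

```python
import time


def wait_for_status(fn_get_status, poll_secs=30, timeout_secs=60 * 60, fn_sleep=time.sleep):
    """Poll fn_get_status() until it returns "ACTIVE".

    Raises ValueError if a status containing "FAILED" is seen, or TimeoutError
    if the job is still not ACTIVE once timeout_secs have passed.
    """
    deadline = time.monotonic() + timeout_secs
    while True:
        status = fn_get_status()
        if status == "ACTIVE":
            return status
        if "FAILED" in status:
            raise ValueError(f"Job failed with status: {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Gave up waiting; last status was: {status}")
        fn_sleep(poll_secs)


# Demo with a fake status sequence (no AWS calls, no real waiting):
fake_statuses = iter(["CREATE PENDING", "CREATE IN_PROGRESS", "ACTIVE"])
final_status = wait_for_status(lambda: next(fake_statuses), poll_secs=0)
print(f"Final status: {final_status}")
```

Against the real service, `fn_get_status` could be a lambda that calls `personalize.describe_batch_inference_job(batchInferenceJobArn=batch_job_arn)` and returns the `["batchInferenceJob"]["status"]` field, as in the cell above.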


### Using the Results

Like the input, the format of our outputs will differ a little between recipe types as described in the documentation. For this example in user personalization, we can expect to see JSON-Lines file(s) with a structure something like the below:

```json
{"input":{"userId":"4638"}, "output": {"recommendedItems": ["296", "1", "260", "318"]}}
{"input":{"userId":"663"}, "output": {"recommendedItems": ["1393", "3793", "2701", "3826"]}}
{"input":{"userId":"3384"}, "output": {"recommendedItems": ["8368", "5989", "40815", "48780"]}}
```

The `output` keys here correspond quite closely to the structure of real-time API responses. We can download our files from Amazon S3 and inspect the structure to confirm this:

In [27]:
# Recover the output S3 URI from the job description:
batch_job_desc = personalize.describe_batch_inference_job(
    batchInferenceJobArn=batch_job_arn,
)["batchInferenceJob"]

batch_output_s3uri = batch_job_desc["jobOutput"]["s3DataDestination"]["path"]

# Use the job name to build a local folder to store the output:
batch_output_path = f"{data_dir}/batch-results/{batch_job_desc['jobName']}"

# Download the outputs from S3 to local folder:
!aws s3 sync $batch_output_s3uri $batch_output_path
print("\nDownload finished!")

for filename in filter(lambda f: ".json" in f, os.listdir(batch_output_path)):
    print(f"\n>\tSAMPLE of {filename}:")
    with open(os.path.join(batch_output_path, filename), "r") as f:
        lines = f.readlines()
        print("".join(lines[:3]))
        if len(lines) > 3:
            print("\t...")

download: s3://090247010259-ap-southeast-1-personalizepocvod1/batch-results/personalize-movielens-up-batch-1616507531727/_CHECK to poc_data/batch-results/personalize-movielens-up-batch-1616507531727/_CHECK
download: s3://090247010259-ap-southeast-1-personalizepocvod1/batch-results/personalize-movielens-up-batch-1616507531727/batch_up_input.json.out to poc_data/batch-results/personalize-movielens-up-batch-1616507531727/batch_up_input.json.out

Download finished!

>	SAMPLE of batch_up_input.json.out:
{"input":{"userId":"1"},"output":{"recommendedItems":["2000","293","3448","2028","2987","1393","1213","1394","1073","1610","21","6","587","1097","586","1291","1222","2918","1234","1270","1517","1249","1580","357","2023"],"scores":[0.0068093,0.006102,0.0051529,0.0048267,0.0043085,0.0040608,0.0038899,0.0037548,0.0036348,0.0035253,0.0035051,0.0034284,0.0031454,0.0031373,0.0030739,0.0029456,0.0028729,0.0028419,0.0027518,0.002666,0.0026518,0.0026348,0.0025923,0.0025572,0.002519]},"error":null}
{"

...and then using the recommendations is simply a case of reading the file line-by-line and processing the item IDs for each user however your use case requires.
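
As a rough sketch of that line-by-line processing, the function below collects the recommended item IDs (and scores, where present) for each user. `parse_batch_output` is a hypothetical helper for this example, the sample lines are an abbreviated version of the output shown above, and we're assuming that a non-null `error` field means the record has no usable recommendations:

```python
import json


def parse_batch_output(lines):
    """Parse batch output JSON-Lines into ({userId: [(itemId, score), ...]}, {userId: error})."""
    recs_by_user = {}
    errors_by_user = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # Skip blank lines
        record = json.loads(line)
        user_id = record["input"]["userId"]
        if record.get("error"):
            # Assumption: a non-null error means no recommendations for this input
            errors_by_user[user_id] = record["error"]
            continue
        output = record["output"]
        items = output["recommendedItems"]
        # Scores may be absent (as in the simplified format above), so pad with None
        scores = output.get("scores") or [None] * len(items)
        recs_by_user[user_id] = list(zip(items, scores))
    return recs_by_user, errors_by_user


# Abbreviated sample lines (first 3 items only) in the format shown above:
sample_output = [
    '{"input":{"userId":"1"},"output":{"recommendedItems":["2000","293","3448"],'
    '"scores":[0.0068093,0.006102,0.0051529]},"error":null}',
    '{"input":{"userId":"4638"}, "output": {"recommendedItems": ["296", "1", "260", "318"]}}',
]
recs, errors = parse_batch_output(sample_output)
for user_id, user_recs in recs.items():
    print(user_id, user_recs)
```

To process the real results, you could pass an open file handle for each downloaded `.out` file in `batch_output_path` instead of the sample list.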

## Wrap up

With that, you now have a fully working collection of models to tackle various recommendation and personalization scenarios, as well as the skills to manipulate data to better integrate with the service.

You'll want to make sure that you clean up all of the resources deployed during this PoC, to avoid potential ongoing charges (particularly for deployed infrastructure such as campaigns). We have provided a separate notebook which shows you how to identify and delete the resources: [06_Clean_Up_Resources.ipynb](06_Clean_Up_Resources.ipynb).