# 3. Deploying Campaigns and Interacting with Them

At this point, you have several solutions, each with at least one solution version. Once they are deployed, you can request recommendations from them and get a feel for their overall behavior.

This notebook starts by deploying each solution version from the previous notebook into its own campaign. Once the campaigns are active, it shows how to query them for recommendations and provides helper functions that turn the output into something more human-readable.

As you are working through examples with your customers, you can modify the helper functions to fit the structure of their data input files to keep the additional rendering working.

----

## Initial Setup

To get started, once again import the required libraries, load the values stored by the previous notebooks, and create the SDK clients.

In [1]:
import boto3
from time import sleep
import subprocess
import pandas as pd
import json
import time
import pprint
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates
from datetime import datetime
import uuid

In [2]:
%store -r

In [3]:
# Setup and Config
# Recommendations from Event data
personalize = boto3.client('personalize')
personalize_runtime = boto3.client('personalize-runtime')

# Establish a connection to Personalize's Event Streaming
personalize_events = boto3.client(service_name='personalize-events')

----

## Creating Campaigns

A campaign is a hosted solution version. Pricing is based on the throughput capacity you provision (personalization requests from users per second):
- This service, like many within AWS, will automatically scale based on demand, but if latency is critical, you may want to provision capacity ahead of the expected peak demand.
- Given this is purely a POC and a demo, all capacity limits are set to 1. 
- The code below will create the campaigns.

#### HRNN

In [4]:
hrnn_create_campaign_response = personalize.create_campaign(
    name = "personalize-poc-hrnn"+str(uuid.uuid4())[:5],
    solutionVersionArn = hrnn_solution_version_arn,
    minProvisionedTPS = 1
)

hrnn_campaign_arn = hrnn_create_campaign_response['campaignArn']

#### SIMS

In [5]:
sims_create_campaign_response = personalize.create_campaign(
    name = "personalize-poc-SIMS"+str(uuid.uuid4())[:5],
    solutionVersionArn = sims_solution_version_arn,
    minProvisionedTPS = 1
)

sims_campaign_arn = sims_create_campaign_response['campaignArn']

#### Personalized Ranking

In [6]:
rerank_create_campaign_response = personalize.create_campaign(
    name = "personalize-poc-rerank"+str(uuid.uuid4())[:5],
    solutionVersionArn = rerank_solution_version_arn,
    minProvisionedTPS = 1
)

rerank_campaign_arn = rerank_create_campaign_response['campaignArn']
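
The three cells above differ only in the name and the solution version ARN, so they could be folded into one loop. The following is a minimal sketch, not part of the original notebook: `create_campaigns` is a hypothetical helper, and it assumes the client you pass in behaves like `boto3.client('personalize')`.

```python
import uuid


def create_campaigns(personalize_client, solution_version_arns, min_tps=1):
    """Create one campaign per solution version and return {label: campaign ARN}.

    Hypothetical helper: `personalize_client` is assumed to behave like
    boto3.client('personalize').
    """
    campaign_arns = {}
    for label, solution_version_arn in solution_version_arns.items():
        response = personalize_client.create_campaign(
            # A short random suffix avoids name clashes between notebook runs
            name="personalize-poc-" + label + "-" + str(uuid.uuid4())[:5],
            solutionVersionArn=solution_version_arn,
            minProvisionedTPS=min_tps,
        )
        campaign_arns[label] = response["campaignArn"]
    return campaign_arns
```

In this notebook it could be called as `create_campaigns(personalize, {"hrnn": hrnn_solution_version_arn, "sims": sims_solution_version_arn, "rerank": rerank_solution_version_arn})`.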

This process should take no more than 15 minutes to complete for all your campaigns.

In [7]:
import threading

def threading_target(name, campaign_arn):
    while True: 
        campaign_status = personalize.describe_campaign(
            campaignArn=campaign_arn
        )["campaign"]["status"]
        print(name, campaign_status)
        if campaign_status != 'ACTIVE' and campaign_status != 'CREATE FAILED':
            time.sleep(30)
        else:
            break
        
campaign_arns = {
    "rerank": rerank_campaign_arn,
    "SIMS": sims_campaign_arn,
    "HRNN": hrnn_campaign_arn
}        
threads = list()
        
for key, campaign_arn in campaign_arns.items():
    thread = threading.Thread(target=threading_target, args=(key, campaign_arn))
    threads.append(thread)
    thread.start()
    
for thread in threads: 
    thread.join()
    
print("All threads finished")    

rerank CREATE PENDING
HRNN CREATE PENDING
SIMS CREATE PENDING
HRNN CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
SIMS CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
HRNN CREATE IN_PROGRESS
rerank CREATE IN_PROGRESS
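
The polling loop above runs until a campaign reaches a terminal status, so a campaign that never gets there would block the notebook indefinitely. A minimal sketch of the same polling with a deadline follows. `wait_for_campaign` and its parameters are hypothetical. The status strings follow the spacing seen in the output above (`CREATE PENDING`, `CREATE IN_PROGRESS`), and `CREATE FAILED` is assumed by analogy.

```python
import time


def wait_for_campaign(personalize_client, campaign_arn, timeout_s=1800,
                      poll_s=30, sleep=time.sleep, clock=time.monotonic):
    """Poll describe_campaign until a terminal status or the timeout is reached.

    Hypothetical helper: `sleep` and `clock` are injectable so the loop can be
    exercised without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        status = personalize_client.describe_campaign(
            campaignArn=campaign_arn
        )["campaign"]["status"]
        # Terminal states: stop polling and let the caller decide what to do
        if status in ("ACTIVE", "CREATE FAILED"):
            return status
        if clock() >= deadline:
            raise TimeoutError(
                "Campaign %s still %s after %s seconds"
                % (campaign_arn, status, timeout_s)
            )
        sleep(poll_s)
```

The function can be used as the thread target in place of the unbounded loop, or called sequentially for each campaign ARN.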


----

## Interacting with Campaigns

Now that the campaigns are all deployed and active, we can start to get recommendations via API calls. Each campaign behaves slightly differently because each serves a different use case. The campaigns are covered in a different order than they were created so that the complexity increases gradually (simplest first).

That said, you may need a few supporting functions to help make sense of the results from the service. Personalize returns only an `item_id`. This is great for keeping data compact, but it means you need to query the real database or a lookup table to get a human-readable result for the notebooks. The first few cells create that lookup for this particular example.

In [8]:
# Create a dataframe for the items by reading in the correct source CSV.
items_df = pd.read_csv(data_dir + '/artists.dat', delimiter='\t', index_col=0)
# Render some sample data
items_df.head(5)

Unnamed: 0_level_0,name,url,pictureURL
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
1,MALICE MIZER,http://www.last.fm/music/MALICE+MIZER,http://userserve-ak.last.fm/serve/252/10808.jpg
2,Diary of Dreams,http://www.last.fm/music/Diary+of+Dreams,http://userserve-ak.last.fm/serve/252/3052066.jpg
3,Carpathian Forest,http://www.last.fm/music/Carpathian+Forest,http://userserve-ak.last.fm/serve/252/40222717...
4,Moi dix Mois,http://www.last.fm/music/Moi+dix+Mois,http://userserve-ak.last.fm/serve/252/54697835...
5,Bella Morte,http://www.last.fm/music/Bella+Morte,http://userserve-ak.last.fm/serve/252/14789013...


By defining the ID column as the index column, it is trivial to return an artist by just doing this:

In [9]:
item_id_example = 987
artist = items_df.loc[item_id_example]['name']
print(artist)

Earth, Wind & Fire


That isn't terrible but would get messy to repeat everywhere in our code, so the function below will clean that up.

In [10]:
def get_artist_by_id(artist_id, artist_df=items_df):
    """
    This takes in an artist_id from Personalize so it will be a string,
    converts it to an int, and then does a lookup in a default or specified
    dataframe.
    
    A really broad try/except clause was added in case of anything going wrong.
    
    Feel free to add more debugging or filtering here to improve results if
    you hit an error.
    """
    try:
        return artist_df.loc[int(artist_id)]['name']
    except Exception:
        return "Error obtaining artist"

To test that out, try a few simple values, including some that should trigger errors:

In [11]:
# A known good id
print(get_artist_by_id(artist_id="987"))
# A bad type of value
print(get_artist_by_id(artist_id="987.9393939"))
# Really bad values
print(get_artist_by_id(artist_id="Steve"))

Earth, Wind & Fire
Error obtaining artist
Error obtaining artist


Great, now we have a way of rendering results. Next, select 5 random artists from the dataframe so we can look at their SIMS results.

In [12]:
samples = items_df.sample(5)
samples

Unnamed: 0_level_0,name,url,pictureURL
id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
9277,Across The Universe Soundtrack,http://www.last.fm/music/Across+The+Universe+S...,http://userserve-ak.last.fm/serve/252/59828881...
10453,Magnum,http://www.last.fm/music/Magnum,http://userserve-ak.last.fm/serve/252/497179.jpg
12315,Zoë Keating,http://www.last.fm/music/Zo%C3%AB+Keating,http://userserve-ak.last.fm/serve/252/2129283.jpg
18191,sayCeT,http://www.last.fm/music/sayCeT,http://userserve-ak.last.fm/serve/252/61489987...
5776,House Boulevard Feat. Samara,http://www.last.fm/music/House+Boulevard+Feat....,http://userserve-ak.last.fm/serve/252/8698889.jpg


----

### SIMS

SIMS requires just an item, and it returns items that your users interact with in similar ways. In this particular case, the item is an artist. The cells below handle getting recommendations from SIMS and rendering the results.

Start by getting some recommendations for just the first known item (Earth, Wind & Fire):

In [13]:
get_recommendations_response = personalize_runtime.get_recommendations(
    campaignArn = sims_campaign_arn,
    itemId = str(987),
)

In [14]:
item_list = get_recommendations_response['itemList']

In [15]:
for item in item_list:
    print(get_artist_by_id(artist_id=item['itemId']))

The Byrds
Johnny Cash
Lacrimas Profundere
Neil Young
Jethro Tull
George Harrison
Bob Dylan
Amorphis
Motörhead
Bruce Springsteen
John Lennon
The Who
The Rolling Stones


This is an OK list, but it would be nicer to see the recommendations for several artists side by side in a dataframe. The code below does just that.

In [16]:
# Update DF rendering
pd.set_option('display.max_rows', 30)

def get_new_recommendations_df(recommendations_df, artist_ID):
    # Get the artist name
    artist_name = get_artist_by_id(artist_ID)
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        campaignArn = sims_campaign_arn,
        itemId = str(artist_ID),
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['itemList']
    recommendation_list = []
    for item in item_list:
        artist = get_artist_by_id(item['itemId'])
        recommendation_list.append(artist)
    new_rec_DF = pd.DataFrame(recommendation_list, columns = [artist_name])
    # Add this dataframe to the old one
    #recommendations_df = recommendations_df.join(new_rec_DF)
    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)
    return recommendations_df

sims_recommendations_df = pd.DataFrame()

artists = samples.index.tolist()


for artist in artists:
    sims_recommendations_df = get_new_recommendations_df(sims_recommendations_df, artist)

sims_recommendations_df

Unnamed: 0,Across The Universe Soundtrack,Magnum,Zoë Keating,sayCeT,House Boulevard Feat. Samara
0,Britney Spears,Britney Spears,Britney Spears,Britney Spears,Britney Spears
1,Depeche Mode,Depeche Mode,Depeche Mode,Depeche Mode,Depeche Mode
2,Lady Gaga,Lady Gaga,Lady Gaga,Lady Gaga,Lady Gaga
3,Madonna,Madonna,Madonna,Madonna,Madonna
4,Christina Aguilera,Christina Aguilera,Christina Aguilera,Christina Aguilera,Christina Aguilera
5,Muse,Muse,Muse,Muse,Muse
6,The Beatles,The Beatles,The Beatles,The Beatles,The Beatles
7,Rihanna,Rihanna,Rihanna,Rihanna,Rihanna
8,Radiohead,Radiohead,Radiohead,Radiohead,Radiohead
9,Coldplay,Coldplay,Coldplay,Coldplay,Coldplay


You may notice that many of the items look the same across columns. Hopefully, not all of them do. This is an excellent time to think about tuning the popularity discount hyperparameter (`popularity_discount_factor` in the SIMS recipe) in your next revision, which would allow for a bit more nuance in the results. The right value will be unique to every dataset you encounter and to the goals of the business. Iterate on it until you find a mix that achieves your objectives.
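
As a rough illustration of where such a hyperparameter would go, the sketch below builds the keyword arguments for a `create_solution` call. Several things here are assumptions to verify against the current recipe documentation: the hyperparameter name `popularity_discount_factor`, its 0 to 1 range, the SIMS recipe ARN, and the convention that hyperparameter values are passed as strings. `build_sims_solution_request` itself is a hypothetical helper.

```python
def build_sims_solution_request(name, dataset_group_arn, popularity_discount_factor):
    """Build keyword arguments for personalize.create_solution (hypothetical helper).

    Assumption: the SIMS recipe accepts `popularity_discount_factor` in the
    range 0.0 to 1.0. Check the recipe documentation before relying on this.
    """
    if not 0.0 <= popularity_discount_factor <= 1.0:
        raise ValueError("popularity_discount_factor is assumed to lie in [0, 1]")
    return {
        "name": name,
        "datasetGroupArn": dataset_group_arn,
        # Assumed ARN of the SIMS recipe
        "recipeArn": "arn:aws:personalize:::recipe/aws-sims",
        "solutionConfig": {
            # Hyperparameter values are passed as strings
            "algorithmHyperParameters": {
                "popularity_discount_factor": str(popularity_discount_factor)
            }
        },
    }
```

The result could then be passed along as `personalize.create_solution(**request)`, followed by a new solution version and campaign, so that the SIMS table above can be compared across values.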

The remaining campaigns also rely on having a sample of users, so we will load the user data and select 3 users at random below before moving on.

In [17]:
users_df = pd.read_csv(data_dir + '/user_artists.dat', delimiter='\t', index_col=0)
# Render some sample data
users_df.head(5)

Unnamed: 0_level_0,artistID,weight
userID,Unnamed: 1_level_1,Unnamed: 2_level_1
2,51,13883
2,52,11690
2,53,11351
2,54,10300
2,55,8983


In [18]:
users = users_df.sample(3).index.tolist()
users

[336, 1035, 1235]

----

### HRNN

HRNN is one of the more advanced algorithms provided by Amazon Personalize. It supports personalization of the items for a specific user based on their past behavior and can intake real-time events to alter recommendations for a user without retraining. 

First, the cells below will render the recommendations for our 3 random users from above. After that, we will explore real-time interactions before moving on to Personalized Ranking.

#### API Call Results

In [19]:
# Update DF rendering
pd.set_option('display.max_rows', 30)

def get_new_recommendations_df_users(recommendations_df, user_id):
    # Get the artist name
    #artist_name = get_artist_by_id(artist_ID)
    # Get the recommendations
    get_recommendations_response = personalize_runtime.get_recommendations(
        campaignArn = hrnn_campaign_arn,
        userId = str(user_id),
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['itemList']
    recommendation_list = []
    for item in item_list:
        artist = get_artist_by_id(item['itemId'])
        recommendation_list.append(artist)
    #print(recommendation_list)
    new_rec_DF = pd.DataFrame(recommendation_list, columns = [user_id])
    # Add this dataframe to the old one
    #recommendations_df = recommendations_df.join(new_rec_DF)
    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)
    return recommendations_df

recommendations_df_users = pd.DataFrame()

users = users_df.sample(3).index.tolist()
print(users)

for user in users:
    recommendations_df_users = get_new_recommendations_df_users(recommendations_df_users, user)

recommendations_df_users

[1896, 436, 2080]


Unnamed: 0,1896,436,2080
0,Cachorro Grande,Kylie Minogue,Depeche Mode
1,China Crisis,Cold War Kids,Paramore
2,Leilah Moreno,Michael Jackson,The Beatles
3,Astrud Gilberto,Down,Björk
4,Belleruche,The Sugarcubes,Radiohead
5,Symphony X,Brooke Fraser,Coldplay
6,Andrew Sixty,Bag Raiders,The Killers
7,Cold War Kids,Ania,Julian Casablancas
8,Bebel Gilberto,Tuxedomoon,Metallica
9,YUI,Throbbing Gristle,Nirvana


Here we clearly see that each user's recommendations are different. If you need a cache for these results, you could start by running the API calls for all your users and storing the results yourself, or you could use a batch export, which is covered after Personalized Ranking.
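
The "store the results yourself" option can be as simple as a dictionary keyed by user ID. The sketch below is one way to build it, under the assumption that the client passed in behaves like `boto3.client('personalize-runtime')`. `build_recommendation_cache` is a hypothetical helper, not part of the original notebook.

```python
def build_recommendation_cache(runtime_client, campaign_arn, user_ids, num_results=10):
    """Return {user_id: [item_id, ...]} by calling the campaign once per user.

    Hypothetical helper: `runtime_client` is assumed to behave like
    boto3.client('personalize-runtime').
    """
    cache = {}
    for user_id in user_ids:
        response = runtime_client.get_recommendations(
            campaignArn=campaign_arn,
            userId=str(user_id),
            numResults=num_results,
        )
        # Keep only the item IDs; names can be resolved later with a lookup
        cache[str(user_id)] = [item["itemId"] for item in response["itemList"]]
    return cache
```

A cache like this goes stale as soon as new interactions arrive, so it suits use cases where slightly out-of-date recommendations are acceptable.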

The next topic is real-time events. Personalize can listen to events from your application in order to update what your users are shown. This is especially useful in media workloads like video on demand, where a customer's intent may be to sit down and watch a show with their children now and a more serious program later.

Additionally, the events recorded via this system are stored until you issue a delete call, and they are used as historical data alongside the other interaction data you provided when you train your next models.

#### Real Time Events

Start by creating an event tracker that is attached to the dataset group:

In [20]:
response = personalize.create_event_tracker(
    name='ArtistTracker',
    datasetGroupArn=dataset_group_arn
)
print(response['eventTrackerArn'])
print(response['trackingId'])
TRACKING_ID = response['trackingId']
event_tracker_arn = response['eventTrackerArn']


arn:aws:personalize:us-east-1:822894322603:event-tracker/6b08c422
56721ada-0d7d-488a-9ce2-14eea94e7770


The cells below provide a code sample that simulates a user interacting with a particular item. After each interaction, you request recommendations again and see how they differ from those you started with.


In [21]:
session_dict = {}

def send_artist_click(USER_ID, ITEM_ID):
    """
    Simulates a click as an event
    and sends it to Amazon Personalize's Event Tracker
    """
    # Configure Session: reuse the user's session ID, or create one on first use
    try:
        session_ID = session_dict[str(USER_ID)]
    except KeyError:
        session_dict[str(USER_ID)] = str(uuid.uuid1())
        session_ID = session_dict[str(USER_ID)]
        
    # Configure Properties:
    event = {
        "itemId": str(ITEM_ID),
    }
    event_json = json.dumps(event)
        
    # Make Call
    personalize_events.put_events(
        trackingId = TRACKING_ID,
        userId = str(USER_ID),
        sessionId = session_ID,
        eventList = [{
            'sentAt': int(time.time()),
            'eventType': 'EVENT_TYPE',
            'properties': event_json
        }]
    )

def get_new_recommendations_df_users_real_time(recommendations_df, user_id, item_id):
    # Get the artist name (header of column)
    artist_name = get_artist_by_id(item_id)
    # Interact with the artist
    send_artist_click(USER_ID=user_id, ITEM_ID=item_id)
    # Get the recommendations (note you should have a base recommendation DF created before)
    get_recommendations_response = personalize_runtime.get_recommendations(
        campaignArn = hrnn_campaign_arn,
        userId = str(user_id),
    )
    # Build a new dataframe of recommendations
    item_list = get_recommendations_response['itemList']
    recommendation_list = []
    for item in item_list:
        artist = get_artist_by_id(item['itemId'])
        recommendation_list.append(artist)
    #print(recommendation_list)
    new_rec_DF = pd.DataFrame(recommendation_list, columns = [artist_name])
    # Add this dataframe to the old one
    #recommendations_df = recommendations_df.join(new_rec_DF)
    recommendations_df = pd.concat([recommendations_df, new_rec_DF], axis=1)
    return recommendations_df
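
The session handling and the event payload in `send_artist_click` can also be pulled out into small pure functions, which makes them easy to inspect without calling the service. This is a sketch only: `get_session_id` and `build_click_event` are hypothetical names, and the payload fields simply mirror the ones used in the cell above.

```python
import json
import time
import uuid


def get_session_id(session_dict, user_id):
    """Return the stored session ID for a user, creating one on first use."""
    return session_dict.setdefault(str(user_id), str(uuid.uuid1()))


def build_click_event(item_id, event_type="EVENT_TYPE", sent_at=None):
    """Build one entry for the eventList argument of put_events."""
    return {
        # Use the current time unless a timestamp is supplied explicitly
        "sentAt": int(time.time()) if sent_at is None else int(sent_at),
        "eventType": event_type,
        # The properties field is a JSON string, not a nested dictionary
        "properties": json.dumps({"itemId": str(item_id)}),
    }
```

With these in place, the `put_events` call would take `sessionId=get_session_id(session_dict, USER_ID)` and `eventList=[build_click_event(ITEM_ID)]`.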

Those are just supporting functions. Before calling them, build a simple dataframe of the user's baseline (non-session-based) recommendations:

In [22]:
# First pick a user:
user_id = users_df.sample(1).index.tolist()[0]

In [23]:

get_recommendations_response = personalize_runtime.get_recommendations(
        campaignArn = hrnn_campaign_arn,
        userId = str(user_id),
    )
# Build a new dataframe of recommendations
item_list = get_recommendations_response['itemList']
recommendation_list = []
for item in item_list:
    artist = get_artist_by_id(item['itemId'])
    recommendation_list.append(artist)
user_recommendations_df = pd.DataFrame(recommendation_list, columns = [user_id])
user_recommendations_df

Unnamed: 0,732
0,Eminem
1,Depeche Mode
2,Lady Gaga
3,Wanessa
4,Björk
5,Christina Aguilera
6,Whitney Houston
7,Radiohead
8,Ace of Base
9,Paramore


In [24]:
# Next generate 3 random artists to interact with:
artists = items_df.sample(3).index.tolist()

In [25]:
# Note this will take about 15 seconds to complete due to the sleeps.
for artist in artists:
    user_recommendations_df = get_new_recommendations_df_users_real_time(user_recommendations_df, user_id, artist)
    time.sleep(5)
user_recommendations_df

Unnamed: 0,732,Milton Nascimento & Lô Borges,Tyler Ward,Products of Monkey Love
0,Eminem,Eminem,Eminem,Whitney Houston
1,Depeche Mode,Depeche Mode,Depeche Mode,Skunk Anansie
2,Lady Gaga,Lady Gaga,Lady Gaga,Leilah Moreno
3,Wanessa,Wanessa,Wanessa,Lady Gaga
4,Björk,Björk,Björk,Christina Aguilera
5,Christina Aguilera,Christina Aguilera,Christina Aguilera,DJ BoBo
6,Whitney Houston,Whitney Houston,Whitney Houston,Ace of Base
7,Radiohead,Radiohead,Radiohead,J.K.
8,Ace of Base,Ace of Base,Ace of Base,Andrew Sixty
9,Paramore,Paramore,Paramore,Milk Inc.


In the table above, the first column after the index holds the user's default recommendations from HRNN. Each column after that is headed by the artist the user interacted with via a real-time event and lists the recommendations returned after that interaction.

The behavior may not shift very much after the second interaction; this is due to the relatively limited nature of this dataset. If you wanted to better understand this, simulating clicking random artists of random genres would have a more pronounced impact.
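
To put a number on how much the recommendations shifted, the columns can be compared directly. The sketch below uses a hypothetical helper, `summarize_shift`, on two columns copied from the table above: the baseline for user 732 and the list returned after the third simulated click.

```python
def summarize_shift(before, after):
    """Split the `after` list into items already in `before` and new items."""
    before_set = set(before)
    kept = [item for item in after if item in before_set]
    new = [item for item in after if item not in before_set]
    return {"kept": kept, "new": new}


# Columns copied from the table above
baseline = ["Eminem", "Depeche Mode", "Lady Gaga", "Wanessa", "Björk",
            "Christina Aguilera", "Whitney Houston", "Radiohead",
            "Ace of Base", "Paramore"]
after_third_click = ["Whitney Houston", "Skunk Anansie", "Leilah Moreno",
                     "Lady Gaga", "Christina Aguilera", "DJ BoBo",
                     "Ace of Base", "J.K.", "Andrew Sixty", "Milk Inc."]

shift = summarize_shift(baseline, after_third_click)
print(len(shift["kept"]), "kept,", len(shift["new"]), "new")  # prints: 4 kept, 6 new
```

The same function applied to the first two event columns returns no new items, which matches what the table shows.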

Time for the last campaign.

----

### Personalized Ranking

Again, the core use case for this is to take a collection of items and order them by probable interest for a user. To demonstrate this, we will need a random user and a random collection of 25 items.

> It can be combined with search engines to provide sophisticated personalization.

In [26]:
rerank_user = users_df.sample(1).index.tolist()[0]
rerank_items = items_df.sample(25).index.tolist()

Now build a nice dataframe that shows the input data:

In [27]:
rerank_list = []
for item in rerank_items:
    artist = get_artist_by_id(item)
    rerank_list.append(artist)
rerank_df = pd.DataFrame(rerank_list, columns = [rerank_user])
rerank_df

Unnamed: 0,119
0,Toni Braxton
1,Francisca Valenzuela
2,Raffaella Carrà
3,Universum
4,Wa Wa Nee
5,Clap Your Hands Say Yeah
6,Ukurralle
7,L7
8,Janio Lora
9,Unholy Matrimony


Now make the personalized-ranking API call:

In [28]:
# Convert user to string:
user_id = str(rerank_user)
rerank_item_list = []
for item in rerank_items:
    rerank_item_list.append(str(item))

In [29]:
get_recommendations_response_rerank = personalize_runtime.get_personalized_ranking(
        campaignArn = rerank_campaign_arn,
        userId = user_id,
        inputList = rerank_item_list
)

In [30]:
get_recommendations_response_rerank

{'ResponseMetadata': {'RequestId': '1fb7c3d4-554d-41bc-b1e8-116fba02cac6',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'content-type': 'application/json',
   'date': 'Wed, 11 Mar 2020 09:44:29 GMT',
   'x-amzn-requestid': '1fb7c3d4-554d-41bc-b1e8-116fba02cac6',
   'content-length': '504',
   'connection': 'keep-alive'},
  'RetryAttempts': 0},
 'personalizedRanking': [{'itemId': '2006'},
  {'itemId': '1817'},
  {'itemId': '241'},
  {'itemId': '236'},
  {'itemId': '7991'},
  {'itemId': '16188'},
  {'itemId': '6820'},
  {'itemId': '5020'},
  {'itemId': '16030'},
  {'itemId': '4321'},
  {'itemId': '5159'},
  {'itemId': '4304'},
  {'itemId': '2308'},
  {'itemId': '5240'},
  {'itemId': '12526'},
  {'itemId': '9701'},
  {'itemId': '15677'},
  {'itemId': '8996'},
  {'itemId': '2776'},
  {'itemId': '14268'},
  {'itemId': '6826'},
  {'itemId': '8520'},
  {'itemId': '18414'},
  {'itemId': '2808'},
  {'itemId': '16439'}]}

The only remaining step is to add them to the dataframe.

In [31]:
ranked_list = []
item_list = get_recommendations_response_rerank['personalizedRanking']
for item in item_list:
    artist = get_artist_by_id(item['itemId'])
    ranked_list.append(artist)
ranked_df = pd.DataFrame(ranked_list, columns = ['Re-Ranked'])
rerank_df = pd.concat([rerank_df, ranked_df], axis=1)
rerank_df

Unnamed: 0,119,Re-Ranked
0,Toni Braxton,Pain of Salvation
1,Francisca Valenzuela,L7
2,Raffaella Carrà,Toni Braxton
3,Universum,Clap Your Hands Say Yeah
4,Wa Wa Nee,DJ floorclearer
5,Clap Your Hands Say Yeah,Ленина Пакет
6,Ukurralle,Universum
7,L7,Queen + Paul Rodgers
8,Janio Lora,Hüsnü Arkan
9,Unholy Matrimony,Galneryus


You can see above how the entries were re-ordered based on the model's understanding of the user. This is a common task when you have a collection of items to surface to a user, for example a list of promotions, or when you are filtering on a category and want to show the items most likely to be useful first.
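
When the re-ranked list is long, it can help to see how far each item moved rather than comparing two columns by eye. The sketch below is one way to compute that. `rank_changes` is a hypothetical helper, and it assumes every ranked ID also appears in the input list.

```python
def rank_changes(input_ids, ranked_ids):
    """Return (item_id, old_position, new_position, places_gained) per item.

    Assumes every ID in `ranked_ids` also appears in `input_ids`.
    """
    original_position = {item_id: pos for pos, item_id in enumerate(input_ids)}
    changes = []
    for new_pos, item_id in enumerate(ranked_ids):
        old_pos = original_position[item_id]
        # Positive values mean the item moved up the list
        changes.append((item_id, old_pos, new_pos, old_pos - new_pos))
    return changes
```

In this notebook the inputs would be `rerank_item_list` and `[item['itemId'] for item in get_recommendations_response_rerank['personalizedRanking']]`.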

----

## Batch Recommendations

There are many cases where you may want a larger dataset of exported recommendations, whether for caching or just for digging into the results to learn more. Amazon Personalize recently launched Batch Recommendations as a way to export a collection of recommendations to S3. For simplicity's sake, this example walks through how to do this for the HRNN solution.

Full details can be found in the [Amazon Personalize documentation](https://docs.aws.amazon.com/personalize/latest/dg/getting-recommendations.html#recommendations-batch).

This feature applies to all algorithms, though the output will vary. Again, see the docs for a full breakdown.

A simple implementation looks like this:

```python
import boto3

personalize_rec = boto3.client(service_name='personalize')

personalize_rec.create_batch_inference_job (
    solutionVersionArn = "Solution version ARN",
    jobName = "Batch job name",
    roleArn = "IAM role ARN",
    jobInput = 
       {"s3DataSource": {"path": "S3 input path"}},
    jobOutput = 
       {"s3DataDestination": {"path":"S3 output path"}}
)
```

The SDK import, the solution version ARN, and the role ARN have all been determined already. This just leaves an input, an output, and a job name to be defined.

Starting with the input, for HRNN it looks like this:


```JSON
{"userId": "4638"}
{"userId": "663"}
{"userId": "3384"}
```

This should yield something like this as output:

```JSON
{"input":{"userId":"4638"}, "output": {"recommendedItems": ["296", "1", "260", "318"]}}
{"input":{"userId":"663"}, "output": {"recommendedItems": ["1393", "3793", "2701", "3826"]}}
{"input":{"userId":"3384"}, "output": {"recommendedItems": ["8368", "5989", "40815", "48780"]}}
```

This file is not a single JSON document. It is in JSON Lines form, where each line is a valid JSON object on its own, so the results need to be parsed one line at a time when they come back.
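
Parsing this line-by-line format takes only a few lines of standard-library Python. The sketch below runs on the sample output shown above. `parse_json_lines` is a hypothetical helper name.

```python
import json

# The sample batch output shown above, one JSON object per line
sample_output = """\
{"input":{"userId":"4638"}, "output": {"recommendedItems": ["296", "1", "260", "318"]}}
{"input":{"userId":"663"}, "output": {"recommendedItems": ["1393", "3793", "2701", "3826"]}}
{"input":{"userId":"3384"}, "output": {"recommendedItems": ["8368", "5989", "40815", "48780"]}}
"""


def parse_json_lines(text):
    """Parse each non-empty line of `text` as its own JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_json_lines(sample_output)
# Map each user ID to the list of recommended item IDs
recommendations_by_user = {
    record["input"]["userId"]: record["output"]["recommendedItems"]
    for record in records
}
print(recommendations_by_user["663"])  # prints: ['1393', '3793', '2701', '3826']
```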

##### Building the Input File

When you use the batch feature, you specify the users for whom you would like recommendations once the job has completed. That is done with the input format shown above. The cell below again selects a few random users, then builds the file and saves it to disk.

From there, you will upload it to S3 to use in the API call later.

In [32]:
# Get the user list
batch_users = users_df.sample(3).index.tolist()

# Write the file to disk
json_input_filename = "json_input.json"
with open(data_dir + "/" + json_input_filename, 'w') as json_input:
    for user_id in batch_users:
        json_input.write('{"userId": "' + str(user_id) + '"}\n')

In [33]:
# Showcase the input file:
!cat $data_dir"/"$json_input_filename

{"userId": "371"}
{"userId": "928"}
{"userId": "1749"}
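
String concatenation works for plain numeric IDs, but it would produce invalid JSON if an ID ever contained a quote or a backslash. As a hedged alternative, the sketch below lets `json.dumps` handle the escaping. `write_batch_input` is a hypothetical helper, not part of the original notebook.

```python
import json


def write_batch_input(path, user_ids):
    """Write one {"userId": ...} JSON object per line to `path`."""
    with open(path, "w") as handle:
        for user_id in user_ids:
            # json.dumps takes care of quoting and escaping the ID
            handle.write(json.dumps({"userId": str(user_id)}) + "\n")
```

Called as `write_batch_input(data_dir + "/" + json_input_filename, batch_users)`, it produces the same file contents as the cell above for the IDs used here.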


Upload the file to S3 and save the path as a variable for later.

In [34]:
# Upload files to S3
boto3.Session().resource('s3').Bucket(bucket_name).Object(json_input_filename).upload_file(data_dir+"/"+json_input_filename)
s3_input_path = "s3://" + bucket_name + "/" + json_input_filename
print(s3_input_path)

s3://822894322603personalizepoc281089fc-eae9-49ab-8475-a634094b1420/json_input.json


Define the output path for the API call:

In [35]:
# Define the output path
s3_output_path = "s3://" + bucket_name + "/"
print(s3_output_path)

s3://822894322603personalizepoc281089fc-eae9-49ab-8475-a634094b1420/


Now just make the call to kick off the batch export process.

In [37]:
batchInferenceJobArn = personalize.create_batch_inference_job (
    solutionVersionArn = hrnn_solution_version_arn,
    jobName = "POC-Batch-Inference-Job-HRNN"+str(uuid.uuid4())[:5],
    roleArn = role_arn,
    jobInput = 
     {"s3DataSource": {"path": s3_input_path}},
    jobOutput = 
     {"s3DataDestination":{"path": s3_output_path}}
)
batchInferenceJobArn = batchInferenceJobArn['batchInferenceJobArn']

Wait for the job to complete here. This may take a few minutes because infrastructure has to be created to perform the task. In bulk, the export itself would be quite quick. Here we are only exporting recommendations for a handful of users, which wastes that potential, but it is enough to show the process.

In [38]:
current_time = datetime.now()
print("Export Started on: ", current_time.strftime("%I:%M:%S %p"))

max_time = time.time() + 3*60*60 # 3 hours
while time.time() < max_time:
    describe_dataset_inference_job_response = personalize.describe_batch_inference_job(
        batchInferenceJobArn = batchInferenceJobArn
    )
    status = describe_dataset_inference_job_response["batchInferenceJob"]['status']
    print("DatasetInferenceJob: {}".format(status))
    
    if status == "ACTIVE" or status == "CREATE FAILED":
        break
    else:    
        time.sleep(60)
    
current_time = datetime.now()
print("Export Completed on: ", current_time.strftime("%I:%M:%S %p"))

Export Started on:  11:51:13 AM
DatasetInferenceJob: CREATE PENDING
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInferenceJob: CREATE IN_PROGRESS
DatasetInfer

With the data successfully exported, grab the file and parse it:

In [39]:
s3 = boto3.client('s3')
export_name = json_input_filename + ".out"
s3.download_file(bucket_name, export_name, data_dir+"/"+export_name)

# Update DF rendering
pd.set_option('display.max_rows', 30)
bulk_recommendations_df = pd.DataFrame()
with open(data_dir+"/"+export_name) as json_file:
    # Each non-empty line of the export is its own JSON document
    for raw_line in json_file:
        if not raw_line.strip():
            continue
        line = json.loads(raw_line)
        # extract the user ID 
        col_header = "User: " + line['input']['userId']
        # Create a list of artist names for this user's recommended items
        recommendation_list = []
        for item in line['output']['recommendedItems']:
            artist = get_artist_by_id(item)
            recommendation_list.append(artist)
        new_rec_DF = pd.DataFrame(recommendation_list, columns = [col_header])
        # Add this user's column to the combined dataframe
        bulk_recommendations_df = pd.concat([bulk_recommendations_df, new_rec_DF], axis=1)
bulk_recommendations_df

Unnamed: 0,User: 371,User: 928,User: 1749
0,Depeche Mode,Coldplay,Christina Aguilera
1,Coldplay,Paramore,OneRepublic
2,Eminem,Lady Gaga,Adam Lambert
3,Paramore,Depeche Mode,Regina Spektor
4,Christina Aguilera,The Beatles,Robert Pattinson
5,Björk,Björk,Travis
6,Radiohead,Britney Spears,Calvin Harris
7,Pink Floyd,Christina Aguilera,Hole
8,The Beatles,Madonna,Mika
9,Queen,Radiohead,Depeche Mode


----

## Wrap Up

With that, you now have a fully working collection of models to tackle various recommendation and personalization scenarios. You also have the skills to manipulate customer data to better integrate with the service, and you know how to do all of this over APIs while leveraging open source data science tools.

Use the notebooks as a guide to getting started with your customers for POCs. As you find missing components or discover new approaches, cut a pull request and contribute any additional helpful components that may be missing from this collection.

Good luck!