Connecting to Amazon Web Services with Boto3 is much simpler than with earlier versions of Boto.

This notebook lists the steps to connect to Amazon Personalize using boto3 and to create a solution that recommends books to a user using the AutoML configuration.

**Download the dataset from the link below (please use your TAMU email to access it)**

[Dataset Link Google Drive](https://drive.google.com/open?id=1NHu_YX7bqgWSBLXvXd1ndo3jYt_Wtgpq)

### Resources

* [Boto3 Personalize](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/personalize.html)

* [Boto3 S3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/s3.html)

* [S3](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/what-is-s3.html)

* [Personalize](https://docs.aws.amazon.com/personalize/latest/dg/what-is-personalize.html)

* [Dataset Book Crossings](http://www2.informatik.uni-freiburg.de/~cziegler/BX/)





### Insights

The main insights from using boto3 with Personalize are as follows:

* Amazon Personalize must be able to read from S3 storage, so the S3 bucket needs a policy that grants it access

* The schema has to match the data in the CSV files. For example, if the dataset has NULL as a value for Age and the schema declares Age as an integer, the import cannot match the value to the declared type

* Creating a solution version starts training, which takes a significant amount of time depending on the size of the data

* Personalize also needs an IAM role that it can use for import jobs
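
A quick local check can catch this kind of type mismatch before the files reach Personalize. The sketch below is only an illustration: `find_type_mismatches`, the sample schema and the sample rows are hypothetical, it checks only simple primitive types, and it is not how Personalize itself validates data.

```python
import csv
import io
import json

# Map primitive schema type names to Python parsers (assumption: only these are checked).
_PARSERS = {"int": int, "long": int, "float": float, "double": float, "string": str}


def find_type_mismatches(schema_json, csv_text, max_errors=10):
    """Return (row_number, column, value) for values that do not parse as the declared type."""
    fields = json.loads(schema_json)["fields"]
    mismatches = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for field in fields:
            declared = field["type"]
            parser = _PARSERS.get(declared) if isinstance(declared, str) else None
            if parser is None:
                continue  # union or unknown type: skipped in this sketch
            value = row.get(field["name"])
            try:
                parser(value)
            except (TypeError, ValueError):
                mismatches.append((row_number, field["name"], value))
                if len(mismatches) >= max_errors:
                    return mismatches
    return mismatches


# Hypothetical sample data: AGE is declared as int, but one row holds the text NULL.
SAMPLE_SCHEMA = json.dumps({
    "type": "record",
    "name": "Users",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "AGE", "type": "int"},
    ],
    "version": "1.0",
})
SAMPLE_CSV = "USER_ID,AGE\n1,34\n2,NULL\n3,27\n"

print(find_type_mismatches(SAMPLE_SCHEMA, SAMPLE_CSV))
```

Rows flagged this way can be cleaned or dropped from the CSV, or the column can be declared with a type that fits the data.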

Dataset and things to remember

*   Personalize only accepts CSV files stored in an S3 bucket
*   Some fields are mandatory in the schema; Personalize uses them to identify the dataset
*   Of the many possible fields, **USER_ID**, **ITEM_ID** and **TIMESTAMP** are required, depending on the dataset type:

    1.   Users - USER_ID
    2.   Items - ITEM_ID
    3.   Interactions - USER_ID, ITEM_ID, TIMESTAMP

* Interactions cannot have more than 5 fields in the dataset
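
To make the required fields concrete, here is a minimal sketch of an Interactions schema in the Avro-style JSON that Personalize expects. The helper `build_schema` is a hypothetical convenience, and the field types shown (strings for the IDs, `long` for the timestamp) are an assumption about this dataset; the actual `interactions.json` used in this notebook may contain more fields.

```python
import json


def build_schema(name, fields):
    """Build a Personalize-style schema dict from (field_name, field_type) pairs."""
    return {
        "type": "record",
        "name": name,
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [{"name": field_name, "type": field_type}
                   for field_name, field_type in fields],
        "version": "1.0",
    }


# Only the three required Interactions fields are listed here.
interactions_schema = build_schema(
    "Interactions",
    [("USER_ID", "string"), ("ITEM_ID", "string"), ("TIMESTAMP", "long")],
)
interactions_json = json.dumps(interactions_schema, indent=2)
print(interactions_json)
```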

### Install boto3 using pip




In [0]:
!pip install boto3



### Setting up Boto to access AWS



*   Boto3 provides many functions for each service; they are described in detail in the documentation
*   In general it works like an API: you pass a few parameters in the request and read the requested data from the response



Creating an access key for an IAM user

[Making the credentials file](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html)

[Boto Configuration](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/quickstart.html)



In [0]:
!mkdir -p ~/.aws && cp credentials ~/.aws/credentials
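
For reference, the credentials file is a small INI file with one section per profile. The sketch below writes one with the standard library; `write_credentials` is a hypothetical helper, the key values are placeholders, and the demo writes into a temporary directory rather than the real `~/.aws` folder.

```python
import configparser
import os
import tempfile


def write_credentials(path, access_key_id, secret_access_key, profile="default"):
    """Write an AWS-style credentials file containing a single profile."""
    config = configparser.ConfigParser()
    config[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    directory = os.path.dirname(path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(path, "w") as handle:
        config.write(handle)
    os.chmod(path, 0o600)  # keep the secret readable by the owner only


# Demo with placeholder values in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, ".aws", "credentials")
    write_credentials(demo_path, "YOUR_ACCESS_KEY_ID", "YOUR_SECRET_ACCESS_KEY")
    with open(demo_path) as handle:
        demo_contents = handle.read()

print(demo_contents)
```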

### Using Boto3 to Upload data to S3 bucket

In [0]:
import boto3

client_s3 = boto3.client('s3')  # create a client for any service by passing its name
s3_name = 'suryastorage17'

Creating a bucket in S3

In [0]:
# Keep the bucket private; Personalize gets read access through a bucket policy
response = client_s3.create_bucket(
    Bucket=s3_name
)

In [0]:
!ls -l

total 116788
-rw-r--r-- 1 root root 73443458 Feb 28 07:12 books.csv
-rw-r--r-- 1 root root      778 Feb 28 07:11 books.json
-rw-r--r-- 1 root root      115 Feb 28 07:11 credentials
-rw-r--r-- 1 root root 35281363 Feb 28 07:12 interactions.csv
-rw-r--r-- 1 root root      461 Feb 28 07:11 interactions.json
drwxr-xr-x 1 root root     4096 Feb  5 18:37 sample_data
-rw-r--r-- 1 root root 10831168 Feb 28 07:11 users.csv
-rw-r--r-- 1 root root      505 Feb 28 07:11 users.json


In [0]:
import boto3
import glob

s3 = boto3.resource('s3')
for f in glob.glob("*.csv"):
  s3.meta.client.upload_file(f, s3_name, f)

### Schema Generation

Saving Schema

In [0]:
import boto3
import os
import glob

personalize = boto3.client('personalize', region_name='us-east-1')

schema_arns = {}

for file in glob.glob("*.json"):
    # 'users.json' -> 'userSchema': drop the trailing 's.json' and append 'Schema'
    schema_name = file[:-6] + 'Schema'
    with open(file) as f:
        createSchemaResponse = personalize.create_schema(
            name = schema_name,
            schema = f.read()
        )

    schema_arns[schema_name] = createSchemaResponse['schemaArn']

print('Schema ARNs:', schema_arns)

Schema ARNs: {'bookSchema': 'arn:aws:personalize:us-east-1:522036915387:schema/bookSchema', 'userSchema': 'arn:aws:personalize:us-east-1:522036915387:schema/userSchema', 'interactionSchema': 'arn:aws:personalize:us-east-1:522036915387:schema/interactionSchema'}


In [0]:
personalize = boto3.client('personalize', region_name='us-east-1')
schema_arns = {}

with open('users.json') as f:
  createSchemaResponse = personalize.create_schema(
            name = 'userSchema',
            schema = f.read()
        )

  schema_arns['userSchema'] = (createSchemaResponse['schemaArn'])
print(schema_arns)

{'userSchema': 'arn:aws:personalize:us-east-1:522036915387:schema/userSchema'}


List Schemas in AWS

In [0]:
personalize = boto3.client('personalize', region_name='us-east-1')

response = personalize.list_schemas(
    maxResults=5
)

print(response['schemas'])

[{'name': 'bookSchema', 'schemaArn': 'arn:aws:personalize:us-east-1:522036915387:schema/bookSchema', 'creationDateTime': datetime.datetime(2020, 2, 29, 2, 49, 50, 354000, tzinfo=tzlocal()), 'lastUpdatedDateTime': datetime.datetime(2020, 2, 29, 2, 49, 50, 354000, tzinfo=tzlocal())}, {'name': 'interactionSchema', 'schemaArn': 'arn:aws:personalize:us-east-1:522036915387:schema/interactionSchema', 'creationDateTime': datetime.datetime(2020, 2, 29, 2, 49, 50, 424000, tzinfo=tzlocal()), 'lastUpdatedDateTime': datetime.datetime(2020, 2, 29, 2, 49, 50, 424000, tzinfo=tzlocal())}, {'name': 'userSchema', 'schemaArn': 'arn:aws:personalize:us-east-1:522036915387:schema/userSchema', 'creationDateTime': datetime.datetime(2020, 2, 29, 2, 49, 50, 393000, tzinfo=tzlocal()), 'lastUpdatedDateTime': datetime.datetime(2020, 2, 29, 2, 49, 50, 393000, tzinfo=tzlocal())}]
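
With `maxResults=5` the call returns at most five schemas per request. When an account holds more, the response carries a `nextToken` that can be passed to the next call. The helper below is a minimal sketch of that loop; `list_all_schemas` is a hypothetical name, and it takes the client as an argument so it makes no AWS calls until you pass in a real one.

```python
def list_all_schemas(personalize, page_size=5):
    """Collect every schema summary by following nextToken across pages."""
    schemas = []
    token = None
    while True:
        kwargs = {"maxResults": page_size}
        if token:
            kwargs["nextToken"] = token
        page = personalize.list_schemas(**kwargs)
        schemas.extend(page.get("schemas", []))
        token = page.get("nextToken")
        if not token:
            return schemas


# Usage with the client created above:
#   for summary in list_all_schemas(personalize):
#       print(summary['name'], summary['schemaArn'])
```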


Delete Schema in AWS

Note: to change a schema while keeping the same name, we need to delete it and create a new one.

In [0]:
personalize = boto3.client('personalize', region_name='us-east-1')

response = personalize.delete_schema(
    schemaArn='arn:aws:personalize:us-east-1:522036915387:schema/userSchema'
)
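
The delete-then-create step can be wrapped in one helper. This is a sketch under assumptions: `recreate_schema` is a hypothetical name, it looks the ARN up by name through `list_schemas` (first page only, up to 100 entries), and the delete is expected to fail if a dataset still uses the schema.

```python
def recreate_schema(personalize, name, schema_json):
    """Delete the schema called `name` if it exists, then create it again."""
    for summary in personalize.list_schemas(maxResults=100)["schemas"]:
        if summary["name"] == name:
            personalize.delete_schema(schemaArn=summary["schemaArn"])
            break
    response = personalize.create_schema(name=name, schema=schema_json)
    return response["schemaArn"]


# Usage with the client created above:
#   with open('users.json') as f:
#       user_schema_arn = recreate_schema(personalize, 'userSchema', f.read())
```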

### Create and Import DatasetGroup in Personalize

Steps:


*   Create a DatasetGroup to hold all datasets
*   Create Datasets(Users, Items, Interactions)
*   Import Datasets from S3

Note: Personalize needs a policy on the bucket before it can access S3:

https://docs.aws.amazon.com/personalize/latest/dg/data-prep-upload-s3.html
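
As a rough sketch of what that policy looks like when applied from boto3: the statement below grants the Personalize service principal read and list access to the bucket. The helper names are hypothetical and the exact statement should be checked against the linked guide before use.

```python
import json


def personalize_bucket_policy(bucket_name):
    """Build a bucket policy that lets Amazon Personalize read the bucket."""
    return {
        "Version": "2012-10-17",
        "Id": "PersonalizeS3BucketAccessPolicy",
        "Statement": [
            {
                "Sid": "PersonalizeS3BucketAccessPolicy",
                "Effect": "Allow",
                "Principal": {"Service": "personalize.amazonaws.com"},
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",
                    f"arn:aws:s3:::{bucket_name}/*",
                ],
            }
        ],
    }


def attach_personalize_policy(s3_client, bucket_name):
    """Apply the policy to the bucket using an S3 client."""
    s3_client.put_bucket_policy(
        Bucket=bucket_name,
        Policy=json.dumps(personalize_bucket_policy(bucket_name)),
    )


# Usage with the S3 client and bucket name defined earlier:
#   attach_personalize_policy(client_s3, s3_name)
```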




In [0]:
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

response = personalize.create_dataset_group(name = 'BooksDatasetGroup')
dsg_arn = response['datasetGroupArn']

description = personalize.describe_dataset_group(datasetGroupArn = dsg_arn)['datasetGroup']

print('Name: ' + description['name'])
print('ARN: ' + description['datasetGroupArn'])
print('Status: ' + description['status'])

Name: BooksDatasetGroup
ARN: arn:aws:personalize:us-east-1:522036915387:dataset-group/BooksDatasetGroup
Status: CREATE PENDING


Wait until the DatasetGroup status is ACTIVE
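
Instead of re-running the describe cell by hand, the wait can be automated with a small polling loop. This is a sketch, not part of the original workflow: `wait_for_status` is a hypothetical helper, the 30-second interval and one-hour timeout are arbitrary, and `CREATE FAILED` is assumed to be the failure status.

```python
import time


def wait_for_status(describe_status, target="ACTIVE", failed="CREATE FAILED",
                    interval=30, timeout=3600, sleep=time.sleep):
    """Call describe_status() until it returns `target`; raise on failure or timeout."""
    waited = 0
    while True:
        status = describe_status()
        if status == target:
            return status
        if status == failed:
            raise RuntimeError("Resource creation failed with status: " + status)
        if waited >= timeout:
            raise TimeoutError("Still " + status + " after " + str(waited) + " seconds")
        sleep(interval)
        waited += interval


# Usage with the client and ARN from the cell above:
#   wait_for_status(lambda: personalize.describe_dataset_group(
#       datasetGroupArn = dsg_arn)['datasetGroup']['status'])
```

The same helper works for import jobs, solution versions and campaigns by swapping the describe call inside the lambda.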

In [0]:
description = personalize.describe_dataset_group(datasetGroupArn = dsg_arn)['datasetGroup']
print('Name: ' + description['name'])
print('ARN: ' + description['datasetGroupArn'])
print('Status: ' + description['status'])

Name: BooksDatasetGroup
ARN: arn:aws:personalize:us-east-1:522036915387:dataset-group/BooksDatasetGroup
Status: ACTIVE


Create a role in AWS IAM that lets Personalize access S3, and copy its ARN, which looks like this: **arn:aws:iam::522036915387:role/PersonalizeRole**

[Set up role in IAM for Personalize](https://docs.aws.amazon.com/personalize/latest/dg/aws-personalize-set-up-permissions.html#set-up-create-role-with-permissions)
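
The role can also be created from boto3 rather than the console. The sketch below covers only the trust relationship that lets Personalize assume the role; `create_personalize_role` is a hypothetical helper, and the permissions policies described in the linked guide still have to be attached separately.

```python
import json

# Trust policy: allow the Personalize service to assume the role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "personalize.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}


def create_personalize_role(iam_client, role_name="PersonalizeRole"):
    """Create the role and return its ARN. Permissions are not attached here."""
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(TRUST_POLICY),
    )
    return response["Role"]["Arn"]


# Usage (requires IAM permissions on the calling user):
#   import boto3
#   role_arn = create_personalize_role(boto3.client('iam'))
```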

We create and import three datasets:

* Users
* Books(Items)
* Interactions(Users-Books)
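
All three datasets follow the same two calls: `create_dataset`, then `create_dataset_import_job`. As an optional shortcut, they could be wrapped in one helper like the sketch below. `create_and_import` is a hypothetical name, and waiting for the dataset to become ACTIVE before starting the import is a precaution assumed here, not something the original cells do explicitly.

```python
import time


def create_and_import(personalize, *, name, dataset_type, schema_arn,
                      dataset_group_arn, s3_uri, role_arn, job_name,
                      poll_seconds=10, max_polls=60, sleep=time.sleep):
    """Create a dataset, wait until it is ACTIVE, then start its import job."""
    dataset_arn = personalize.create_dataset(
        name=name,
        schemaArn=schema_arn,
        datasetGroupArn=dataset_group_arn,
        datasetType=dataset_type,
    )["datasetArn"]

    for _ in range(max_polls):
        status = personalize.describe_dataset(datasetArn=dataset_arn)["dataset"]["status"]
        if status == "ACTIVE":
            break
        if status == "CREATE FAILED":
            raise RuntimeError("Dataset creation failed: " + dataset_arn)
        sleep(poll_seconds)
    else:
        raise TimeoutError("Dataset did not become ACTIVE: " + dataset_arn)

    job_arn = personalize.create_dataset_import_job(
        jobName=job_name,
        datasetArn=dataset_arn,
        dataSource={"dataLocation": s3_uri},
        roleArn=role_arn,
    )["datasetImportJobArn"]
    return dataset_arn, job_arn


# Usage, e.g. for the Users dataset (ARNs as shown elsewhere in this notebook):
#   create_and_import(personalize, name='Users', dataset_type='Users',
#                     schema_arn=schema_arns['userSchema'], dataset_group_arn=dsg_arn,
#                     s3_uri='s3://suryastorage17/users.csv',
#                     role_arn='arn:aws:iam::522036915387:role/PersonalizeRole',
#                     job_name='ImportUsers')
```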

Import Interactions Dataset

In [0]:
# Create Dataset Interactions
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

response = personalize.create_dataset(
    name = 'Interactions',
    schemaArn = 'arn:aws:personalize:us-east-1:522036915387:schema/interactionSchema',
    datasetGroupArn = 'arn:aws:personalize:us-east-1:522036915387:dataset-group/BooksDatasetGroup',
    datasetType = 'Interactions')

print ('Dataset Arn: ' + response['datasetArn'])

Dataset Arn: arn:aws:personalize:us-east-1:522036915387:dataset/BooksDatasetGroup/INTERACTIONS


In [0]:
# Import Dataset Interactions
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

response = personalize.create_dataset_import_job(
    jobName = 'ImportInteractions1',
    datasetArn = 'arn:aws:personalize:us-east-1:522036915387:dataset/BooksDatasetGroup/INTERACTIONS',
    dataSource = {'dataLocation':'s3://suryastorage17/interactions.csv'},
    roleArn = 'arn:aws:iam::522036915387:role/PersonalizeRole')

dsij_arn = response['datasetImportJobArn']

print ('Dataset Import Job arn: ' + dsij_arn)

description = personalize.describe_dataset_import_job(
    datasetImportJobArn = dsij_arn)['datasetImportJob']

print('Name: ' + description['jobName'])
print('ARN: ' + description['datasetImportJobArn'])
print('Status: ' + description['status'])

Dataset Import Job arn: arn:aws:personalize:us-east-1:522036915387:dataset-import-job/ImportInteractions1
Name: ImportInteractions1
ARN: arn:aws:personalize:us-east-1:522036915387:dataset-import-job/ImportInteractions1
Status: CREATE PENDING


In [0]:
description = personalize.describe_dataset_import_job(
    datasetImportJobArn = dsij_arn)['datasetImportJob']

print('Name: ' + description['jobName'])
print('ARN: ' + description['datasetImportJobArn'])
print('Status: ' + description['status'])

Name: ImportInteractions1
ARN: arn:aws:personalize:us-east-1:522036915387:dataset-import-job/ImportInteractions1
Status: ACTIVE


Import Users Dataset

In [0]:
# Create Dataset Users
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

response = personalize.create_dataset(
    name = 'Users',
    schemaArn = 'arn:aws:personalize:us-east-1:522036915387:schema/userSchema',
    datasetGroupArn = 'arn:aws:personalize:us-east-1:522036915387:dataset-group/BooksDatasetGroup',
    datasetType = 'Users')

print ('Dataset Arn: ' + response['datasetArn'])

Dataset Arn: arn:aws:personalize:us-east-1:522036915387:dataset/BooksDatasetGroup/USERS


In [0]:
# Import Dataset Users
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

response = personalize.create_dataset_import_job(
    jobName = 'ImportUsers',
    datasetArn = 'arn:aws:personalize:us-east-1:522036915387:dataset/BooksDatasetGroup/USERS',
    dataSource = {'dataLocation':'s3://suryastorage17/users.csv'},
    roleArn = 'arn:aws:iam::522036915387:role/PersonalizeRole')

dsij_arn = response['datasetImportJobArn']

print ('Dataset Import Job arn: ' + dsij_arn)

description = personalize.describe_dataset_import_job(
    datasetImportJobArn = dsij_arn)['datasetImportJob']

print('Name: ' + description['jobName'])
print('ARN: ' + description['datasetImportJobArn'])
print('Status: ' + description['status'])

Dataset Import Job arn: arn:aws:personalize:us-east-1:522036915387:dataset-import-job/ImportUsers
Name: ImportUsers
ARN: arn:aws:personalize:us-east-1:522036915387:dataset-import-job/ImportUsers
Status: CREATE PENDING


In [0]:
description = personalize.describe_dataset_import_job(
    datasetImportJobArn = dsij_arn)['datasetImportJob']

print('Name: ' + description['jobName'])
print('ARN: ' + description['datasetImportJobArn'])
print('Status: ' + description['status'])

Name: ImportUsers
ARN: arn:aws:personalize:us-east-1:522036915387:dataset-import-job/ImportUsers
Status: ACTIVE


Import Books(Items) Dataset

In [0]:
# Create Dataset Books(Items)
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

response = personalize.create_dataset(
    name = 'Items',
    schemaArn = 'arn:aws:personalize:us-east-1:522036915387:schema/bookSchema',
    datasetGroupArn = 'arn:aws:personalize:us-east-1:522036915387:dataset-group/BooksDatasetGroup',
    datasetType = 'Items')

print ('Dataset Arn: ' + response['datasetArn'])

Dataset Arn: arn:aws:personalize:us-east-1:522036915387:dataset/BooksDatasetGroup/ITEMS


In [0]:
# Import Dataset Books(Items)
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

response = personalize.create_dataset_import_job(
    jobName = 'ImportItems',
    datasetArn = 'arn:aws:personalize:us-east-1:522036915387:dataset/BooksDatasetGroup/ITEMS',
    dataSource = {'dataLocation':'s3://suryastorage17/books.csv'},
    roleArn = 'arn:aws:iam::522036915387:role/PersonalizeRole')

dsij_arn = response['datasetImportJobArn']

print ('Dataset Import Job arn: ' + dsij_arn)

description = personalize.describe_dataset_import_job(
    datasetImportJobArn = dsij_arn)['datasetImportJob']

print('Name: ' + description['jobName'])
print('ARN: ' + description['datasetImportJobArn'])
print('Status: ' + description['status'])

Dataset Import Job arn: arn:aws:personalize:us-east-1:522036915387:dataset-import-job/ImportItems
Name: ImportItems
ARN: arn:aws:personalize:us-east-1:522036915387:dataset-import-job/ImportItems
Status: CREATE PENDING


In [0]:
description = personalize.describe_dataset_import_job(
    datasetImportJobArn = dsij_arn)['datasetImportJob']

print('Name: ' + description['jobName'])
print('ARN: ' + description['datasetImportJobArn'])
print('Status: ' + description['status'])

Name: ImportItems
ARN: arn:aws:personalize:us-east-1:522036915387:dataset-import-job/ImportItems
Status: CREATE IN_PROGRESS


### Creating a Solution in Personalize using AutoML

Steps to create Solution

*   Create a solution to describe the type of ML model to use (AutoML, manual, recipes, etc.)
*   Create a solution version, which starts the training process on the dataset group (it takes a few hours, depending on the size of the data provided)



In [0]:
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

print ('Creating solution')
response = personalize.create_solution(
    name = "BookSolution",
    datasetGroupArn = "arn:aws:personalize:us-east-1:522036915387:dataset-group/BooksDatasetGroup",
    performAutoML = True)

# Get the solution ARN.
solution_arn = response['solutionArn']
print('Solution ARN: ' + solution_arn)

Creating solution
Solution ARN: arn:aws:personalize:us-east-1:522036915387:solution/BookSolution


In [0]:
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

# Use the solution ARN to get the solution status.
solution_description = personalize.describe_solution(solutionArn = solution_arn)['solution']
print('Solution status: ' + solution_description['status'])

Solution status: ACTIVE


In [0]:
# Use the solution ARN to create a solution version.
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

print ('Creating solution version')
response = personalize.create_solution_version(solutionArn = solution_arn)
solution_version_arn = response['solutionVersionArn']
print('Solution version ARN: ' + solution_version_arn)

Creating solution version
Solution version ARN: arn:aws:personalize:us-east-1:522036915387:solution/BookSolution/3562fcfe


In [0]:
solution_version_arn = 'arn:aws:personalize:us-east-1:522036915387:solution/BookSolution/3562fcfe'

In [0]:
# Use the solution version ARN to get the solution version status.
# It takes a few hours to train the model
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

solution_version_description = personalize.describe_solution_version(
    solutionVersionArn = solution_version_arn)['solutionVersion']
print('Solution version status: ' + solution_version_description['status'])

Solution version status: ACTIVE


Metrics of Model

In [0]:
import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

response = personalize.get_solution_metrics(
    solutionVersionArn = solution_version_arn)

print(response['metrics'])

{'coverage': 0.0105, 'mean_reciprocal_rank_at_25': 0.0169, 'normalized_discounted_cumulative_gain_at_10': 0.0213, 'normalized_discounted_cumulative_gain_at_25': 0.0256, 'normalized_discounted_cumulative_gain_at_5': 0.0183, 'precision_at_10': 0.0034, 'precision_at_25': 0.0021, 'precision_at_5': 0.0048}
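
To help read these numbers, here is a toy illustration of two of the metric families. It assumes the usual definitions: precision at K is the share of the top K recommendations that are relevant, and reciprocal rank is 1 divided by the position of the first relevant recommendation (Personalize reports the mean over users). The item IDs are made up, and this is not how Personalize computes its metrics internally.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def reciprocal_rank_at_k(recommended, relevant, k):
    """1 / rank of the first relevant item in the top k, or 0.0 if none is relevant."""
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            return 1 / rank
    return 0.0


# Made-up example: five recommendations, two of which the user actually interacted with.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"c", "e"}

toy_precision = precision_at_k(recommended, relevant, 5)
toy_reciprocal_rank = reciprocal_rank_at_k(recommended, relevant, 5)
print(toy_precision, toy_reciprocal_rank)
```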


### Create a Campaign (Deploying a Solution Version)

In [0]:
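import boto3

personalize = boto3.client('personalize', region_name='us-east-1')

response = personalize.create_campaign(
    name = 'BookCampaign',
    solutionVersionArn = solution_version_arn,
    minProvisionedTPS = 10)

# Describe the new campaign to show its initial status
description = personalize.describe_campaign(campaignArn = response['campaignArn'])['campaign']
print('Name: ' + description['name'])
print('ARN: ' + description['campaignArn'])
print('Status: ' + description['status'])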

Name: BookCampaign
ARN: arn:aws:personalize:us-east-1:522036915387:campaign/BookCampaign
Status: CREATE PENDING


In [0]:
arn = response['campaignArn']

description = personalize.describe_campaign(campaignArn = arn)['campaign']
print('Name: ' + description['name'])
print('ARN: ' + description['campaignArn'])
print('Status: ' + description['status'])

Name: BookCampaign
ARN: arn:aws:personalize:us-east-1:522036915387:campaign/BookCampaign
Status: ACTIVE


### Get book recommendations for a user

In [0]:
import boto3

personalizeRt = boto3.client('personalize-runtime', region_name='us-east-1')

response = personalizeRt.get_recommendations(
    campaignArn = 'arn:aws:personalize:us-east-1:522036915387:campaign/BookCampaign',
    userId = '5')

print("Recommended Books")
print("ISBN")
for book in response['itemList']:
    print (book['itemId'])

Recommended Books
ISBN
0971880107
0312195516
0060928336
0060930535
067976402X
0060502258
0060959037
0804106304
0684872153
0316666343
0679781587
0452282152
0060011912
006101351X
0786868716
0743237188
0805063897
0671510053
0060934417
0316601950
0804114986
1551660717
0060392452
0312278586
0062502182
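
The call above returned 25 items. To request a different number, `get_recommendations` accepts a `numResults` parameter. The wrapper below is a small sketch of that; `recommend_item_ids` is a hypothetical helper that takes the runtime client as an argument.

```python
def recommend_item_ids(runtime_client, campaign_arn, user_id, num_results=10):
    """Return the recommended item IDs (ISBNs in this dataset) for one user."""
    response = runtime_client.get_recommendations(
        campaignArn=campaign_arn,
        userId=str(user_id),
        numResults=num_results,
    )
    return [item["itemId"] for item in response["itemList"]]


# Usage with the runtime client and campaign from the cell above:
#   top_five = recommend_item_ids(
#       personalizeRt,
#       'arn:aws:personalize:us-east-1:522036915387:campaign/BookCampaign',
#       user_id=5, num_results=5)
```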
