# Putting Files in the Cloud

Typical projects will need the following to be successful:
- storage
- computer services
- notifications

## Boto3

`boto3` is used to intereact with AWS in python.

In [36]:
# !pip install boto3
# http://2017.compciv.org/guide/topics/aws/intro-to-aws-boto3.html#id4 - worth checking

In [2]:
import boto3

s3 = boto3.client(
    's3',
#     region_name="us-east-1",
#     aws_access_key_id=os.environ.get(AWS_KEY_ID),
#     aws_secret_access_key=os.environ.get(AWS_SECRET),
)

response = s3.list_buckets()

## Initial set-up

1. Create a key and a secret for `boto3` through IAM services
2. `pip install awscli`
    - http://2017.compciv.org/guide/topics/aws/intro-to-aws-boto3.html#id4

## Some AWS services

- IAM
- S3
- SNS
- Comprehend
- Rekognition

## Creating multiple clients

We can create multiple `boto3` clients from the script

In [3]:
# Generate the boto3 client for interacting with S3 and SNS
s3 = boto3.client('s3', region_name="us-east-1")

sns = boto3.client('sns', region_name="us-east-1")

# List S3 buckets and SNS topics
buckets = s3.list_buckets()
topics = sns.list_topics()

# Print out the list of SNS topics
print(topics)

{'Topics': [], 'ResponseMetadata': {'RequestId': '505dbd5a-6885-556b-bcc4-ebe5e0c873a1', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '505dbd5a-6885-556b-bcc4-ebe5e0c873a1', 'content-type': 'text/xml', 'content-length': '256', 'date': 'Thu, 13 Feb 2020 17:23:25 GMT'}, 'RetryAttempts': 0}}


## Diving into buckets

With S3 we can put any file in the cloud and make accessible anywhere via a URL.

S3 consists of:
- buckets
    - similar to folders in a desktop
    - own permission policies
    - can be configured to act as folders to a website (website storage)
    - generate logs
    - most importantly: buckets contain objects (objects can be anything!)
- objects
    - similar to files in folders\
    
### Using buckets in `boto3`

We can:
- create a bucket
- list buckets
- delete a bucket

#### Creating a bucket

In [39]:
import boto3

s3 = boto3.client("s3") # rest is configured in awscli

In [41]:
bucket = s3.create_bucket(Bucket="miguel-gid-requests")

#### Creating a bucket

In [45]:
s3.list_buckets()["Buckets"]

[{'Name': 'dned-miguelccarvalho-bucket-demo', 'CreationDate': datetime.datetime(2019, 8, 27, 8, 54, 35, tzinfo=tzutc())}, {'Name': 'miguel-gid-requests', 'CreationDate': datetime.datetime(2020, 2, 13, 13, 56, 26, tzinfo=tzutc())}]

#### Deleting a bucket

In [48]:
s3.delete_bucket(Bucket="miguel-gid-requests")

{'ResponseMetadata': {'RequestId': '9DE8933C0C7AD6A0', 'HostId': 'l8cHKfOd7CJnzyijh8cKOYwU+mQwCLUWukOft4vp8udbfhjbuGlW4mS0erH9/7Ye6aonCCSWmPI=', 'HTTPStatusCode': 204, 'HTTPHeaders': {'x-amz-id-2': 'l8cHKfOd7CJnzyijh8cKOYwU+mQwCLUWukOft4vp8udbfhjbuGlW4mS0erH9/7Ye6aonCCSWmPI=', 'x-amz-request-id': '9DE8933C0C7AD6A0', 'date': 'Thu, 13 Feb 2020 13:59:35 GMT', 'server': 'AmazonS3'}, 'RetryAttempts': 0}}

In [49]:
s3.list_buckets()["Buckets"]

[{'Name': 'dned-miguelccarvalho-bucket-demo', 'CreationDate': datetime.datetime(2019, 8, 27, 8, 54, 35, tzinfo=tzutc())}]

## Uploading and retrieving files

Files in S3 buckets are called *objects*. These objects can be anything: `.csv`, `.pdf`, `.mp4`, etc.

### Bucket vs Objects

Bucket:
- has a name
- name is a string
- unique name in all of S3
- contains many objects

Object:
- object has a key
- name in full path from bucket root
- unique key in the bucket
- can be only in one parent bucket

In [51]:
# check available buckets before uploading
s3.list_buckets()["Buckets"]

[{'Name': 'dned-miguelccarvalho-bucket-demo', 'CreationDate': datetime.datetime(2019, 8, 27, 8, 54, 35, tzinfo=tzutc())}]

In [52]:
# notice it does not return anything
# in case of an error, an exception is thrown
s3.upload_file(
    Filename="../../5_introduction_to_shell/datasets/cities.csv", # local filename
    Bucket="dned-miguelccarvalho-bucket-demo", # bucket to be parent of file
    Key="cities.csv" # s3 name for thee file
)

In [53]:
# listing objects
s3.list_objects(
    Bucket="dned-miguelccarvalho-bucket-demo",
    MaxKeys=2, # limiting max results to 2, if ommitted, default is 1000
    Prefix="cit" # only objects which prefix match this string
)

{'ResponseMetadata': {'RequestId': 'BB7B701CE90A552A', 'HostId': 'tDXhv5NtBXwUmI/erMV/eclAWF8bq5UTzLJL+AF3xOeNyELNc6O1JgzRQEVI6GddnckrpOQh8mM=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': 'tDXhv5NtBXwUmI/erMV/eclAWF8bq5UTzLJL+AF3xOeNyELNc6O1JgzRQEVI6GddnckrpOQh8mM=', 'x-amz-request-id': 'BB7B701CE90A552A', 'date': 'Thu, 13 Feb 2020 14:11:45 GMT', 'x-amz-bucket-region': 'eu-west-2', 'content-type': 'application/xml', 'transfer-encoding': 'chunked', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'IsTruncated': False, 'Marker': '', 'Contents': [{'Key': 'cities.csv', 'LastModified': datetime.datetime(2020, 2, 13, 14, 10, 1, tzinfo=tzutc()), 'ETag': '"2dd39cb6de5ecc471fa37f1f2aac759f"', 'Size': 8402, 'StorageClass': 'STANDARD', 'Owner': {'ID': 'e6119f625207062af5bb798a6418c2872efd1a107133fdb9f3c3580b850f313b'}}], 'Name': 'dned-miguelccarvalho-bucket-demo', 'Prefix': 'cit', 'MaxKeys': 2, 'EncodingType': 'url'}

In [54]:
# listing objects
s3.list_objects(
    Bucket="dned-miguelccarvalho-bucket-demo",
    MaxKeys=2, # limiting max results to 2, if ommitted, default is 1000
    Prefix="cit" # only objects which prefix match this string
)["Contents"]

[{'Key': 'cities.csv', 'LastModified': datetime.datetime(2020, 2, 13, 14, 10, 1, tzinfo=tzutc()), 'ETag': '"2dd39cb6de5ecc471fa37f1f2aac759f"', 'Size': 8402, 'StorageClass': 'STANDARD', 'Owner': {'ID': 'e6119f625207062af5bb798a6418c2872efd1a107133fdb9f3c3580b850f313b'}}]

In [55]:
# downloading a file
s3.download_file(
    Filename="cities.csv", # filename and local path to be downloaded to
    Bucket="dned-miguelccarvalho-bucket-demo",
    Key="cities.csv"
)

In [56]:
!ls

8_introduction_to_aws_boto_in_python.ipynb
cities.csv


In [57]:
# deleting a file
s3.delete_object(
    Bucket="dned-miguelccarvalho-bucket-demo",
    Key="cities.csv"
)

{'ResponseMetadata': {'RequestId': '2160F59BC8457FF9', 'HostId': 'IVIgOeJ27RxdxbGkG3dWM/Xq81Cv0jbqIT6xlCU4LlUwFNJ42ImpYb8v3U6XhTzpsn03xZEgU7E=', 'HTTPStatusCode': 204, 'HTTPHeaders': {'x-amz-id-2': 'IVIgOeJ27RxdxbGkG3dWM/Xq81Cv0jbqIT6xlCU4LlUwFNJ42ImpYb8v3U6XhTzpsn03xZEgU7E=', 'x-amz-request-id': '2160F59BC8457FF9', 'date': 'Thu, 13 Feb 2020 14:15:06 GMT', 'server': 'AmazonS3'}, 'RetryAttempts': 0}}

In [60]:
# listing objects
# see there is no Contents key
response = s3.list_objects(
    Bucket="dned-miguelccarvalho-bucket-demo",
    MaxKeys=2, # limiting max results to 2, if ommitted, default is 1000
    Prefix="cit" # only objects which prefix match this string
)

In [61]:
if 'Contents' in response:
    for obj in response["Contents"]:
        print(obj["Name"])

> Add `if 'Contents' in response:` in case no responses are returned otherwise we'll get a `KeyError`

# Sharing Files Securely

The default for files is to deny permission - files are only accessible with our key.


## AWS Permissions Systems

There are 4 ways to control permission is S3:

1. IAM (what can the user do in AWS?)
    - Use this to control users' access to services, buckets, objects
    - We attach IAM policies to a user
    - Applies across all AWS Services
2. Bucket Policy (who can access this S3 bucket?)
    - Gives us control of the bucket and the objects within it
3. ACL (who can access object?)
    - Let us set permissions on specific objects within a bucket
4. Presigned URL
    - Let us provide temporary access to an object
    
IAM and Bucket Policies are great in multi-user environments.    

## ACL

ACLs are entities attached  to objects in S3. We will focus on two types of ACL:
1. Private
2. Public read

By default ACL is private. We can change it to public.

In [63]:
# let's upload this file again
s3.upload_file(
    Filename="../../5_introduction_to_shell/datasets/cities.csv", # local filename
    Bucket="dned-miguelccarvalho-bucket-demo", # bucket to be parent of file
    Key="cities.csv" # s3 name for thee file
)

In [64]:
s3.put_object_acl(
    Bucket="dned-miguelccarvalho-bucket-demo",
    Key="cities.csv",
    ACL="public-read"
)

{'ResponseMetadata': {'RequestId': '1F880108F0754C75', 'HostId': '7iJ1hHWcnN2zs8bPfTEIbbl1SYBfh3rOMFYcihkHO/eHkDfkydz82wcpCtRalcS3kIgtwwHiDIY=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': '7iJ1hHWcnN2zs8bPfTEIbbl1SYBfh3rOMFYcihkHO/eHkDfkydz82wcpCtRalcS3kIgtwwHiDIY=', 'x-amz-request-id': '1F880108F0754C75', 'date': 'Thu, 13 Feb 2020 14:28:40 GMT', 'content-length': '0', 'server': 'AmazonS3'}, 'RetryAttempts': 0}}

In [65]:
# we can also do this on upload
# let's upload this file again
s3.upload_file(
    Filename="../../5_introduction_to_shell/datasets/cities.csv", # local filename
    Bucket="dned-miguelccarvalho-bucket-demo", # bucket to be parent of file
    Key="cities_2.csv", # s3 name for thee file
    ExtraArgs={"ACL":"public-read"} 
)

### Accessing public objects

If an object is public, anyone can access it with:

https://{bucket}.s3.{region}.amazonaws.com/{key}

https://dned-miguelccarvalho-bucket-demo.s3.eu-west-2.amazonaws.com/cities_2.csv

In [67]:
url = "https://dned-miguelccarvalho-bucket-demo.s3.eu-west-2.amazonaws.com/cities_2.csv"
pd.read_csv(url)

Unnamed: 0,LatD,"""LatM""","""LatS""","""NS""","""LonD""","""LonM""","""LonS""","""EW""","""City""","""State"""
0,41,5,59,"""N""",80,39,0,"""W""","""Youngstown""",OH
1,42,52,48,"""N""",97,23,23,"""W""","""Yankton""",SD
2,46,35,59,"""N""",120,30,36,"""W""","""Yakima""",WA
3,42,16,12,"""N""",71,48,0,"""W""","""Worcester""",MA
4,43,37,48,"""N""",89,46,11,"""W""","""Wisconsin Dells""",WI
5,36,5,59,"""N""",80,15,0,"""W""","""Winston-Salem""",NC
6,49,52,48,"""N""",97,9,0,"""W""","""Winnipeg""",MB
7,39,11,23,"""N""",78,9,36,"""W""","""Winchester""",VA
8,34,14,24,"""N""",77,55,11,"""W""","""Wilmington""",NC
9,39,45,0,"""N""",75,33,0,"""W""","""Wilmington""",DE


### How access is decided

1. Someone requests a download
2. Check for pre-signed URL: if ok, allow; if not, check policies
3. Check IAM, Bucket Policy, and ACL: if ok, allow; if not, deny

## Accessing private objects in S3

Since by default access is forbidden, we won't be able to read files with pandas.

In [70]:
private_url = "https://dned-miguelccarvalho-bucket-demo.s3.eu-west-2.amazonaws.com/2018-11-28-events.json"

# check how we get a 403 error
pd.read_csv(private_url)

HTTPError: HTTP Error 403: Forbidden

### Solutions

If the file is not expected to change much: 

1. Download the file with `s3.download_file`
2. Use `pd.read_csv` with file locally saved

Better option:

In [71]:
obj = s3.get_object(
    Bucket="dned-miguelccarvalho-bucket-demo",
    Key="2018-11-28-events.json"
)

print(obj)

{'ResponseMetadata': {'RequestId': '219D6B315D2DD587', 'HostId': '2PEZjCO/h7myVEXbzn6QSCrYSDJxQJ01lQNagEqoQoKQBGi0C5JixB0RHwMGsrGqZ7MSoviECfE=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': '2PEZjCO/h7myVEXbzn6QSCrYSDJxQJ01lQNagEqoQoKQBGi0C5JixB0RHwMGsrGqZ7MSoviECfE=', 'x-amz-request-id': '219D6B315D2DD587', 'date': 'Thu, 13 Feb 2020 14:57:00 GMT', 'last-modified': 'Tue, 27 Aug 2019 08:57:01 GMT', 'etag': '"958bb973e883807f00a35801b9191c7f"', 'accept-ranges': 'bytes', 'content-type': 'application/json', 'content-length': '202910', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'AcceptRanges': 'bytes', 'LastModified': datetime.datetime(2019, 8, 27, 8, 57, 1, tzinfo=tzutc()), 'ContentLength': 202910, 'ETag': '"958bb973e883807f00a35801b9191c7f"', 'ContentType': 'application/json', 'Metadata': {}, 'Body': <botocore.response.StreamingBody object at 0x11c432090>}


In [72]:
# we get a streaming body as part of the response's body key
# pandas can read this!
pd.read_csv(obj["Body"])

Unnamed: 0,"{""artist"":""Mitch Ryder & The Detroit Wheels""","auth:""Logged In""","firstName:""Tegan""","gender:""F""",itemInSession:65,"lastName:""Levine""",length:205.03465,"level:""paid""","location:""Portland-South Portland","ME""","method:""PUT""","page:""NextSong""",registration:1540794356796.0,sessionId:992,"song:""Jenny Take A Ride (LP Version)""",status:200,ts:1543363215796,"userAgent:""\""Mozilla\/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit\/537.36 (KHTML","like Gecko) Chrome\/36.0.1985.143 Safari\/537.36\""""","userId:""80""}"
0,"{""artist"":""The Spill Canvas""","auth:""Logged In""","firstName:""Tegan""","gender:""F""",itemInSession:66,"lastName:""Levine""",length:358.03383,"level:""paid""","location:""Portland-South Portland","ME""","method:""PUT""","page:""NextSong""",registration:1540794356796.0,sessionId:992,"song:""The TIde (LP Version)""",status:200,ts:1543363420796,"userAgent:""\""Mozilla\/5.0 (Macintosh; Intel Ma...",like Gecko) Chrome\/36.0.1985.143 Safari\/537...,"userId:""80""}"
1,"{""artist"":""Mogwai""","auth:""Logged In""","firstName:""Tegan""","gender:""F""",itemInSession:67,"lastName:""Levine""",length:571.19302,"level:""paid""","location:""Portland-South Portland","ME""","method:""PUT""","page:""NextSong""",registration:1540794356796.0,sessionId:992,"song:""Two Rights Make One Wrong""",status:200,ts:1543363778796,"userAgent:""\""Mozilla\/5.0 (Macintosh; Intel Ma...",like Gecko) Chrome\/36.0.1985.143 Safari\/537...,"userId:""80""}"
2,"{""artist"":""Spor""","auth:""Logged In""","firstName:""Tegan""","gender:""F""",itemInSession:68,"lastName:""Levine""",length:380.3424,"level:""paid""","location:""Portland-South Portland","ME""","method:""PUT""","page:""NextSong""",registration:1540794356796.0,sessionId:992,"song:""Way Of The Samurai""",status:200,ts:1543364349796,"userAgent:""\""Mozilla\/5.0 (Macintosh; Intel Ma...",like Gecko) Chrome\/36.0.1985.143 Safari\/537...,"userId:""80""}"
3,"{""artist"":""DJ Dizzy""","auth:""Logged In""","firstName:""Tegan""","gender:""F""",itemInSession:69,"lastName:""Levine""",length:221.1522,"level:""paid""","location:""Portland-South Portland","ME""","method:""PUT""","page:""NextSong""",registration:1540794356796.0,sessionId:992,"song:""Sexy Bitch""",status:200,ts:1543364729796,"userAgent:""\""Mozilla\/5.0 (Macintosh; Intel Ma...",like Gecko) Chrome\/36.0.1985.143 Safari\/537...,"userId:""80""}"
4,"{""artist"":""Erik Hassle""","auth:""Logged In""","firstName:""Tegan""","gender:""F""",itemInSession:70,"lastName:""Levine""",length:183.43138,"level:""paid""","location:""Portland-South Portland","ME""","method:""PUT""","page:""NextSong""",registration:1540794356796.0,sessionId:992,"song:""Hurtful""",status:200,ts:1543364950796,"userAgent:""\""Mozilla\/5.0 (Macintosh; Intel Ma...",like Gecko) Chrome\/36.0.1985.143 Safari\/537...,"userId:""80""}"
5,"{""artist"":null","auth:""Logged Out""",firstName:null,gender:null,itemInSession:0,lastName:null,length:null,"level:""paid""",location:null,"method:""GET""","page:""Home""",registration:null,sessionId:952,song:null,status:200,ts:1543365211796,userAgent:null,"userId:""""}",,
6,"{""artist"":null","auth:""Logged Out""",firstName:null,gender:null,itemInSession:1,lastName:null,length:null,"level:""paid""",location:null,"method:""PUT""","page:""Login""",registration:null,sessionId:952,song:null,status:307,ts:1543365212796,userAgent:null,"userId:""""}",,
7,"{""artist"":null","auth:""Logged In""","firstName:""Aleena""","gender:""F""",itemInSession:2,"lastName:""Kirby""",length:null,"level:""paid""","location:""Waterloo-Cedar Falls","IA""","method:""GET""","page:""Home""",registration:1541022995796.0,sessionId:952,song:null,status:200,ts:1543365223796,"userAgent:""Mozilla\/5.0 (Macintosh; Intel Mac ...","userId:""44""}",
8,"{""artist"":null","auth:""Logged In""","firstName:""Tegan""","gender:""F""",itemInSession:71,"lastName:""Levine""",length:null,"level:""paid""","location:""Portland-South Portland","ME""","method:""GET""","page:""Home""",registration:1540794356796.0,sessionId:992,song:null,status:200,ts:1543365724796,"userAgent:""\""Mozilla\/5.0 (Macintosh; Intel Ma...",like Gecko) Chrome\/36.0.1985.143 Safari\/537...,"userId:""80""}"
9,"{""artist"":null","auth:""Logged In""","firstName:""Sylvie""","gender:""F""",itemInSession:0,"lastName:""Cruz""",length:null,"level:""free""","location:""Washington-Arlington-Alexandria","DC-VA-MD-WV""","method:""GET""","page:""Home""",registration:1540266185796.0,sessionId:932,song:null,status:200,ts:1543368722796,"userAgent:""\""Mozilla\/5.0 (Macintosh; Intel Ma...","like Gecko) Version\/7.0.5 Safari\/537.77.4\""""","userId:""10""}"


## Pre-signed URLs

This is another alternative to read private files.

- Expire after a certain timeframe
- Great for temporary access

In [83]:
s3.upload_file(
    Filename="cities.csv",
    Key="more_cities.csv",
    Bucket="dned-miguelccarvalho-bucket-demo",
)

In [84]:
share_url = s3.generate_presigned_url(
    ClientMethod="get_object",
    ExpiresIn=3600,
    Params={
        "Bucket": "dned-miguelccarvalho-bucket-demo", 
        "Key": "more_cities.csv",
    }
)

In [85]:
share_url

'https://dned-miguelccarvalho-bucket-demo.s3.amazonaws.com/more_cities.csv?AWSAccessKeyId=AKIARGNCQPRZTWLBQDP3&Signature=sjQTrHm%2FrbhuF3TkHH8jvls1NOg%3D&Expires=1581610349'

In [86]:
# not working...
pd.read_csv(share_url)

HTTPError: HTTP Error 400: Bad Request

### Load multiple files into on DataFrame



In [87]:
# create a list to  hold our dfs
df_list = []

# request the list of csv's from S3 with prefix
response = s3.list_objects(
    Bucket="dned-miguelccarvalho-bucket-demo",
    Prefix="cit",
)

# get response contents
request_files = response["Contents"]

In [98]:
import ipdb
for file in request_files:
#     ipdb.set_trace()
    try:
        obj = s3.get_object(Bucket="gid-requests", Key=file["Key"])
    except s3.exceptions.ClientError:
        continue
    obj_df = pd.read_csv(obj["Body"])
    
    df_list.append(obj_df)

In [101]:
# if we had anything
pd.concat(df_list, ignore_index=True)

ValueError: No objects to concatenate

## Sharing Files through a Website

S3 is able to serve as HTML pages which is useful for sharing the results of analysis with stakeholders.

### `to_html`

pandas has a `to_html` method which we can use to write a pandas dataframe to html file.

Useful optional parameters are:
- `render_links`
    - make URLs clickable
- `columns`
    - only show these cols
- `border`
    - make border thicker of thinner

### Uploading an HTML file 

In [103]:
pd.read_csv("cities.csv").to_html("cities.html")

In [105]:
s3.upload_file(
    Filename="cities.html",
    Bucket="dned-miguelccarvalho-bucket-demo",
    Key="cities.html",
    ExtraArgs={
        "ContentType": "text/html", # this tells browser how to render file
        "ACL": "public-read",
    }
)

https://dned-miguelccarvalho-bucket-demo.s3.eu-west-2.amazonaws.com/cities.html

### Showing an index table for the files we want to share

In [106]:
r = s3.list_objects(
    Bucket="dned-miguelccarvalho-bucket-demo",
    Prefix="cit",
)

In [107]:
objects_df = pd.DataFrame(r["Contents"])

In [108]:
objects_df

Unnamed: 0,ETag,Key,LastModified,Owner,Size,StorageClass
0,"""2dd39cb6de5ecc471fa37f1f2aac759f""",cities.csv,2020-02-13 14:28:38+00:00,{'ID': 'e6119f625207062af5bb798a6418c2872efd1a...,8402,STANDARD
1,"""0b8ccf426c22318acd950ab95b084f73""",cities.html,2020-02-13 15:31:55+00:00,{'ID': 'e6119f625207062af5bb798a6418c2872efd1a...,29530,STANDARD
2,"""2dd39cb6de5ecc471fa37f1f2aac759f""",cities_2.csv,2020-02-13 14:29:26+00:00,{'ID': 'e6119f625207062af5bb798a6418c2872efd1a...,8402,STANDARD


In [109]:
# add a link to each file
base_url = "https://dned-miguelccarvalho-bucket-demo.s3.eu-west-2.amazonaws.com/"
objects_df["Link"] = base_url + objects_df["Key"]

In [110]:
objects_df.to_html(
    "html_listing.html",
    columns=["Link", "LastModified", "Size"],
    render_links=True,
)

In [111]:
s3.upload_file(
    Filename="html_listing.html",
    Bucket="dned-miguelccarvalho-bucket-demo",
    Key="html_listing.html",
    ExtraArgs={
        "ContentType": "text/html", # this tells browser how to render file
        "ACL": "public-read",
    }
)

https://dned-miguelccarvalho-bucket-demo.s3.eu-west-2.amazonaws.com/html_listing.html

# SNS Topics

SNS is used to send emails, notifications, sms to clients. 

The framework is that there are:
- Publishers
    - those sending content
- Subscribers
    - those consuming content
    
Publishers send content to *topics* which subscribers subscribe to.

### About topics

Every topic has an **ARN** (Amazon Resource Name) which is the unique ID to the topic. The subscriptions to the topics have a unique ID as well.

In [3]:
# creeating a topic with boto
sns = boto3.client(
    "sns",
    region_name="eu-west-1"
    # we can omit the other params since awscli did that for us
)

In [4]:
# this operation is idempotent; grab the ARN
response = sns.create_topic(Name="city_alerts")
response["TopicArn"]

# better in a one-liner
response = sns.create_topic(Name="city_alerts")["TopicArn"]

In [5]:
response

'arn:aws:sns:eu-west-1:082482068595:city_alerts'

### List topics

In [117]:
sns.list_topics()

{'Topics': [{'TopicArn': 'arn:aws:sns:eu-west-1:082482068595:city_alerts'}], 'ResponseMetadata': {'RequestId': 'ea692f5e-d167-525d-978f-79e1ab9368c0', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': 'ea692f5e-d167-525d-978f-79e1ab9368c0', 'content-type': 'text/xml', 'content-length': '376', 'date': 'Thu, 13 Feb 2020 16:26:28 GMT'}, 'RetryAttempts': 0}}

In [16]:
topic_arn = 'arn:aws:sns:eu-west-1:082482068595:city_alerts'

### Deleting a topic

In [118]:
sns.delete_topic(TopicArn=response["TopicArn"])

{'ResponseMetadata': {'RequestId': '33584505-4a2d-52bc-b215-13ba3102c8f9', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amzn-requestid': '33584505-4a2d-52bc-b215-13ba3102c8f9', 'content-type': 'text/xml', 'content-length': '201', 'date': 'Thu, 13 Feb 2020 16:27:26 GMT'}, 'RetryAttempts': 0}}

## SNS Subscriptions

Every subscription to a topic has:
- unique ID
- protocol 
    - there are several protocols, we'll only see EMAIL and SMS
- status
    - confirmed or  pending  confirmation; phone numbers are  automatically confirmed but emails  must be confirmed through a confirmation email in the inbox
- endpoint 
    - the specific phone number or email the message should be sent to

### Creating an SMS subscription

In [6]:
sns.subscribe(
    TopicArn=response,
    Protocol="SMS",
    Endpoint="+447492937145"
)

{'SubscriptionArn': 'arn:aws:sns:eu-west-1:082482068595:city_alerts:a8edde6d-8d47-4c41-bb24-56bae6960cd6',
 'ResponseMetadata': {'RequestId': '66a5e87d-01ad-5185-beac-bec27d557e09',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '66a5e87d-01ad-5185-beac-bec27d557e09',
   'content-type': 'text/xml',
   'content-length': '361',
   'date': 'Thu, 13 Feb 2020 16:35:04 GMT'},
  'RetryAttempts': 0}}

### Creating an email subscription

In [8]:
sns.subscribe(
    TopicArn=response,
    Protocol="EMAIL",
    Endpoint="miguelcansadocarvalho@gmail.com"
)

{'SubscriptionArn': 'pending confirmation',
 'ResponseMetadata': {'RequestId': 'd278d728-bc5f-5824-b801-8197c1174daf',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'd278d728-bc5f-5824-b801-8197c1174daf',
   'content-type': 'text/xml',
   'content-length': '298',
   'date': 'Thu, 13 Feb 2020 16:36:24 GMT'},
  'RetryAttempts': 0}}

### Listing subscriptions

In [10]:
sns.list_subscriptions_by_topic(
    TopicArn=response,
)

{'Subscriptions': [{'SubscriptionArn': 'arn:aws:sns:eu-west-1:082482068595:city_alerts:b1b1af98-68b4-4cd1-97f9-9015c41081ad',
   'Owner': '082482068595',
   'Protocol': 'email',
   'Endpoint': 'miguelcansadocarvalho@gmail.com',
   'TopicArn': 'arn:aws:sns:eu-west-1:082482068595:city_alerts'},
  {'SubscriptionArn': 'arn:aws:sns:eu-west-1:082482068595:city_alerts:a8edde6d-8d47-4c41-bb24-56bae6960cd6',
   'Owner': '082482068595',
   'Protocol': 'sms',
   'Endpoint': '+447492937145',
   'TopicArn': 'arn:aws:sns:eu-west-1:082482068595:city_alerts'}],
 'ResponseMetadata': {'RequestId': '1f895936-80e4-5a5a-ae60-0f89fd9ba81d',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '1f895936-80e4-5a5a-ae60-0f89fd9ba81d',
   'content-type': 'text/xml',
   'content-length': '1051',
   'date': 'Thu, 13 Feb 2020 16:37:19 GMT'},
  'RetryAttempts': 0}}

### Listing all subscriptions

In [14]:
sns.list_subscriptions()["Subscriptions"]

[]

### Delete subscription

In [15]:
sns.unsubscribe(SubscriptionArn='arn:aws:sns:eu-west-1:082482068595:city_alerts:b1b1af98-68b4-4cd1-97f9-9015c41081ad')

{'ResponseMetadata': {'RequestId': '3d3c51bf-5888-5da8-9a90-0700268e496f',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '3d3c51bf-5888-5da8-9a90-0700268e496f',
   'content-type': 'text/xml',
   'content-length': '201',
   'date': 'Thu, 13 Feb 2020 16:39:34 GMT'},
  'RetryAttempts': 0}}

## Sending Messages

### Publishing to a Topic

When we publish to a topic, all our subscribers will receive it.

In [17]:
response = sns.publish(
    TopicArn=topic_arn,
    Message="Body of text or email",
    Subject="Subject Line for Email",
)

### Sending a one-off SMS

In [18]:
response = sns.publish(
    PhoneNumber="+447492937145",
    Message="Fran é um purdle",
)

# Pattern Rekognition

This is a computer vision API by AWS which:
- detects objects in images
- extracts text from images

## Learning new AWS services

1. Find them on the AWS interface
2. Look them up on [boto docs](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)
3. Find the parameter to pass to create client as well as methods

## Uploading a test image to S3



In [12]:
s3 = boto3.client("s3", region_name="us-east-1")

In [14]:
s3.create_bucket(Bucket="images-datacamp-test-miguel")

{'ResponseMetadata': {'RequestId': '998925FA24F874A2',
  'HostId': 'Jd7Cbfugb0BWgw1lOGmA4lvttj28LHifmfM4PBmyjXVSsRWACYPgvVag0vnga09yqfcB1bEBV3Q=',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amz-id-2': 'Jd7Cbfugb0BWgw1lOGmA4lvttj28LHifmfM4PBmyjXVSsRWACYPgvVag0vnga09yqfcB1bEBV3Q=',
   'x-amz-request-id': '998925FA24F874A2',
   'date': 'Thu, 13 Feb 2020 17:26:42 GMT',
   'location': '/images-datacamp-test-miguel',
   'content-length': '0',
   'server': 'AmazonS3'},
  'RetryAttempts': 0},
 'Location': '/images-datacamp-test-miguel'}

In [15]:
s3.upload_file(
    Filename="kitchen.png",
    Bucket="images-datacamp-test-miguel",
    Key="kitchen.png",
    ExtraArgs={
        "ACL": "public-read",
    }
)

## Analyzing the image

In [20]:
import boto3

# make sure boto3 rekognition client is in the same region
# as the image! 
rekog = boto3.client(
    "rekognition",
    region_name="us-east-1",
)

In [21]:
response = rekog.detect_labels(
    Image={
        'S3Object': {
            'Bucket': 'images-datacamp-test-miguel',
            'Name': 'kitchen.png',
        }
    },
    MaxLabels=10,
)

In [22]:
response

{'Labels': [{'Name': 'Laundry',
   'Confidence': 95.0337905883789,
   'Instances': [],
   'Parents': []},
  {'Name': 'Housing',
   'Confidence': 91.11283874511719,
   'Instances': [],
   'Parents': [{'Name': 'Building'}]},
  {'Name': 'Building',
   'Confidence': 91.11283874511719,
   'Instances': [],
   'Parents': []},
  {'Name': 'Washer',
   'Confidence': 85.34657287597656,
   'Instances': [],
   'Parents': [{'Name': 'Appliance'}]},
  {'Name': 'Appliance',
   'Confidence': 85.34657287597656,
   'Instances': [],
   'Parents': []},
  {'Name': 'Window',
   'Confidence': 73.19082641601562,
   'Instances': [],
   'Parents': []},
  {'Name': 'Interior Design',
   'Confidence': 70.55426025390625,
   'Instances': [],
   'Parents': [{'Name': 'Indoors'}]},
  {'Name': 'Indoors',
   'Confidence': 70.55426025390625,
   'Instances': [],
   'Parents': []},
  {'Name': 'Architecture',
   'Confidence': 62.6004753112793,
   'Instances': [],
   'Parents': [{'Name': 'Building'}]},
  {'Name': 'Sink Faucet',

## Comprehending Text

AWS provides two main services for making sense of text:
- Comprehend
- Translate

### Translate

In [23]:
translate = boto3.client("translate", region_name="us-east-1")

In [24]:
response = translate.translate_text(
    Text="Olá, tudo bem?",
    SourceLanguageCode="auto",
    TargetLanguageCode="en",
)

In [25]:
response

{'TranslatedText': 'Hey, you all right?',
 'SourceLanguageCode': 'pt',
 'TargetLanguageCode': 'en',
 'ResponseMetadata': {'RequestId': 'e2280388-c2da-4d52-ac83-8e9742437bc2',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'e2280388-c2da-4d52-ac83-8e9742437bc2',
   'cache-control': 'no-cache',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '92',
   'date': 'Thu, 13 Feb 2020 17:43:41 GMT'},
  'RetryAttempts': 0}}

In [26]:
# or to go straight to the response
response["TranslatedText"]

'Hey, you all right?'

### Detecting Language

In [27]:
comprehend = boto3.client("comprehend", region_name="us-east-1")

In [28]:
response = comprehend.detect_dominant_language(Text="Merhaba, nasilsin?")

In [29]:
response

{'Languages': [{'LanguageCode': 'tr', 'Score': 0.9938409328460693}],
 'ResponseMetadata': {'RequestId': 'cc6b7dd3-b23e-4339-936e-56c4211265bd',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': 'cc6b7dd3-b23e-4339-936e-56c4211265bd',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '64',
   'date': 'Thu, 13 Feb 2020 17:45:24 GMT'},
  'RetryAttempts': 0}}

### Understanding Sentiment

In [31]:
comprehend.detect_sentiment(
    Text="Não tenho paciência para isto",
    LanguageCode="pt",
)

{'Sentiment': 'NEGATIVE',
 'SentimentScore': {'Positive': 0.009575205855071545,
  'Negative': 0.9445379972457886,
  'Neutral': 0.045885879546403885,
  'Mixed': 9.258811246581899e-07},
 'ResponseMetadata': {'RequestId': '8a12b97a-6af3-4d91-8c30-580e0461f590',
  'HTTPStatusCode': 200,
  'HTTPHeaders': {'x-amzn-requestid': '8a12b97a-6af3-4d91-8c30-580e0461f590',
   'content-type': 'application/x-amz-json-1.1',
   'content-length': '165',
   'date': 'Thu, 13 Feb 2020 17:46:30 GMT'},
  'RetryAttempts': 0}}