### A. Boto3 library

In [None]:
import boto3

# AWS_KEY_ID and AWS_SECRET are placeholders for your own credentials
s3 = boto3.client('s3',
                   region_name='us-east-1',
                   aws_access_key_id=AWS_KEY_ID,
                   aws_secret_access_key=AWS_SECRET)

response = s3.list_buckets()
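Hard-coding keys in a notebook is risky. Below is a minimal sketch that assumes the credentials live in the standard `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` environment variables. boto3 also reads these automatically if you leave the key arguments out. `client_kwargs` is a hypothetical helper name.

```python
import os


def client_kwargs(region="us-east-1"):
    """Build keyword arguments for boto3.client('s3', **kwargs) from env vars.

    Hypothetical helper: raises KeyError if either variable is missing.
    """
    return {
        "region_name": region,
        "aws_access_key_id": os.environ["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": os.environ["AWS_SECRET_ACCESS_KEY"],
    }

# Usage (needs boto3 and real credentials):
# s3 = boto3.client('s3', **client_kwargs())
```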

### B. AWS services

- IAM is for managing users and granting them access


- S3 is for storing objects like images in the cloud


- SNS is for sending alerts via SMS and email


- Rekognition is for computer vision and detecting labels in an image


- Comprehend is for sentiment analysis on blocks of text

### C. AWS S3

#### S3 components

(1) Buckets (like folders)

- A bucket has a name


- Name is a string


- Name must be unique across all of S3


- Contains many objects


(2) Objects (like files)

- An object has a key


- The key is the full path from the bucket root


- Key is unique within the bucket


- Can only be in one parent bucket
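The bucket/key rules above can be pictured with a toy in-memory model. This is a hypothetical class, not part of boto3. It only mirrors the listed rules: each bucket holds one flat namespace of unique keys, and "folders" are just key prefixes.

```python
class ToyBucket:
    """In-memory stand-in for an S3 bucket: a flat mapping of key -> bytes."""

    def __init__(self, name):
        self.name = name
        self._objects = {}  # keys are unique within one bucket

    def put(self, key, data):
        # Writing to an existing key replaces that object
        self._objects[key] = data

    def get(self, key):
        # A missing key raises KeyError (S3 would report NoSuchKey)
        return self._objects[key]

    def list_keys(self, prefix=""):
        # "Folders" are only key prefixes, so listing filters on the full key
        return sorted(k for k in self._objects if k.startswith(prefix))
```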

#### Bucket

In [None]:
# Create Bucket
bucket = s3.create_bucket(Bucket='gid-requests')

# List Buckets
bucket_response = s3.list_buckets()

# Get the list of bucket dictionaries
buckets = bucket_response['Buckets']

# Iterate over the buckets from the .list_buckets() response
for bucket in buckets:
    # Print the name of each bucket
    print(bucket['Name'])

# Delete Bucket (boto3 requires keyword arguments; the bucket must be empty)
response = s3.delete_bucket(Bucket='gid-requests')
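Because `list_buckets()` returns a plain dict, name lookups can be split out into small helpers and tested offline. Below is a minimal sketch. `bucket_names` and `bucket_exists` are hypothetical names, and the fake response in the test only mimics the `Buckets`/`Name` fields used above. The check can only see buckets owned by the calling account.

```python
def bucket_names(response):
    """Return bucket names from a list_buckets() response dict."""
    return [b["Name"] for b in response.get("Buckets", [])]


def bucket_exists(response, name):
    """True if `name` appears among the account's listed buckets."""
    return name in bucket_names(response)

# Usage with a live client:
# names = bucket_names(s3.list_buckets())
```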

#### Object

In [None]:
# Uploading files
s3.upload_file(Filename='gid_requests_2020_01_01.csv',
               Bucket='gid-requests',
               Key='gid_requests_2020_01_01.csv')

# Listing objects in a bucket
response = s3.list_objects(Bucket='gid-requests',
                           MaxKeys=2,
                           Prefix='gid_requests_2020_')

# Getting an object's metadata without downloading it
response = s3.head_object(Bucket='gid-requests',
                          Key='gid_requests_2020_01_01.csv')

# Downloading files
s3.download_file(Filename='gid_requests_downloaded.csv',
                 Bucket='gid-requests',
                 Key='gid_requests_2020_01_01.csv')

# Deleting objects
s3.delete_object(Bucket='gid-requests',
                 Key='gid_requests_2020_01_01.csv')
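`list_objects` returns at most 1,000 keys per call (fewer if `MaxKeys` is set) and sets `IsTruncated` when more remain. The sketch below follows the pages by hand. `list_all_keys` is a hypothetical helper that takes the listing function as a parameter, so it can be tested with a fake. In practice, boto3's built-in paginator (`s3.get_paginator('list_objects_v2')`) does the same job.

```python
def list_all_keys(list_objects, bucket, prefix=""):
    """Collect every key under `prefix`, following list_objects pagination.

    `list_objects` is any callable with the signature of s3.list_objects.
    """
    keys, marker = [], None
    while True:
        kwargs = {"Bucket": bucket, "Prefix": prefix}
        if marker:
            kwargs["Marker"] = marker
        page = list_objects(**kwargs)
        contents = page.get("Contents", [])  # absent when nothing matches
        keys.extend(obj["Key"] for obj in contents)
        if not page.get("IsTruncated") or not contents:
            return keys
        # Without a Delimiter, V1 responses omit NextMarker, so resume after the last key
        marker = contents[-1]["Key"]

# Usage with a live client:
# keys = list_all_keys(s3.list_objects, 'gid-requests', 'gid_requests_2020_')
```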

### D. AWS permission system

(By default, permission is denied unless something grants it.)

- IAM: controls users' access to AWS services, buckets, and objects


- bucket policy: controls access to a bucket and the objects in it


- ACL (access control list): controls access to individual objects


- presigned URL: grants temporary access to an object


Note: **IAM** and **bucket policies** are best suited to multi-user environments.

#### ACL (access control lists)

- private (default)


- public-read

(1) Make files public-read

method 1:

In [None]:
# Set ACL to 'public-read'
s3.put_object_acl(Bucket='gid-requests', 
                  Key='gid_requests_2020_01_01.csv', 
                  ACL='public-read')

method 2:

In [None]:
# Setting ACLs on upload
s3.upload_file(Bucket='gid-requests',
               Filename='potholes.csv',
               Key='potholes.csv',
               ExtraArgs={'ACL':'public-read'})

(2) Accessing public objects

method 1: 

In [None]:
bucket, key = 'gid-requests', 'gid_requests_2020_01_01.csv'
url = f"https://{bucket}.s3.amazonaws.com/{key}"

method 2:

In [None]:
url = "https://{}.s3.amazonaws.com/{}".format("gid-requests", "gid_requests_2020_01_01.csv")
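Plain string formatting produces a broken URL if a key contains spaces or other reserved characters. The helper below URL-encodes the key first and uses the same URL pattern as above. `public_object_url` is a hypothetical name.

```python
from urllib.parse import quote


def public_object_url(bucket, key):
    """URL for a public-read object, with the key percent-encoded.

    quote() encodes characters such as spaces but keeps '/' so
    "folders" in the key stay readable.
    """
    return "https://{}.s3.amazonaws.com/{}".format(bucket, quote(key))
```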

(3) Downloading public files

In [None]:
import pandas as pd

# pandas can read a public object directly from its URL
df = pd.read_csv(url)

(4) Example: Making multiple files public

In [None]:
# List only objects that start with '2020/final_'
response = s3.list_objects(Bucket='gid-staging', Prefix='2020/final_')

# Iterate over the objects
for obj in response['Contents']:
  
    # Give each object ACL of public-read
    s3.put_object_acl(Bucket='gid-staging',
                      Key=obj['Key'], 
                      ACL='public-read')
    
    # Print the Public Object URL for each object
    print("https://{}.s3.amazonaws.com/{}".format('gid-staging', obj['Key']))
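The loop above raises `KeyError` when no object matches the prefix, because S3 leaves `Contents` out of the response entirely in that case. Below is a minimal sketch that guards against this. `public_urls` is a hypothetical helper, and the test responses are made up.

```python
def public_urls(response, bucket):
    """Build public URLs for every object in a list_objects() response.

    .get() returns an empty list when 'Contents' is missing (no matches).
    """
    return ["https://{}.s3.amazonaws.com/{}".format(bucket, obj["Key"])
            for obj in response.get("Contents", [])]
```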

#### How is access decided?

requester ---> valid presigned URL? ---(No)---> do policies allow it? (IAM, bucket policy, ACL)
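The decision flow above can be written as a function. This is a simplified model that follows these notes only, not AWS's full policy evaluation. Real S3 also honours explicit denies, and a presigned URL only works if its signer has the needed permissions. `access_allowed` is a hypothetical name.

```python
def access_allowed(valid_presigned_url, iam_allows, bucket_policy_allows, acl_allows):
    """Simplified access check following the flow in these notes."""
    if valid_presigned_url:
        return True
    # Default is deny: grant access only if at least one mechanism allows it
    return iam_allows or bucket_policy_allows or acl_allows
```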

#### Accessing private objects in S3

method 1: Download then open

In [None]:
# Download File
s3.download_file(Filename='file_local.csv',
                 Bucket='gid-staging',
                 Key='2020/file_private.csv')

# Read From Disk
pd.read_csv('./file_local.csv')

method 2: Open directly

In [None]:
# Use .get_object()
obj = s3.get_object(Bucket='gid-requests', Key='2020/file.csv')

# Read StreamingBody into Pandas:
pd.read_csv(obj['Body'])
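`obj['Body']` is a file-like stream, which is why `pd.read_csv` can consume it. Without pandas, the standard `csv` module can parse the same stream. Below is a minimal sketch that assumes UTF-8 CSV content. `read_csv_body` is a hypothetical helper, and in the test an `io.BytesIO` stands in for the StreamingBody.

```python
import csv
import io


def read_csv_body(body, encoding="utf-8"):
    """Parse a file-like body (e.g. obj['Body'] from get_object) into dicts.

    The body is read once; a StreamingBody cannot be re-read afterwards.
    """
    text = body.read().decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))
```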

method 3: generate ***presigned URLs*** (they expire after a set time)

In [None]:
# Generate Presigned URL
share_url = s3.generate_presigned_url(ClientMethod='get_object',
                                      ExpiresIn=3600,
                                      Params={'Bucket': 'gid-requests','Key': 'file.csv'})

# Open in Pandas
pd.read_csv(share_url)
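With `ExpiresIn=3600`, the link stops working one hour after it is signed. The expiry is encoded in the URL's query string, so it can be read back without calling AWS. The sketch below assumes the URL uses either the SigV4 parameters (`X-Amz-Date`, `X-Amz-Expires`) or the older SigV2 `Expires` Unix timestamp. `presigned_url_expiry` is a hypothetical helper, and the test URLs are made up.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url):
    """Return the UTC expiry time encoded in a presigned URL."""
    params = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    if "X-Amz-Date" in params:
        # SigV4: signing time plus a lifetime in seconds
        signed_at = datetime.strptime(params["X-Amz-Date"], "%Y%m%dT%H%M%SZ")
        signed_at = signed_at.replace(tzinfo=timezone.utc)
        return signed_at + timedelta(seconds=int(params["X-Amz-Expires"]))
    # SigV2: absolute expiry as a Unix timestamp
    return datetime.fromtimestamp(int(params["Expires"]), tz=timezone.utc)
```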

Example: Loading multiple files into one DataFrame

In [None]:
# Create list to hold our DataFrames
df_list = []

# Request the list of CSV files under the '2020/' prefix
response = s3.list_objects(Bucket='gid-requests', Prefix='2020/')

# Get response contents
request_files = response['Contents']

# Iterate over each object
for file in request_files:
    obj = s3.get_object(Bucket='gid-requests', Key=file['Key'])

    # Read it as DataFrame
    obj_df = pd.read_csv(obj['Body'])

    # Append DataFrame to list
    df_list.append(obj_df)

# Concatenate all the DataFrames in the list
df = pd.concat(df_list)

# Preview the DataFrame
df.head()
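When many files are stacked, it helps to record which file each row came from. In the pandas loop above, that would be `obj_df['source_key'] = file['Key']` before appending. Below is a stdlib sketch of the same idea. `combine_csv_bodies` is a hypothetical helper that takes a mapping of key to file-like body, and the test data is made up.

```python
import csv
import io


def combine_csv_bodies(bodies_by_key):
    """Stack several CSV bodies (key -> file-like object) into one row list,
    tagging each row with the key of the file it came from."""
    rows = []
    for key, body in bodies_by_key.items():
        reader = csv.DictReader(io.StringIO(body.read().decode("utf-8")))
        for row in reader:
            row["source_key"] = key
            rows.append(row)
    return rows
```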

#### A summary for sharing URLs

Public files: public object URL

- using .format(): 


    'https://{bucket}.s3.amazonaws.com/{key}'


Private files: presigned URL

- using .generate_presigned_url(): 

     
    'https://s3.amazonaws.com/?AWSAccessKeyId=12345&Signature=rBmnrwutb6VkJ9hE8Uub%2BBYA9mY%'

### E. Sharing files through a website

#### From dataframe to S3

(1) Convert a DataFrame to HTML

In [None]:
# Convert DataFrame to HTML
df.to_html('table_agg.html')

# Convert DataFrame to HTML, rendering URL values as clickable links
df.to_html('table_agg.html', render_links=True)

# Only certain columns to HTML
df.to_html('table_agg.html',
           render_links=True,
           columns=['service_name', 'request_count', 'info_link'])

# Certain columns to HTML without borders
df.to_html('table_agg.html',
           render_links=True,
           columns=['service_name', 'request_count', 'info_link'],
           border=0)
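To see roughly what `to_html(columns=..., render_links=True, border=...)` does, here is a stdlib stand-in that renders a list of dicts as a table and turns http(s) values into links. This is not pandas' implementation, and its markup differs in detail. `rows_to_html` is a hypothetical name.

```python
from html import escape


def rows_to_html(rows, columns, border=1):
    """Render selected columns of a list of dicts as an HTML table."""

    def cell(value):
        text = escape(str(value))
        if str(value).startswith(("http://", "https://")):
            return '<a href="{0}">{0}</a>'.format(text)  # like render_links=True
        return text

    head = "".join("<th>{}</th>".format(escape(c)) for c in columns)
    body = "".join(
        "<tr>" + "".join("<td>{}</td>".format(cell(r[c])) for c in columns) + "</tr>"
        for r in rows)
    return '<table border="{}"><thead><tr>{}</tr></thead><tbody>{}</tbody></table>'.format(
        border, head, body)
```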

(2) Uploading an HTML file to S3

In [None]:
s3.upload_file(Filename='./table_agg.html',
               Bucket='website',
               Key='table.html',
               ExtraArgs = {'ContentType': 'text/html','ACL': 'public-read'})

(3) Accessing HTML files
    
    Example: https://website.s3.amazonaws.com/table.html

(4) IANA media types (the value to pass as ContentType)

- HTML : text/html


- JSON : application/json


- PNG : image/png


- PDF : application/pdf


- CSV : text/csv
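Instead of typing the media type by hand, the standard `mimetypes` module can guess it from the file extension. Below is a minimal sketch that builds the `ExtraArgs` dict used in the uploads in this section. `upload_extra_args` is a hypothetical helper, and it falls back to a generic binary type for unknown extensions.

```python
import mimetypes


def upload_extra_args(filename, public=True):
    """Build ExtraArgs for s3.upload_file with a ContentType guessed from the extension."""
    content_type, _ = mimetypes.guess_type(filename)
    args = {"ContentType": content_type or "application/octet-stream"}
    if public:
        args["ACL"] = "public-read"
    return args

# Usage with a live client:
# s3.upload_file(Filename='./plot_image.png', Bucket='website',
#                Key='plot_image.png', ExtraArgs=upload_extra_args('plot_image.png'))
```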


Example: Uploading an image file to S3

In [None]:
s3.upload_file(Filename='./plot_image.png',
               Bucket='website',
               Key='plot_image.png',
               ExtraArgs = {'ContentType': 'image/png', 'ACL': 'public-read'})

#### Generating an index page

In [None]:
# List the gid-reports bucket objects starting with 2020/
r = s3.list_objects(Bucket='gid-reports', Prefix='2020/')

# Convert the response contents to DataFrame
objects_df = pd.DataFrame(r['Contents'])

# Create a column "Link" that contains website url + key
base_url = "http://website.s3.amazonaws.com/"
objects_df['Link'] = base_url + objects_df['Key']

# Write DataFrame to html
objects_df.to_html('report.html',
                   columns=['Link', 'LastModified', 'Size'],
                   render_links=True)
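The index above lists files in the key order S3 returns them. Below is a sketch of the same index rows sorted newest first, with keys URL-encoded before they are appended to the base URL. `index_rows` is a hypothetical helper; its output could be passed to `pd.DataFrame(...)` before `to_html`. The test data is made up, and it uses datetimes because that is how boto3 returns `LastModified`.

```python
from urllib.parse import quote


def index_rows(contents, base_url):
    """Turn list_objects() 'Contents' into index-page rows, newest first."""
    rows = [{"Link": base_url + quote(obj["Key"]),
             "LastModified": obj["LastModified"],
             "Size": obj["Size"]}
            for obj in contents]
    return sorted(rows, key=lambda r: r["LastModified"], reverse=True)
```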

#### Uploading index page

In [None]:
s3.upload_file(Filename='./report.html',
               Bucket='website',
               Key='index.html',
               ExtraArgs = {'ContentType': 'text/html', 'ACL': 'public-read'})