<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Create-S3-bucket" data-toc-modified-id="Create-S3-bucket-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Create S3 bucket</a></span></li></ul></div>

In this notebook, we will create an S3 bucket on AWS and upload our preprocessed dataset. We can then download the data onto our EC2 instance, which we will use to run the more computationally expensive models.

In [None]:
import boto3
import uuid

## Create S3 bucket

We will use Boto3, the AWS SDK for Python, to interact with S3 from our notebooks.

In [1]:
# Set up boto3 to interact with S3
s3 = boto3.resource('s3')

Since bucket names must be globally unique, I will follow https://realpython.com/python-boto3-aws-s3/ and define a function that uses the uuid library to append a universally unique identifier to a descriptive bucket prefix.

In [2]:
def create_bucket_name(bucket_prefix):
    """Create a bucket name that is very likely to be globally unique."""
    # Append a random UUID4 to the descriptive prefix
    return ''.join([bucket_prefix, str(uuid.uuid4())])

bucket_name = create_bucket_name('lending-club-')
print(bucket_name)

lending-club-a7b2c3e3-07f7-4444-b258-5bb63c282398
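
Before asking AWS to create the bucket, it can help to check locally that the generated name is plausible. The sketch below covers only a subset of the S3 naming rules for general-purpose buckets (3 to 63 characters; lowercase letters, digits, hyphens and dots; starting and ending with a letter or digit; no consecutive dots). The helper `is_plausible_bucket_name` is a hypothetical name of my own, not part of boto3, and passing this check does not guarantee that S3 will accept the name.

```python
import re
import uuid

# Hypothetical helper (not part of boto3): a rough local check against a
# subset of the S3 bucket naming rules.
_BUCKET_NAME_PATTERN = r'[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]'

def is_plausible_bucket_name(name):
    """Return True if `name` passes a basic subset of the S3 naming rules."""
    # fullmatch enforces the 3-63 character limit and the allowed characters
    if re.fullmatch(_BUCKET_NAME_PATTERN, name) is None:
        return False
    # Two adjacent dots are not allowed
    return '..' not in name

# Same construction as create_bucket_name('lending-club-')
candidate = 'lending-club-' + str(uuid.uuid4())
print(len(candidate), is_plausible_bucket_name(candidate))
```

A 13-character prefix plus the 36-character UUID string stays well under the 63-character limit, so a longer descriptive prefix would still fit.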


Now we are ready to create the bucket.

In [3]:
lc_bucket = s3.create_bucket(Bucket=bucket_name)
lc_bucket

s3.Bucket(name='lending-club-a7b2c3e3-07f7-4444-b258-5bb63c282398')
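
The call above passes only the bucket name, which to my understanding works when the session is configured for `us-east-1`. In other regions, `create_bucket` expects a `CreateBucketConfiguration` with a matching `LocationConstraint`. The following is a minimal sketch of how the arguments could be built; `build_create_bucket_kwargs` is a hypothetical helper, and treating a missing region like `us-east-1` is an assumption.

```python
def build_create_bucket_kwargs(bucket_name, region_name):
    """Build keyword arguments for create_bucket for the given region."""
    kwargs = {'Bucket': bucket_name}
    # us-east-1 is the default location and is not passed as a constraint.
    # Assumption: if no region is configured, we also pass no constraint.
    if region_name and region_name != 'us-east-1':
        kwargs['CreateBucketConfiguration'] = {
            'LocationConstraint': region_name}
    return kwargs

# 'lending-club-example' is a placeholder name for illustration only
print(build_create_bucket_kwargs('lending-club-example', 'eu-west-1'))
```

With a configured session, this could be used as `s3.create_bucket(**build_create_bucket_kwargs(bucket_name, boto3.session.Session().region_name))`.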

In [4]:
# Print all buckets
for b in s3.buckets.all():
    print(b)

s3.Bucket(name='lending-club-1a6fe642-4a35-4878-a3e1-0fee33683a9f')
s3.Bucket(name='lending-club-23452399-7068-40a5-8b74-889ed5372d4a')
s3.Bucket(name='lending-club-a7b2c3e3-07f7-4444-b258-5bb63c282398')


In [21]:
# # Upload test file
# s3.Bucket(bucket_name) \
#     .upload_file(Filename='data_processed/X_train.joblib',
#                  Key='X_train_small.joblib')

In [8]:
# Upload preprocessed training and test sets to S3 bucket
filenames = ['X_train', 'X_test', 'y_train', 'y_test', 'feature_names']
for filename in filenames:
    s3.Bucket(bucket_name).upload_file(
        Filename=f'data_processed/{filename}.joblib',
        Key=f'{filename}.joblib')

In [22]:
# # Download file
# s3.Object(bucket_name, 'X_train_small.joblib') \
#     .download_file('X_train_small.joblib')

In [22]:
# Download files
filenames = ['X_train', 'X_test', 'y_train', 'y_test', 'feature_names']
for filename in filenames:
    s3.Object(bucket_name, f'{filename}.joblib') \
        .download_file(f'data_processed/{filename}.joblib')
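
On a fresh EC2 instance, the `data_processed` directory will probably not exist yet, and `download_file` writes to the local path it is given, so the download is likely to fail unless the directory is created first. Below is a minimal sketch of preparing the local targets. `local_targets` is a hypothetical helper, and the demonstration uses a temporary directory rather than the real `data_processed` folder.

```python
import os
import tempfile

def local_targets(names, local_dir, suffix='.joblib'):
    """Create local_dir if needed and map each S3 key to its local path."""
    os.makedirs(local_dir, exist_ok=True)
    return {f'{name}{suffix}': os.path.join(local_dir, f'{name}{suffix}')
            for name in names}

# Demonstration in a throwaway location; on EC2 this would be 'data_processed'
demo_dir = os.path.join(tempfile.mkdtemp(), 'data_processed')
targets = local_targets(['X_train', 'y_train'], demo_dir)
print(targets)
```

Each key and path pair could then be passed to `s3.Object(bucket_name, key).download_file(path)`, as in the cell above.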