# S3 and boto3

This short step-by-step tutorial shows you how to:

- Create an AWS account
- Create an Amazon S3 bucket
- Download and configure the AWS CLI
- Make the files in the bucket public
- Upload your files
- Download the files from the bucket


## Create an S3 Bucket


Amazon Simple Storage Service (Amazon S3) buckets are containers where you can store your files as objects. S3 is commonly used as the storage layer of a data lake; to learn more about data lakes, see this [page](https://en.wikipedia.org/wiki/Data_lake).

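Under the AWS Free Tier, S3 lets you store up to 5 GB for free. Beyond that, standard storage costs about $0.023 per GB per month; the exact price depends on the region and storage class. Take a look at this [page](https://aws.amazon.com/es/s3/pricing/?nc=sn&loc=4) to learn more about S3 pricing.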

Let's create an S3 bucket to upload our files. First, go to the AWS [dashboard](https://aws.amazon.com). In the search bar, type 'S3', and click on the first option:

<p align="center">
    <img src="https://github.com/life-efficient/Data-Engineering/blob/main/6.%20Essential%20Cloud%20Technology/2.%20S3%20and%20boto3/images/aws_search_S3.png?raw=1" width="500"/>
</p>

In the next window, click on 'Create bucket':

<p align="center">
    <img src="https://github.com/life-efficient/Data-Engineering/blob/main/6.%20Essential%20Cloud%20Technology/2.%20S3%20and%20boto3/images/create_bucket_button.png?raw=1" width="500"/>
</p>

Set a name for your bucket, and choose a region; any region from the US usually works fine, but make sure to use the same region in the next steps.
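
S3 rejects bucket names that break its naming rules, so it can help to check a candidate name before typing it into the console. The sketch below covers only a subset of those rules (length, allowed characters, no adjacent periods, no IP-address look-alikes); the function name `is_valid_bucket_name` is made up for this tutorial and is not part of any AWS library. A name that passes can still be refused if someone else already owns it.

```python
import re

# Subset of the S3 naming rules for general-purpose buckets:
# 3-63 characters, lowercase letters, digits, dots and hyphens only,
# starting and ending with a letter or digit.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")
_IP_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if `name` passes the basic checks listed above."""
    if not _BUCKET_NAME.fullmatch(name):
        return False
    if ".." in name:              # adjacent periods are not allowed
        return False
    if _IP_LIKE.fullmatch(name):  # names must not look like an IP address
        return False
    return True


for candidate in ["cat-scraper", "My_Bucket", "ab", "192.168.0.1"]:
    print(candidate, is_valid_bucket_name(candidate))
```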

## Create an IAM user 

We need to create an Identity and Access Management (IAM) user, which provides the credentials that allow us to interact with AWS resources.

To create an IAM user, go to the AWS dashboard, and, in the search bar, look for "IAM" and click the first option:

<p align="center">
    <img src="https://github.com/life-efficient/Data-Engineering/blob/main/6.%20Essential%20Cloud%20Technology/2.%20S3%20and%20boto3/images/IAM.png?raw=1" width="500"/>
</p>

Next, click 'Users' in the left-hand menu, and then click 'Add User':

<p align="center">
    <img src="https://github.com/life-efficient/Data-Engineering/blob/main/6.%20Essential%20Cloud%20Technology/2.%20S3%20and%20boto3/images/IAM_User.png?raw=1" width="500"/>
</p>

Then enter a user name of your choice, tick 'Programmatic access', and click Next.

On the permissions page, select 'Attach existing policies directly', tick 'AdministratorAccess', and then click Next:

<p align="center">
    <img src="https://github.com/life-efficient/Data-Engineering/blob/main/6.%20Essential%20Cloud%20Technology/2.%20S3%20and%20boto3/images/Policies.png?raw=1" width="500"/>
</p>

On the following pages, simply click Next and create the user. The final page shows the credentials you will use to connect to your S3 bucket. These credentials are only shown once, so make sure to download the .csv file:

<p align="center">
    <img src="https://github.com/life-efficient/Data-Engineering/blob/main/6.%20Essential%20Cloud%20Technology/2.%20S3%20and%20boto3/images/Credentials.png?raw=1" width="500"/>
</p>


## Download and configure AWS CLI


To let your computer communicate with your AWS resources, you need to configure it with your credentials. The `awscli` package makes it easy to store the credentials and settings our computer needs to connect to our AWS services.

Let's install awscli using:
`pip install awscli`

Next, type `aws configure` in the terminal and enter the access key ID and secret access key exactly as they appear in the .csv file you downloaded in the previous step.

When you are asked for the region name, go to your S3 bucket and look at its AWS Region. The region name looks something like 'us-east-1'.

When you are asked for the output format, you can skip it by pressing Enter.

Now your computer is ready to use boto3.
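
By default, `aws configure` saves the keys in `~/.aws/credentials` and the region and output format in `~/.aws/config`, both as INI-style files. As a rough check that the keys were saved, the sketch below reads the credentials file and reports which profiles have both key fields, without printing the secrets themselves. The helper name `configured_profiles` is an assumption made for this example.

```python
import configparser
from pathlib import Path


def configured_profiles(credentials_path: Path) -> dict:
    """Map each profile name to True if both key fields are present."""
    parser = configparser.ConfigParser()
    parser.read(credentials_path)  # a missing file simply yields no sections
    required = ("aws_access_key_id", "aws_secret_access_key")
    return {
        section: all(parser.has_option(section, key) for key in required)
        for section in parser.sections()
    }


# Default location used by the AWS CLI; adjust the path if yours differs.
print(configured_profiles(Path.home() / ".aws" / "credentials"))
```

After a successful `aws configure`, the output should include a `default` profile marked `True`.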

<details>
  <summary> <font size=+1> Note if you are on Google Colab </font></summary>
  
  If you are using Google Colab, you need to install awscli just as you would on your local machine. The only difference is that the configuration is not kept between sessions, so you have to repeat it each time.
  
  To install awscli, run `!pip install awscli` in a new cell.
  
  Then, run `!aws configure` in another cell and follow the instructions above.

</details>
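
As an alternative for short-lived sessions such as Colab, boto3 and the AWS CLI also read credentials from the environment variables `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY` and `AWS_DEFAULT_REGION`. The sketch below sets them from Python; this is not the approach used in the rest of this tutorial, and the helper name `set_aws_environment` is invented for illustration.

```python
import os


def set_aws_environment(access_key_id: str, secret_access_key: str, region: str) -> None:
    """Export credentials as environment variables for the current process."""
    os.environ["AWS_ACCESS_KEY_ID"] = access_key_id
    os.environ["AWS_SECRET_ACCESS_KEY"] = secret_access_key
    os.environ["AWS_DEFAULT_REGION"] = region


# Example call (commented out so no placeholder values are exported by accident).
# Reading the values with getpass keeps them out of the saved notebook:
#   from getpass import getpass
#   set_aws_environment(getpass("Access key ID: "), getpass("Secret access key: "), "us-east-1")
```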


Test that your installation is working by using `aws s3 ls`. You should see something like this:

<p align="center">
    <img src="https://github.com/life-efficient/Data-Engineering/blob/main/6.%20Essential%20Cloud%20Technology/2.%20S3%20and%20boto3/images/AWSCLI_ls.png?raw=1" width="500"/>
</p>


# Using boto3 to work with your AWS resources from Python

boto3 is a library that allows us to work with AWS from our Python scripts. In this example we are simply going to upload, download and explore the contents of S3 buckets, but you can also use it to manage other resources such as `EC2`, `RDS`, and `DynamoDB`. You can check boto3's documentation [here](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html).

First of all, install boto3 by typing `pip install boto3` in the terminal. Keep in mind that, in order to use `boto3`, you need to have AWS configured as we did above.

Let's start by telling boto3 that we want to use S3:

```python
import boto3

s3_client = boto3.client('s3')
```
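
A quick way to confirm that the client can reach your account is to list your buckets, which is roughly what `aws s3 ls` does. The sketch below wraps the client's `list_buckets` call in a small helper; `bucket_names` is a name chosen for this example, and the client is passed in as an argument so the function is easy to try with any S3 client.

```python
def bucket_names(s3_client) -> list:
    """Return the names of all buckets visible to the configured credentials."""
    response = s3_client.list_buckets()
    return [bucket["Name"] for bucket in response["Buckets"]]


# With credentials configured, you would call it like this:
#   import boto3
#   print(bucket_names(boto3.client("s3")))
```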



Now, let's upload something to your bucket:

```python
# upload_file(file_name, bucket, object_name) returns None and raises an exception on failure
s3_client.upload_file('cat_0.jpg', 'cat-scraper', 'cat.jpg')
```


*file_name* is the path of the file you want to upload, *bucket* is the name of your S3 bucket, and *object_name* is the name (key) you want to give to your file once uploaded.
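
The three arguments can be wrapped in a small helper that checks the local file first and falls back to the file's own name when no object name is given. This is only a sketch: `upload` is a hypothetical helper written for this tutorial, and the only boto3 call it relies on is the client's `upload_file`.

```python
import os
from typing import Optional


def upload(s3_client, file_name: str, bucket: str, object_name: Optional[str] = None) -> str:
    """Upload `file_name` to `bucket` and return the object key that was used."""
    if not os.path.isfile(file_name):
        raise FileNotFoundError(f"No such local file: {file_name}")
    # Default to the bare file name so local folders do not end up in the key.
    key = object_name or os.path.basename(file_name)
    s3_client.upload_file(file_name, bucket, key)  # returns None on success
    return key


# Example, using the names from the cell above:
#   import boto3
#   upload(boto3.client("s3"), "cat_0.jpg", "cat-scraper", "cat.jpg")
```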


Try it yourself!

Now, let's see the content of the bucket:

```python
import boto3

s3 = boto3.resource('s3')

my_bucket = s3.Bucket('pokemon-sprites')

for file in my_bucket.objects.all():
    print(file.key)
```
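
If you only want the keys under one "folder", or prefer the client interface, a paginator over `list_objects_v2` is one option. The sketch below collects every key that starts with a given prefix; `list_keys` is a name made up for this example, and the prefix `zubat/` in the usage comment assumes the bucket layout used in this tutorial.

```python
def list_keys(s3_client, bucket: str, prefix: str = "") -> list:
    """Return every object key in `bucket` that starts with `prefix`."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    # Each page holds a limited number of objects, so large buckets span several pages.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is missing from a page when nothing matches.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Example:
#   import boto3
#   print(list_keys(boto3.client("s3"), "pokemon-sprites", prefix="zubat/"))
```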



Once you know what the bucket contains, you can download files from it:

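```python
import boto3

s3 = boto3.client('s3')

# download_file(bucket, object_name, file_name): note the order differs from upload_file.
# Change the bucket, object and file names to match your own.
s3.download_file('pokemon-sprites', 'zubat/front.png', 'zubat.png')
```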
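
To download several objects at once, you can loop over their keys and mirror the key structure locally. The sketch below assumes that the local folders have to exist before `download_file` writes into them, so it creates them first; `local_path_for` and `download_keys` are hypothetical helpers written for this tutorial.

```python
import os


def local_path_for(key: str, dest_dir: str) -> str:
    """Map an object key such as 'zubat/front.png' to a path under `dest_dir`."""
    return os.path.join(dest_dir, *key.split("/"))


def download_keys(s3_client, bucket: str, keys, dest_dir: str) -> list:
    """Download each key into `dest_dir` and return the local paths written."""
    paths = []
    for key in keys:
        if key.endswith("/"):  # skip "folder" placeholder objects
            continue
        path = local_path_for(key, dest_dir)
        parent = os.path.dirname(path)
        if parent:
            os.makedirs(parent, exist_ok=True)  # make sure the target folder exists
        s3_client.download_file(bucket, key, path)
        paths.append(path)
    return paths


# Example:
#   import boto3
#   download_keys(boto3.client("s3"), "pokemon-sprites", ["zubat/front.png"], "sprites")
```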


# Make the files public


In your S3 bucket, disable the 'Block all public access' option:

<p align="center">
    <img src="https://github.com/life-efficient/Data-Engineering/blob/main/6.%20Essential%20Cloud%20Technology/2.%20S3%20and%20boto3/images/disable.PNG?raw=1" width="500"/>
</p>

You can open the bucket you created from the bucket list. Disabling the block is not enough on its own: you still need to add a bucket policy that makes the objects public.




To make the objects public, go to http://awspolicygen.s3.amazonaws.com/policygen.html, which will help you create the necessary policy.

- In 'Select Type of Policy', select 'S3 Bucket Policy'.
- In 'Principal', type `*`.
- In 'Actions', select 'GetObject'.
- In 'Amazon Resource Name (ARN)', type `arn:aws:s3:::{your_bucket_name}/*`, replacing `{your_bucket_name}` with the name of your bucket.
- Click 'Add Statement'.
- Click 'Generate Policy' and copy the generated text.

<p align="center">
    <img src="https://github.com/life-efficient/Data-Engineering/blob/main/6.%20Essential%20Cloud%20Technology/2.%20S3%20and%20boto3/images/Policy_public.png?raw=1" width="500"/>
</p>
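
The policy produced by the steps above is a small JSON document, so you can also build an equivalent one yourself. The sketch below is a hand-written version: the generator adds its own `Id` and `Sid` values, and the `Sid` used here (`PublicReadGetObject`) and the function name `public_read_policy` are just labels picked for this example.

```python
import json


def public_read_policy(bucket_name: str) -> str:
    """Return a bucket policy (as JSON text) that lets anyone read every object."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "PublicReadGetObject",   # free-form label for the statement
                "Effect": "Allow",
                "Principal": "*",               # anyone
                "Action": "s3:GetObject",       # read (download) objects only
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(public_read_policy("pokemon-sprites"))
```

The printed text can be pasted into the bucket policy editor in the same way as the generator's output.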

Go back to your bucket and open the Permissions tab. In 'Bucket Policy', click Edit, paste the text you copied, and save the changes.

Now your bucket is publicly accessible, and anyone can download your files.

In your bucket, select the file you want to download, and copy the Object URL.

<p align="center"> <img src="https://github.com/life-efficient/Data-Engineering/blob/main/6.%20Essential%20Cloud%20Technology/2.%20S3%20and%20boto3/images/URL_public.png?raw=1" width="500"></p>
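
The Object URL follows a predictable pattern, so for public objects you can also build it from the bucket name and the object key. The sketch below uses the `https://{bucket}.s3.amazonaws.com/{key}` form that appears in this tutorial, plus the regional variant with the region in the host name; `object_url` is a hypothetical helper, and the URL shown in the console may encode special characters slightly differently.

```python
from urllib.parse import quote


def object_url(bucket: str, key: str, region: str = "") -> str:
    """Build the public URL of an object from its bucket name and key."""
    if region:
        host = f"{bucket}.s3.{region}.amazonaws.com"
    else:
        host = f"{bucket}.s3.amazonaws.com"
    # quote() keeps "/" as-is and percent-encodes characters such as spaces.
    return f"https://{host}/{quote(key)}"


print(object_url("pokemon-sprites", "blastoise/front.png"))
```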

Open a Python editor or notebook and use the requests library to download the image from the URL you just copied. Something like this:

```python
import requests

# Replace this with your own URL
url = 'https://pokemon-sprites.s3.amazonaws.com/blastoise/front.png'

response = requests.get(url)
response.raise_for_status()  # stop here if the request failed (e.g. 403 when the object is not public)

# Write the raw bytes of the response to a local file
with open('blastoise.png', 'wb') as f:
    f.write(response.content)
```



And that's it! You should now see the file in your current working directory.