# Working with the AWS CLI

To use the AWS CLI inside this container, you first need to authenticate on your host machine to get valid credentials.

Once authenticated, check that you have the correct credentials with the following command:

In [None]:
%%bash
aws sts get-caller-identity

If you get the error message "An error occurred (ExpiredToken)", you need to re-authenticate on your host. The steps are:
- `cd` into your project root directory
- locate your ECS endpoint service with `docker compose ps | grep ecs-local-endpoint`
- restart the service with `docker compose restart <service_name>`
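
The credential check can also be scripted from a Python cell. The following is a minimal sketch, not part of the project template: the helper names `credentials_status` and `check_credentials` are hypothetical, and it assumes that the `aws` executable is on the `PATH` and that an expired session shows up as the text `ExpiredToken` on standard error, as in the message quoted above.

```python
import subprocess


def credentials_status(returncode: int, stderr: str) -> str:
    """Classify the outcome of `aws sts get-caller-identity` (hypothetical helper)."""
    if returncode == 0:
        return "valid"
    # Assumption: an expired session is reported with "ExpiredToken" in stderr.
    if "ExpiredToken" in stderr:
        return "expired"
    return "error"


def check_credentials() -> str:
    """Run the CLI check and classify its result (needs the aws CLI on PATH)."""
    proc = subprocess.run(
        ["aws", "sts", "get-caller-identity"],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit code is handled by credentials_status
    )
    return credentials_status(proc.returncode, proc.stderr)
```

Calling `check_credentials()` returns `"expired"` when you need to re-authenticate on the host and restart the service as described above.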

Once successfully authenticated, check that your default region is correct. If it is not, set `AWS_DEFAULT_REGION=<region>` to your desired region in the `.env` file in your project root folder. The region you specify there is used as the default region for querying, creating, and modifying resources.

In [None]:
%%bash
aws configure list
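
To see which region the `.env` file currently sets without opening it, a small parser is enough. This is a sketch under assumptions: `read_env_value` is a hypothetical helper, it only handles simple `KEY=value` lines (no `export` prefix, no multi-line values), and the relative path in the usage note assumes the notebook sits one level below the project root.

```python
from typing import Optional


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Return the value assigned to `key` in .env-style text, or None if absent."""
    value = None
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, candidate = line.partition("=")
        if name.strip() == key:
            # If the key appears more than once, the last assignment wins.
            value = candidate.strip().strip('"').strip("'")
    return value
```

For example, `read_env_value(open("../.env").read(), "AWS_DEFAULT_REGION")` returns the configured region, or `None` if the variable is not set in the file.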

## AWS S3 bucket creation

When you do not use an IaC solution such as AWS CloudFormation or Terraform, it is good practice to create resources from JSON files. The project template is set up to store such JSON files in the `aws_objects` folder.
The main benefits of this approach are:
- the code used to create resources is reproducible
- creating similar resources is easy: just copy a JSON file and modify what is required

The first step is to create a JSON skeleton that you can modify afterwards to suit your needs.

In [None]:
%%bash
aws s3api create-bucket --generate-cli-skeleton

Save this output to `aws_objects/<bucket_name>.json`. To make the rest of the tutorial easier, we store the S3 bucket's name in an environment variable:

In [None]:
%env S3_BUCKET=test.playground.cdwb.tecalliance.net

Edit the JSON file and add the bucket name; ensure that `ACL` and `LocationConstraint` are set to your desired values.  
If you need more information, please read [AWS s3api create-bucket](https://docs.aws.amazon.com/cli/latest/reference/s3api/create-bucket.html)
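
Instead of editing the skeleton by hand, the input file can be generated. The sketch below is one possible approach and not part of the project template: `build_create_bucket_input` is a hypothetical helper, and it keeps only the `Bucket`, `ACL`, and `CreateBucketConfiguration.LocationConstraint` fields of the skeleton. It omits the location constraint for `us-east-1`, because S3 rejects that region as an explicit constraint.

```python
import json


def build_create_bucket_input(bucket: str, region: str, acl: str = "private") -> dict:
    """Build a minimal input document for `aws s3api create-bucket --cli-input-json`."""
    document = {"Bucket": bucket, "ACL": acl}
    # us-east-1 is the default location and must not be passed as a constraint.
    if region != "us-east-1":
        document["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return document


if __name__ == "__main__":
    # Example values only; replace them with your own bucket name and region.
    print(json.dumps(build_create_bucket_input("example-bucket", "eu-central-1"), indent=4))
```

The printed JSON can be saved to `aws_objects/<bucket_name>.json` in place of the edited skeleton.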

Now it's time to create the bucket from the JSON file:

In [None]:
%%bash
aws s3api create-bucket --cli-input-json file://../aws_objects/$S3_BUCKET.json

If you do not want to share the data in your bucket with the public, you also need to ensure that public access is blocked. This time, for demonstration purposes, the options are passed directly on the command line instead of via JSON input.

In [None]:
%%bash
aws s3api put-public-access-block \
    --bucket $S3_BUCKET \
    --public-access-block-configuration "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"
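
To confirm the setting, you can read it back with `aws s3api get-public-access-block --bucket $S3_BUCKET`, which returns a JSON document containing a `PublicAccessBlockConfiguration` object. The following sketch checks that all four flags are enabled; `is_fully_blocked` is a hypothetical helper that expects the command's JSON output as a string.

```python
import json

# The four flags set by the put-public-access-block call above.
REQUIRED_FLAGS = (
    "BlockPublicAcls",
    "IgnorePublicAcls",
    "BlockPublicPolicy",
    "RestrictPublicBuckets",
)


def is_fully_blocked(response_text: str) -> bool:
    """Return True only if every public access block flag is set to true."""
    config = json.loads(response_text).get("PublicAccessBlockConfiguration", {})
    return all(config.get(flag) is True for flag in REQUIRED_FLAGS)
```

A result of `False` means that at least one of the four flags is missing or disabled.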

## AWS S3 upload

Let's generate some fake data and save it to a local file that we can upload later on.

In [None]:
from random import randint

import pandas as pd
from faker import Faker
from faker_music import MusicProvider

# initialize the fake factory
fake = Faker("it_IT")
fake.add_provider(MusicProvider)

data = []

# generate a random number of rows of fake data
for i in range(randint(750, 1500)):
    data.append(
        [
            fake.name(),
            fake.music_genre(),
            fake.music_instrument(),
            fake.city(),
            fake.state(),
        ]
    )

# define a dataframe with the columns
df = pd.DataFrame(data, columns=["name", "genre", "instrument", "city", "state"])

# and store it as CSV
df.to_csv(
    "../data/italian_musicans.csv",
    index=False,
)

After the file is created, check its content by navigating to the generated file in the notebook file browser and double-clicking it.
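
If you prefer to check the file from a cell instead of the file browser, a quick summary is enough. This is a minimal sketch using only the standard library; `summarize_csv` is a hypothetical helper that takes the CSV text and reports the column names, the number of data rows, and a short preview.

```python
import csv
import io


def summarize_csv(csv_text: str, preview_rows: int = 3) -> dict:
    """Return the header, the data row count, and the first few data rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # The first row is the header; everything after it is data.
    header, data = (rows[0], rows[1:]) if rows else ([], [])
    return {"columns": header, "row_count": len(data), "preview": data[:preview_rows]}
```

For the file generated above, call it with `summarize_csv(open("../data/italian_musicans.csv", encoding="utf-8").read())`.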

Now we will upload the file to the previously created S3 bucket. We will use the [AWS CLI S3 high-level commands](https://docs.aws.amazon.com/cli/latest/userguide/cli-services-s3-commands.html) instead of the s3api calls.

In [None]:
%%bash
aws s3 cp ../data/italian_musicans.csv s3://$S3_BUCKET

## AWS S3 discover

The first step is to get an overview of all S3 buckets with their associated tags. As there is no built-in AWS CLI functionality for this, a small script is required. Please be patient: depending on the number of buckets in your account, it can take some time for the result to display.  

The [jq](https://stedolan.github.io/jq/) tool is used to parse the JSON output of the AWS CLI.  
Also, [tr](https://en.wikipedia.org/wiki/Tr_(Unix)) is used to replace newline characters with tabs.


In [None]:
%%bash
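# list every bucket and print its tags when a tag set exists
for BUCKET in $(aws s3api list-buckets | jq -r '.Buckets[].Name'); do
    RESULT=$(aws s3api get-bucket-tagging --bucket "$BUCKET" 2>&1)
    if [[ $RESULT =~ "(NoSuchTagSet)" ]]; then
        # the bucket has no tags: print only its name
        echo "$BUCKET"
    else
        # turn each tag into a compact {"key":"value"} object, tab-separated
        tags=$(echo "$RESULT" | jq -c '.[][] | {(.Key): .Value}' | tr '\n' '\t')
        echo "$BUCKET | $tags"
    fi
done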
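
The formatting step of the loop above can be sketched in Python as well, which makes the jq expression easier to follow. `format_bucket_line` is a hypothetical helper; it expects the JSON printed by `aws s3api get-bucket-tagging` (an object with a `TagSet` list of `Key`/`Value` pairs), or `None` for a bucket without tags.

```python
import json
from typing import Optional


def format_bucket_line(bucket: str, tagging_json: Optional[str]) -> str:
    """Format one overview line: the bucket name plus its tags, if any."""
    if tagging_json is None:
        # No tag set: print only the bucket name.
        return bucket
    tag_set = json.loads(tagging_json).get("TagSet", [])
    # One compact {"key":"value"} object per tag, like `jq -c`, joined by tabs.
    tags = "\t".join(
        json.dumps({tag["Key"]: tag["Value"]}, separators=(",", ":")) for tag in tag_set
    )
    return f"{bucket} | {tags}"
```

This sketch covers only the formatting; fetching the tagging JSON for each bucket still requires the CLI calls shown above.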

Finally, we list the contents of the S3 bucket we created.

In [None]:
%%bash
aws s3 ls s3://$S3_BUCKET