# Python S3 Example

This notebook builds upon the [010_aws_cli](010_aws_cli.ipynb) notebook. It demonstrates a Pythonic way to:
- list the objects in an S3 bucket
- read the data from S3 into a pandas DataFrame
- store the data locally as JSON and as a compressed Parquet file, then compare the file sizes


## 1. Get the default S3 bucket name
Please ensure that the `S3_BUCKET` environment variable is still set. Otherwise, set it to the correct value (for example with `%env S3_BUCKET=your-bucket-name`) before proceeding.

In [None]:
%env S3_BUCKET

In [None]:
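from os import getenv
# Fail early if the bucket name is missing, instead of listing an empty path
s3_bucket = getenv('S3_BUCKET', '')
if not s3_bucket:
    raise ValueError("S3_BUCKET is not set")
s3_bucket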

## 2. List objects in the S3 bucket

In [None]:
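import s3fs

# anon=False: use the default AWS credential chain (e.g. the profile configured with the AWS CLI)
s3_instance = s3fs.S3FileSystem(anon=False)
# find() walks the bucket recursively and returns paths as "bucket/path/to/object" (no "s3://" prefix)
obj_list = s3_instance.find(s3_bucket, withdirs=False)
obj_list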
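Before reading anything, it can help to see which kinds of files the listing contains. The sketch below uses a hypothetical helper, `count_by_extension` (not part of s3fs), that tallies paths per file extension. It assumes the `bucket/key` path form that `find()` returns; the sample paths are made up.

```python
from collections import Counter
from pathlib import PurePosixPath


def count_by_extension(paths):
    """Count paths per lower-cased file extension ('' for paths without one)."""
    return Counter(PurePosixPath(p).suffix.lower() for p in paths)


# Hypothetical listing in the "bucket/key" form returned by s3fs find()
sample = [
    "my-bucket/raw/a.csv",
    "my-bucket/raw/b.CSV",
    "my-bucket/readme.txt",
    "my-bucket/raw/c.json",
]
print(count_by_extension(sample))
```

In the notebook, `count_by_extension(obj_list)` would summarize the real bucket listing.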

## 3. Get the content of the first CSV object in the S3 bucket

Only CSV files are supported; objects with other extensions are skipped.

In [None]:
import pandas as pd

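result = pd.DataFrame()  # stays empty if the bucket contains no CSV object
for obj_name in obj_list:
    if obj_name.lower().endswith('.csv'):
        # s3fs paths lack the scheme, so prepend "s3://" for pandas
        result = pd.read_csv(f"s3://{obj_name}")
        break  # stop at the first CSV object instead of overwriting it with later ones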

result
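The loop with `break` can also be expressed with `next()` over a generator expression, which returns the first match or a default. A minimal sketch, using a hypothetical helper name `first_csv_url` and made-up sample paths:

```python
def first_csv_url(paths):
    """Return an s3:// URL for the first path ending in .csv, or None if there is none."""
    first = next((p for p in paths if p.lower().endswith(".csv")), None)
    return f"s3://{first}" if first is not None else None


# Hypothetical paths in the "bucket/key" form returned by s3fs find()
sample = ["my-bucket/readme.txt", "my-bucket/raw/a.csv", "my-bucket/raw/b.csv"]
print(first_csv_url(sample))  # s3://my-bucket/raw/a.csv
```

With the real listing, this could replace the loop: `url = first_csv_url(obj_list)` followed by `result = pd.read_csv(url) if url else pd.DataFrame()`.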

## 4. Save the data locally as a JSON file


In [None]:
# orient="records" writes a JSON array with one object per row (the index is not written)
result.to_json("../data/italian_musicians.json", orient="records")
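To see the shape that `orient="records"` produces, here is a small sketch on a toy DataFrame; the column names and values are invented for illustration and are not taken from the S3 data. It also reads the string back with `pd.read_json`, wrapping it in `io.StringIO` because recent pandas versions deprecate passing literal JSON strings.

```python
import io

import pandas as pd

# Toy data for illustration only; not the contents of the S3 file
df = pd.DataFrame({"name": ["Alice", "Bruno"], "born": [1901, 1925]})

# One JSON object per row; the DataFrame index is not included
records_json = df.to_json(orient="records")
print(records_json)  # [{"name":"Alice","born":1901},{"name":"Bruno","born":1925}]

# Round trip: parse the records back into a DataFrame
restored = pd.read_json(io.StringIO(records_json), orient="records")
print(restored)
```

The records layout is convenient for tools that expect a plain list of objects, at the cost of repeating every column name in each row.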

## 5. Save the data locally as a compressed Parquet file


In [None]:
# gzip-compressed Parquet written with pyarrow; index=False omits the DataFrame index
result.to_parquet(
    "../data/italian_musicians.parquet.gzip",
    engine="pyarrow",
    compression="gzip",
    index=False,
)

## 6. Compare file sizes

`ls -lS` lists the directory contents sorted by file size, largest first.
In [None]:
%%bash 
ls -lS ../data
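If `bash` is not available (for example on Windows), roughly the same comparison can be done in Python. The sketch below uses a hypothetical helper, `files_by_size`, that mimics `ls -lS` by listing regular files largest first; the demo writes two throwaway files into a temporary directory instead of touching `../data`.

```python
import tempfile
from pathlib import Path


def files_by_size(directory):
    """Return (name, size_in_bytes) for regular files in directory, largest first."""
    entries = [(p.name, p.stat().st_size) for p in Path(directory).iterdir() if p.is_file()]
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


with tempfile.TemporaryDirectory() as tmp:
    # Two throwaway files of different sizes, for demonstration only
    Path(tmp, "small.json").write_text("[]")
    Path(tmp, "large.json").write_text("[" + ",".join(["1"] * 100) + "]")
    listing = files_by_size(tmp)
    for name, size in listing:
        print(f"{size:>8}  {name}")
```

In the notebook, `files_by_size("../data")` would list the JSON and Parquet files written in the previous steps.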