There are two examples:
- PyAthena
- aws-data-wrangler
The two libraries differ in how they run queries.
In addition, there is an example of aws-data-wrangler saving directly to the Glue catalog. This is useful when you have the data in a dataframe and the backend sources are set up for blue/green switching. You do not need a crawler to crawl the bucket, and you can switch the backend data source between blue and green during the save, so you do not have to deal with the blue/green buckets yourself. Instead, your database always points to the latest color, while the data behind the scenes resides in either the blue or the green bucket.
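The blue/green save described above can be sketched roughly as follows. This is a minimal illustration, not the code from the example script: the bucket names, the helper functions, and the choice to drop the catalog entry before re-registering it are all assumptions. Only `wr.catalog.delete_table_if_exists` and `wr.s3.to_parquet` are actual aws-data-wrangler calls.

```python
# Hypothetical bucket names; replace with your own blue and green buckets.
BUCKETS = {"blue": "s3://example-data-blue", "green": "s3://example-data-green"}


def other_color(color: str) -> str:
    """Return the color that is not currently live."""
    if color not in BUCKETS:
        raise ValueError(f"unknown color: {color!r}")
    return "green" if color == "blue" else "blue"


def target_path(color: str, table: str) -> str:
    """Build the S3 prefix for a table inside the bucket of the given color."""
    return f"{BUCKETS[color]}/{table}/"


def save_to_glue(df, database: str, table: str, live_color: str) -> str:
    """Write df to the idle color's bucket and register it in the Glue catalog."""
    # Imported lazily so the helpers above can be used without awswrangler.
    import awswrangler as wr

    path = target_path(other_color(live_color), table)
    # Assumption: drop the old catalog entry first so the table can be
    # registered again at the new location.
    wr.catalog.delete_table_if_exists(database=database, table=table)
    # dataset=True plus database/table makes the write update the Glue catalog,
    # so no crawler run is needed afterwards.
    wr.s3.to_parquet(
        df=df,
        path=path,
        dataset=True,
        database=database,
        table=table,
        mode="overwrite",
    )
    return path
```

Queries keep using the same database and table name; only the S3 location registered for the table changes between saves.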
As the PyAthena documentation notes, remember to set the environment variables for aws_access_key_id_athena and aws_secret_access_key as appropriate.
Either set aws_access_key_id_athena and aws_secret_access_key and update the code accordingly, or use the profile variable, which is currently set to default.
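One way to express the "keys or profile" choice is a small helper that builds the keyword arguments for PyAthena's `connect`. This is a sketch under assumptions: the helper name is hypothetical, and it assumes the environment variables are named exactly `aws_access_key_id_athena` and `aws_secret_access_key`. Adjust both to match your script.

```python
import os
from typing import Dict, Mapping, Optional


def athena_connect_kwargs(
    env: Optional[Mapping[str, str]] = None, profile: str = "default"
) -> Dict[str, str]:
    """Use explicit keys when both are set, otherwise fall back to a named profile."""
    if env is None:
        env = os.environ
    # Assumption: the environment variable names match the names used in this README.
    key_id = env.get("aws_access_key_id_athena")
    secret = env.get("aws_secret_access_key")
    if key_id and secret:
        return {"aws_access_key_id": key_id, "aws_secret_access_key": secret}
    return {"profile_name": profile}
```

The result can then be passed along with the other connection settings, for example `connect(s3_staging_dir=s3_staging_dir, region_name=region_name, **athena_connect_kwargs())`, where `region_name` stands for whichever AWS region your script uses.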
- Install a virtual environment
python3 -m venv venv
- Activate virtual environment
  - run
sh dev-setup.sh
#Mac
source ./venv/bin/activate
#Windows
.\venv\Scripts\activate
- Install the pip requirements.
pip install -r requirements.txt
- Update the variables in the script you want to use.
s3_staging_dir, database, table_name, etc.
- Run
  - vscode
python pyathena/example.py

Python formatting can be done with two well-known options:
- black
- yapf - used in the project
For import formatting:
- isort - installed globally