Incremental Data Load from S3 Bucket to Amazon Redshift Using AWS Glue
Updated Jul 4, 2024 - Python
This project showcases a data transformation pipeline that uses AWS Glue and Amazon Athena to process Spotify data from CSV files. It loads, transforms, and stores the data in an S3-based data warehouse, where it can be queried through Amazon Athena.
Transformed YouTube's raw JSON data to Parquet and loaded it into an S3 bucket, using the Glue Data Catalog to store metadata and Athena to query the cleaned data. Developed an ETL process around a Lambda function that is triggered when raw data lands in an S3 bucket; the data is then processed and stored in S3 for analytics.
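As a rough illustration of the S3-triggered step described above, the following is a minimal sketch of a Lambda handler that reads the bucket and object key from an S3 event notification. It is not taken from the repository: the function names `extract_s3_objects` and `lambda_handler` and the shape of the return value are assumptions, and the actual JSON-to-Parquet transformation is left as a placeholder comment.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record.get("s3")
        if s3_info is None:
            # Skip records that did not originate from S3.
            continue
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


def lambda_handler(event, context):
    """Hypothetical entry point: list the newly uploaded raw objects."""
    targets = extract_s3_objects(event)
    # Placeholder: read each raw JSON object, clean it, and write Parquet
    # to the destination bucket (not shown; depends on the project).
    return {"processed": [f"s3://{bucket}/{key}" for bucket, key in targets]}
```

Keeping the event parsing in its own function makes it possible to exercise that logic locally with a hand-built event dictionary, without any AWS credentials.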
This project demonstrates how to build a downstream data pipeline using dbt on Amazon Athena.