An end-to-end data engineering project in which five NYC DOT datasets were modified in an ETL process and analyzed for insights.
Updated Jun 23, 2024 - Python
A web scraping project that tracks politicians' stock market activity
A modern data marketplace that makes collaboration among diverse users (such as business users, analysts, and engineers) easier, increasing efficiency and agility in data projects on AWS.
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Process DynamoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date
Banking Data Warehouse Pipeline
Sample code to collect Apache Iceberg metrics for table monitoring
An end-to-end data ingestion pipeline for airline data, utilizing various AWS services to process and store flight information efficiently.
Cloud-based AI / ML workflow and data application development framework
Apache Hudi examples designed to be run on AWS Glue via Glue Jobs
An ETL pipeline built on AWS cloud services that transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialised data.
This AWS-based data pipeline manages data from storage in S3 data lakes, through transformation with AWS Glue and Lambda, to refined storage in separate S3 repositories. Using Athena for SQL querying and QuickSight for interactive dashboards, this solution optimizes data processing and visualization, facilitating informed decision-making and insigh
Demo code to illustrate the execution of PyTest unit test cases for AWS Glue jobs in AWS CodePipeline using AWS CodeBuild projects
Smart City Realtime Data Engineering Project
Stream CDC into an Amazon S3 data lake in Apache Iceberg format with AWS Glue Streaming and DMS
This project aims to securely manage, streamline, and analyze structured and semi-structured YouTube video data based on video categories and trending metrics.
End to End Data Engineering Projects