Examples of Apache Flink® applications showcasing the DataStream API, Table API in Java and Python, and Flink SQL, featuring AWS, GitHub, Terraform, Streamlit, and Apache Iceberg.
Updated Jun 17, 2025 - Java
Automation framework to catalog AWS data sources using Glue
Tool to migrate Delta Lake tables to Apache Iceberg using AWS Glue and S3
This project repo 📺 provides a pipeline to manage, process, and analyze YouTube video data using AWS services, covering both structured statistics and trending key metrics.
Smart City Realtime Data Engineering Project
A project that builds an ETL (Extract, Transform, Load) pipeline on AWS using the Spotify API.
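The transform step of such a pipeline can be sketched as a pure function. This is a minimal, hypothetical example (the field names follow Spotify's "recently played" API shape, but the repo's actual schema may differ): it flattens the raw JSON into rows ready for loading.

```python
def flatten_recently_played(payload: dict) -> list[dict]:
    """Flatten Spotify 'recently played' JSON into tabular rows.

    Field names are illustrative assumptions, not the repo's schema.
    """
    rows = []
    for item in payload.get("items", []):
        track = item.get("track", {})
        rows.append({
            "track_id": track.get("id"),
            "track_name": track.get("name"),
            # take the first listed artist; tracks may have several
            "artist_name": (track.get("artists") or [{}])[0].get("name"),
            "played_at": item.get("played_at"),
        })
    return rows

sample = {"items": [{"played_at": "2024-01-01T00:00:00Z",
                     "track": {"id": "t1", "name": "Song",
                               "artists": [{"name": "Artist"}]}}]}
print(flatten_recently_played(sample))
```

Keeping the transform pure like this makes it unit-testable without calling the Spotify API.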
Creating an audit table for a DynamoDB table using CloudTrail, Kinesis Data Streams, Lambda, S3, Glue, Athena, and CloudFormation
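The Lambda step in an audit pipeline like this typically flattens a DynamoDB change record (as delivered via Kinesis Data Streams) into a row that can be written to S3 and queried with Glue/Athena. A hedged sketch, with illustrative field names rather than the repo's actual schema:

```python
import json
from datetime import datetime, timezone

def to_audit_row(change: dict) -> dict:
    """Turn one DynamoDB stream change record into a flat audit row.

    Hypothetical sketch: output columns are assumptions for illustration.
    """
    ddb = change["dynamodb"]
    return {
        "event_name": change["eventName"],        # INSERT / MODIFY / REMOVE
        "keys": json.dumps(ddb.get("Keys", {})),
        "new_image": json.dumps(ddb.get("NewImage", {})),
        "old_image": json.dumps(ddb.get("OldImage", {})),
        "audited_at": datetime.now(timezone.utc).isoformat(),
    }

event = {"eventName": "MODIFY",
         "dynamodb": {"Keys": {"pk": {"S": "user#1"}},
                      "NewImage": {"status": {"S": "active"}},
                      "OldImage": {"status": {"S": "pending"}}}}
print(to_audit_row(event)["event_name"])  # MODIFY
```

Serializing the images as JSON strings keeps the audit table schema stable even when the source table's attributes change.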
Working with Glue Data Catalog and Running the Glue Crawler On Demand
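Running a Glue crawler on demand comes down to `start_crawler` followed by polling `get_crawler` until the crawler returns to the `READY` state. A sketch with the client injected so the logic can be exercised without AWS credentials (in practice you would pass `boto3.client("glue")`; the crawler name is a placeholder):

```python
import time

def run_crawler(glue, name: str, poll_seconds: float = 0.0) -> str:
    """Start a Glue crawler and block until it is READY again."""
    glue.start_crawler(Name=name)
    while True:
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":  # crawler finished and is idle again
            return state
        time.sleep(poll_seconds)

class FakeGlue:
    """Stand-in for boto3's Glue client: reports RUNNING once, then READY."""
    def __init__(self):
        self.states = iter(["RUNNING", "READY"])
    def start_crawler(self, Name):
        self.started = Name
    def get_crawler(self, Name):
        return {"Crawler": {"State": next(self.states)}}

print(run_crawler(FakeGlue(), "yt-raw-crawler"))  # READY
```

In production you would also catch `CrawlerRunningException` in case the crawler is already in flight, and use a non-zero poll interval.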
Interactive visualizations built with Streamlit and powered by Apache Flink in batch mode to surface insights from data.
Prototype of AWS data lake reference implementation written in Python and Spark: https://aws.amazon.com/solutions/implementations/data-lake-solution/
Unveiling job market trends with Scrapy and AWS
Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog
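Iceberg's `register_table` Spark procedure attaches an existing set of Iceberg metadata files to a catalog entry without rewriting data. A hedged sketch of building the `CALL` statement for a Glue-backed catalog (the catalog, table, and metadata path are placeholders, not taken from the repo):

```python
def register_table_sql(catalog: str, table: str, metadata_file: str) -> str:
    """Build the Spark SQL CALL for Iceberg's register_table procedure."""
    return (f"CALL {catalog}.system.register_table("
            f"table => '{table}', "
            f"metadata_file => '{metadata_file}')")

sql = register_table_sql(
    "glue_catalog",                      # placeholder catalog name
    "analytics.events",                  # placeholder database.table
    "s3://my-bucket/warehouse/events/metadata/00001-abc.metadata.json",
)
print(sql)
# In a Spark session configured with the Glue catalog, you would run:
# spark.sql(sql)
```

After the call, the table appears in the Glue Data Catalog and is queryable through any engine that reads that catalog.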
Developed an ETL pipeline for real-time ingestion of stock market data from the stock-market-data-manage.onrender.com API. Engineered the system to store data in Parquet format for optimized query processing and incorporated data quality checks to ensure accuracy prior to visualization.
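Pre-visualization data quality checks like the ones described can be expressed as a function that returns a list of human-readable failures, where an empty list means the batch may proceed. Column names and rules below are hypothetical, not the repo's actual checks:

```python
def quality_check(rows: list[dict]) -> list[str]:
    """Return a list of data-quality failures; empty means the batch passes.

    Illustrative rules only: the real pipeline's checks may differ.
    """
    failures = []
    if not rows:
        failures.append("batch is empty")
        return failures
    for i, row in enumerate(rows):
        if row.get("symbol") in (None, ""):
            failures.append(f"row {i}: missing symbol")
        price = row.get("close")
        if price is None or price <= 0:
            failures.append(f"row {i}: non-positive close price")
    return failures

good = [{"symbol": "AAPL", "close": 190.1}]
bad = [{"symbol": "", "close": -3}]
print(quality_check(good))  # []
print(quality_check(bad))
```

Gating the load on an empty failure list keeps bad batches out of the Parquet layer instead of surfacing them later in dashboards.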
This project creates a scalable data pipeline to analyze YouTube data from Kaggle using AWS services: S3, Glue, Lambda, Athena, and QuickSight. It processes raw JSON and CSV files into cleansed, partitioned datasets, integrates them with ETL workflows, and catalogs data for querying. Final insights are visualized in QuickSight dashboards.
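One piece of that pipeline, sketched: deriving an S3 partition prefix for a cleansed YouTube record so Athena can prune scans by region and date. The key layout is an assumption for illustration, not taken from the repo:

```python
def partition_prefix(record: dict) -> str:
    """Build a Hive-style partition prefix (assumed layout) for one record."""
    return (f"region={record['region']}/"
            f"snapshot_date={record['trending_date']}")

rec = {"region": "US", "trending_date": "2024-05-01", "video_id": "abc123"}
print(partition_prefix(rec))  # region=US/snapshot_date=2024-05-01
```

Hive-style `key=value` prefixes let the Glue crawler infer the partitions automatically, so Athena queries filtered on region or date only read the matching objects.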
These are the handwritten notes on Coursera's Practical data science specialization course.
This repository demonstrates building a robust data pipeline with an orchestrator across on-prem and cloud resources. It collects data from on-premises SQL and NoSQL databases and loads it into a SQL database in the cloud.
This project demonstrates how to use Terraform to automate the enablement of Tableflow in a Kafka Topic. Additionally, it shows how to configure Snowflake with Terraform to query the Iceberg Tables as an External Table, using AWS Glue Data Catalog between Confluent Cloud and Snowflake, with an AWS S3 bucket serving as the storage location.
☁️ 🛫 DevOps 30 Days Challenge - Day 3: NBA Data Lake using Glue, S3, Python, Athena and CloudFormation
AWS Glue ETL Pipeline automates data extraction, transformation, and loading using AWS Glue and S3. It ingests raw data from an S3 source bucket, processes it via Glue ETL jobs, and stores the transformed data in a destination bucket. This solution enables efficient serverless data processing.
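The source-bucket-to-destination-bucket flow described above can be sketched with the transform pulled out as a pure function and the S3 client injected, so the logic runs without AWS (with boto3 you would pass `boto3.client("s3")`; bucket names are placeholders):

```python
import io

def etl_object(s3, src_bucket: str, dst_bucket: str, key: str, transform):
    """Read one object, apply a transform, and write it to the destination."""
    body = s3.get_object(Bucket=src_bucket, Key=key)["Body"].read()
    s3.put_object(Bucket=dst_bucket, Key=key, Body=transform(body))

class FakeS3:
    """In-memory stand-in for boto3's S3 client (get_object/put_object)."""
    def __init__(self, objects):
        self.objects = dict(objects)
    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}
    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body

s3 = FakeS3({("raw-bucket", "data.csv"): b"a,B\n1,2\n"})
etl_object(s3, "raw-bucket", "clean-bucket", "data.csv",
           lambda b: b.lower())  # toy transform: lowercase the CSV header
print(s3.objects[("clean-bucket", "data.csv")])  # b'a,b\n1,2\n'
```

In a real Glue job this shape is handled by DynamicFrames and job bookmarks; the sketch only shows the per-object read-transform-write pattern.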