Examples of Apache Flink® applications showcasing the DataStream API, Table API in Java and Python, and Flink SQL, featuring AWS, GitHub, Terraform, Streamlit, and Apache Iceberg.
Updated Jun 17, 2025 - Java
Automation framework to catalog AWS data sources using Glue
Tool to migrate Delta Lake tables to Apache Iceberg using AWS Glue and S3
This project repo 📺 provides a pipeline to manage, process, and analyze YouTube video data using AWS services, covering both structured statistics and trending key metrics.
Smart City Realtime Data Engineering Project
A project that builds an ETL (Extract, Transform, Load) pipeline on AWS using the Spotify API.
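The transform step of such a pipeline can be sketched as a pure function. This is a minimal, hypothetical example (the field names follow Spotify's "recently played" API shape, but the repo's actual schema may differ): it flattens the raw JSON into rows ready for loading.

```python
def flatten_recently_played(payload: dict) -> list[dict]:
    """Flatten Spotify 'recently played' JSON into tabular rows.

    Field names are illustrative assumptions, not the repo's schema.
    """
    rows = []
    for item in payload.get("items", []):
        track = item.get("track", {})
        rows.append({
            "track_id": track.get("id"),
            "track_name": track.get("name"),
            # take the first listed artist; tracks may have several
            "artist_name": (track.get("artists") or [{}])[0].get("name"),
            "played_at": item.get("played_at"),
        })
    return rows

sample = {"items": [{"played_at": "2024-01-01T00:00:00Z",
                     "track": {"id": "t1", "name": "Song",
                               "artists": [{"name": "Artist"}]}}]}
print(flatten_recently_played(sample))
```

Keeping the transform pure like this makes it unit-testable without calling the Spotify API.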
Creating an audit table for a DynamoDB table using CloudTrail, Kinesis Data Streams, Lambda, S3, Glue, Athena, and CloudFormation
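The Lambda step in an audit pipeline like this typically flattens a DynamoDB change record (as delivered via Kinesis Data Streams) into a row that can be written to S3 and queried with Glue/Athena. A hedged sketch, with illustrative field names rather than the repo's actual schema:

```python
import json
from datetime import datetime, timezone

def to_audit_row(change: dict) -> dict:
    """Turn one DynamoDB stream change record into a flat audit row.

    Hypothetical sketch: output columns are assumptions for illustration.
    """
    ddb = change["dynamodb"]
    return {
        "event_name": change["eventName"],        # INSERT / MODIFY / REMOVE
        "keys": json.dumps(ddb.get("Keys", {})),
        "new_image": json.dumps(ddb.get("NewImage", {})),
        "old_image": json.dumps(ddb.get("OldImage", {})),
        "audited_at": datetime.now(timezone.utc).isoformat(),
    }

event = {"eventName": "MODIFY",
         "dynamodb": {"Keys": {"pk": {"S": "user#1"}},
                      "NewImage": {"status": {"S": "active"}},
                      "OldImage": {"status": {"S": "pending"}}}}
print(to_audit_row(event)["event_name"])  # MODIFY
```

Serializing the images as JSON strings keeps the audit table schema stable even when the source table's attributes change.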
Working with Glue Data Catalog and Running the Glue Crawler On Demand
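Running a Glue crawler on demand comes down to `start_crawler` followed by polling `get_crawler` until the crawler returns to the `READY` state. A sketch with the client injected so the logic can be exercised without AWS credentials (in practice you would pass `boto3.client("glue")`; the crawler name is a placeholder):

```python
import time

def run_crawler(glue, name: str, poll_seconds: float = 0.0) -> str:
    """Start a Glue crawler and block until it is READY again."""
    glue.start_crawler(Name=name)
    while True:
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":  # crawler finished and is idle again
            return state
        time.sleep(poll_seconds)

class FakeGlue:
    """Stand-in for boto3's Glue client: reports RUNNING once, then READY."""
    def __init__(self):
        self.states = iter(["RUNNING", "READY"])
    def start_crawler(self, Name):
        self.started = Name
    def get_crawler(self, Name):
        return {"Crawler": {"State": next(self.states)}}

print(run_crawler(FakeGlue(), "yt-raw-crawler"))  # READY
```

In production you would also catch `CrawlerRunningException` in case the crawler is already in flight, and use a non-zero poll interval.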
Interactive visualizations built with Streamlit and powered by Apache Flink in batch mode to surface insights from data.
Prototype of AWS data lake reference implementation written in Python and Spark: https://aws.amazon.com/solutions/implementations/data-lake-solution/
Unveiling job market trends with Scrapy and AWS
Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog
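Iceberg's `register_table` Spark procedure attaches an existing set of Iceberg metadata files to a catalog entry without rewriting data. A hedged sketch of building the `CALL` statement for a Glue-backed catalog (the catalog, table, and metadata path are placeholders, not taken from the repo):

```python
def register_table_sql(catalog: str, table: str, metadata_file: str) -> str:
    """Build the Spark SQL CALL for Iceberg's register_table procedure."""
    return (f"CALL {catalog}.system.register_table("
            f"table => '{table}', "
            f"metadata_file => '{metadata_file}')")

sql = register_table_sql(
    "glue_catalog",                      # placeholder catalog name
    "analytics.events",                  # placeholder database.table
    "s3://my-bucket/warehouse/events/metadata/00001-abc.metadata.json",
)
print(sql)
# In a Spark session configured with the Glue catalog, you would run:
# spark.sql(sql)
```

After the call, the table appears in the Glue Data Catalog and is queryable through any engine that reads that catalog.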
Developed an ETL pipeline for real-time ingestion of stock market data from the stock-market-data-manage.onrender.com API. Engineered the system to store data in Parquet format for optimized query processing and incorporated data quality checks to ensure accuracy prior to visualization.
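Pre-visualization data quality checks like the ones described can be expressed as a function that returns a list of human-readable failures, where an empty list means the batch may proceed. Column names and rules below are hypothetical, not the repo's actual checks:

```python
def quality_check(rows: list[dict]) -> list[str]:
    """Return a list of data-quality failures; empty means the batch passes.

    Illustrative rules only: the real pipeline's checks may differ.
    """
    failures = []
    if not rows:
        failures.append("batch is empty")
        return failures
    for i, row in enumerate(rows):
        if row.get("symbol") in (None, ""):
            failures.append(f"row {i}: missing symbol")
        price = row.get("close")
        if price is None or price <= 0:
            failures.append(f"row {i}: non-positive close price")
    return failures

good = [{"symbol": "AAPL", "close": 190.1}]
bad = [{"symbol": "", "close": -3}]
print(quality_check(good))  # []
print(quality_check(bad))
```

Gating the load on an empty failure list keeps bad batches out of the Parquet layer instead of surfacing them later in dashboards.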
This project creates a scalable data pipeline to analyze YouTube data from Kaggle using AWS services: S3, Glue, Lambda, Athena, and QuickSight. It processes raw JSON and CSV files into cleansed, partitioned datasets, integrates them with ETL workflows, and catalogs data for querying. Final insights are visualized in QuickSight dashboards.
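One piece of that pipeline, sketched: deriving an S3 partition prefix for a cleansed YouTube record so Athena can prune scans by region and date. The key layout is an assumption for illustration, not taken from the repo:

```python
def partition_prefix(record: dict) -> str:
    """Build a Hive-style partition prefix (assumed layout) for one record."""
    return (f"region={record['region']}/"
            f"snapshot_date={record['trending_date']}")

rec = {"region": "US", "trending_date": "2024-05-01", "video_id": "abc123"}
print(partition_prefix(rec))  # region=US/snapshot_date=2024-05-01
```

Hive-style `key=value` prefixes let the Glue crawler infer the partitions automatically, so Athena queries filtered on region or date only read the matching objects.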
These are the handwritten notes on Coursera's Practical data science specialization course.
This repository demonstrates building a robust data pipeline with an orchestrator across on-prem and cloud resources. It collects data from on-premises SQL and NoSQL databases and loads it into a SQL database in the cloud.
This project demonstrates how to use Terraform to automate the enablement of Tableflow in a Kafka Topic. Additionally, it shows how to configure Snowflake with Terraform to query the Iceberg Tables as an External Table, using AWS Glue Data Catalog between Confluent Cloud and Snowflake, with an AWS S3 bucket serving as the storage location.
☁️ 🛫 DevOps 30 Days Challenge - Day 3: NBA Data Lake using Glue, S3, Python, Athena and CloudFormation
AWS Glue ETL Pipeline automates data extraction, transformation, and loading using AWS Glue and S3. It ingests raw data from an S3 source bucket, processes it via Glue ETL jobs, and stores the transformed data in a destination bucket. This solution enables efficient serverless data processing.
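The source-bucket-to-destination-bucket flow described above can be sketched with the transform pulled out as a pure function and the S3 client injected, so the logic runs without AWS (with boto3 you would pass `boto3.client("s3")`; bucket names are placeholders):

```python
import io

def etl_object(s3, src_bucket: str, dst_bucket: str, key: str, transform):
    """Read one object, apply a transform, and write it to the destination."""
    body = s3.get_object(Bucket=src_bucket, Key=key)["Body"].read()
    s3.put_object(Bucket=dst_bucket, Key=key, Body=transform(body))

class FakeS3:
    """In-memory stand-in for boto3's S3 client (get_object/put_object)."""
    def __init__(self, objects):
        self.objects = dict(objects)
    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self.objects[(Bucket, Key)])}
    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body

s3 = FakeS3({("raw-bucket", "data.csv"): b"a,B\n1,2\n"})
etl_object(s3, "raw-bucket", "clean-bucket", "data.csv",
           lambda b: b.lower())  # toy transform: lowercase the CSV header
print(s3.objects[("clean-bucket", "data.csv")])  # b'a,b\n1,2\n'
```

In a real Glue job this shape is handled by DynamicFrames and job bookmarks; the sketch only shows the per-object read-transform-write pattern.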