aws-glue

aws / aws-sdk-pandas

pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

Updated Feb 18, 2025
Python

aws-samples / transactional-datalake-using-apache-iceberg-on-aws-glue

Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS

apache-spark aws-athena aws-glue aws-dms apache-iceberg

Updated Feb 15, 2025
Python

data-dot-all / dataall

A modern data marketplace that makes collaboration among diverse users (like business, analysts and engineers) easier, increasing efficiency and agility in data projects on AWS.

aws data-science data aws-s3 redshift etl-framework aws-glue aws-lake-formation lakehouse lakeformation

Updated Feb 17, 2025
Python

froghollow / simple_data_quality

Demonstrates how to utilize PyDeequ with Glue ETL and Data Quality Definition Language (DQDL)

aws-glue pydeequ

Updated Feb 13, 2025
Python

Tejesvani / End-to-End-Smart-City-Data-Streaming-Pipeline

The Smart City Data Streaming Pipeline processes real-time data from IoT devices using Apache Kafka for ingestion and Apache Spark for processing. Data is stored in AWS S3 and analyzed with Glue, Athena, and Redshift. It enhances traffic management, predictive analytics, and urban planning, making cities smarter and more efficient.

apache-spark aws-s3 apache-kafka amazon-redshift aws-athena aws-glue

Updated Feb 9, 2025
Python

CloudFay / Sports-Data-Lake

This repository houses the setup_nba_data_lake.py script, which automates the entire process of building a cloud-based data lake for NBA analytics. With this script, you can seamlessly integrate Amazon S3, AWS Glue, and Amazon Athena to store, process, and query NBA-related data—all in a fully scalable and serverless environment!

aws-s3 nba-analytics amazon-athena aws-glue

Updated Feb 8, 2025
Python

dashmug / glue-utils

Python library designed to enhance the developer experience when working with AWS Glue ETL and Python Shell jobs by reducing boilerplate code, increasing type safety, and improving IDE auto-completion.

python aws spark etl pyspark data-engineering elt aws-glue

Updated Feb 6, 2025
Python

BrianWangila / Sports-Data-Lake-AWS

Automates building an NBA sports data lake with Amazon S3, AWS Glue, and Amazon Athena, and sets up the infrastructure to store and query NBA-related data.

python aws aws-s3 aws-athena aws-glue

Updated Feb 3, 2025
Python

SharaiS / Stedi_analytics_projectAWS

AWS ETL pipeline that curates data that's used to train machine learning models

aws-s3 aws-athena aws-glue python-spark

Updated Jan 31, 2025
Python

dominique-jacque / NBA-Data-Lake

NBA Data Lake Repository contains the setup_nba_data_lake.py script, which automates the creation of a data lake for NBA analytics using AWS services. The script integrates Amazon S3, AWS Glue, and Amazon Athena, and sets up the infrastructure needed to store and query NBA-related data.

api s3 iam data-lake cloudshell amazon-athena aws-glue

Updated Jan 29, 2025
Python

siconge / Tencent-HQ-BIM-Data-Pipeline-with-AWS

This project delivers an end-to-end data pipeline that uses a comprehensive ETL process to move BIM data from an Autodesk Revit model of Tencent Global Headquarters into cloud storage for processing and analytics. The pipeline leverages tools and services such as Apache Airflow, Amazon S3, AWS Glue, and Amazon Redshift.

pandas pyspark aws-cloudformation amazon-redshift amazon-s3 apache-airflow etl-pipeline aws-glue building-information-modelling autodesk-platform-services

Updated Jan 27, 2025
Python

BhawnaMehbubani / Ingest-daily-flight-data-in-Redshift-fact-table

End-to-end ETL pipeline for flight data analytics using AWS Glue, Redshift, S3, PySpark, and Athena, with data transformation, enrichment, and reporting capabilities.

athena s3-bucket pyspark redshift aws-glue

Updated Jan 25, 2025
Python

deept-agl / Youtube-data-ETL-Analysis-using-AWS

This project creates a scalable data pipeline to analyze YouTube data from Kaggle using AWS services: S3, Glue, Lambda, Athena, and QuickSight. It processes raw JSON and CSV files into cleansed, partitioned datasets, integrates them with ETL workflows, and catalogs data for querying. Final insights are visualized in QuickSight dashboards.

aws-lambda athena aws-s3 aws-glue quicksight aws-glue-data-catalog

Updated Jan 25, 2025
Python

ccao-data / model-sales-val

Heuristics for detecting outlier and non-arm's-length sales

python model aws-s3 aws-glue

Updated Feb 6, 2025
Python

minhduc29 / leetcode-contest-analytics

A data engineering project to extract, transform, and load LeetCode contest ranking and contest problems data

aws leetcode etl analytics pandas data-engineering elt dag amazon-redshift data-pipeline amazon-s3 apache-airflow aws-glue leetcode-contest

Updated Jan 20, 2025
Python

zablon-oigo / nba-data-lake

This project automates the creation of a data lake for NBA analytics using AWS services

aws-s3 python3 aws-iam aws-athena aws-glue github-actions boto3-script

Updated Jan 15, 2025
Python

Here are 129 public repositories matching this topic...

ev2900 / Iceberg_update_metadata_script

ev2900 / Iceberg_Glue_register_table

ev2900 / MongoDB_Streams_Glue_Iceberg

ev2900 / Glue_Examples

aws / aws-sdk-pandas
