aws-glue
Here are 129 public repositories matching this topic...
Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog
Updated Feb 18, 2025 - Python
Process DynamoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date
Updated Feb 18, 2025 - Python
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Updated Feb 18, 2025 - Python
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
Updated Feb 15, 2025 - Python
A modern data marketplace that makes collaboration among diverse users (such as business users, analysts, and engineers) easier, increasing efficiency and agility in data projects on AWS.
Updated Feb 17, 2025 - Python
The Smart City Data Streaming Pipeline processes real-time data from IoT devices using Apache Kafka for ingestion and Apache Spark for processing. Data is stored in AWS S3 and analyzed with Glue, Athena, and Redshift. It enhances traffic management, predictive analytics, and urban planning, making cities smarter and more efficient.
Updated Feb 9, 2025 - Python
This repository houses the setup_nba_data_lake.py script, which automates the entire process of building a cloud-based data lake for NBA analytics. With this script, you can seamlessly integrate Amazon S3, AWS Glue, and Amazon Athena to store, process, and query NBA-related data—all in a fully scalable and serverless environment!
Updated Feb 8, 2025 - Python
Automates building an NBA sports data lake with AWS S3, AWS Glue, and AWS Athena, and sets up the infrastructure to store and query NBA-related data.
Updated Feb 3, 2025 - Python
AWS ETL pipeline that curates data that's used to train machine learning models
Updated Jan 31, 2025 - Python
NBA Data Lake Repository contains the setup_nba_data_lake.py script, which automates the creation of a data lake for NBA analytics using AWS services. The script integrates Amazon S3, AWS Glue, and Amazon Athena, and sets up the infrastructure needed to store and query NBA-related data.
Updated Jan 29, 2025 - Python
This project delivers an end-to-end data pipeline that uses an ETL process to move BIM data from an Autodesk Revit model of Tencent Global Headquarters into cloud storage for processing and analytics. The pipeline uses tools and services such as Apache Airflow, Amazon S3, AWS Glue, and Amazon Redshift.
Updated Jan 27, 2025 - Python
This project creates a scalable data pipeline to analyze YouTube data from Kaggle using AWS services: S3, Glue, Lambda, Athena, and QuickSight. It processes raw JSON and CSV files into cleansed, partitioned datasets, integrates them with ETL workflows, and catalogs data for querying. Final insights are visualized in QuickSight dashboards.
Updated Jan 25, 2025 - Python
A data engineering project to extract, transform, and load LeetCode contest ranking and contest problem data
Updated Jan 20, 2025 - Python
This project automates the creation of a data lake for NBA analytics using AWS services
Updated Jan 15, 2025 - Python