A modern data marketplace that makes collaboration among diverse users (such as business users, analysts, and engineers) easier, increasing efficiency and agility in data projects on AWS.
Updated Nov 12, 2024 - Python
pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, Neptune, OpenSearch, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Example using the Iceberg register_table command with AWS Glue and Glue Data Catalog
An end-to-end data pipeline for De Ftunes’ music purchase analytics, designed to ingest, transform, and model data for efficient analysis of song purchases, user behavior, and service trends. Utilizes AWS Glue, S3, Redshift Spectrum, Apache Airflow, DBT, Superset, and Terraform.
A curated collection of streamlined and effective scripts and tools designed specifically for data engineering tasks.
Process DynamoDB change streams via AWS Glue with Iceberg to keep a copy of a collection in S3 up to date
Stream CDC into an Amazon S3 data lake in Apache Iceberg table format with AWS Glue Streaming and DMS
Apache Hudi examples designed to be run on AWS Glue via Glue Jobs
ETL pipeline using AWS services
Streaming ETL job cases in AWS Glue that integrate Delta Lake and create an in-place updatable data lake on Amazon S3
Streaming ETL job cases in AWS Glue that integrate Iceberg and create an in-place updatable data lake on Amazon S3
ETL pipeline using AWS services.
An end-to-end data engineering project in which five NYC DOT datasets were modified in an ETL process and analyzed for insights.
Cloud-based AI / ML workflow and data application development framework