Scan databases and data warehouses for PII data. Tag tables and columns in data catalogs like Amundsen and Datahub
Redshift Python Connector. It supports the Python Database API Specification v2.0 (PEP 249).
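Because the driver follows PEP 249, queries use the familiar connect/cursor/fetch pattern. A minimal sketch, assuming `redshift_connector` is installed and connection credentials are supplied by the caller (the helper names here are illustrative, not part of the driver's API):

```python
# Sketch of a row-count query through redshift_connector's DB-API 2.0 interface.

def build_count_query(table: str) -> str:
    """Build a simple row-count query; the table name is assumed validated."""
    return f"SELECT COUNT(*) FROM {table}"

def count_rows(table: str, **conn_kwargs) -> int:
    """Run the count against Redshift; conn_kwargs holds host, database,
    user, password, etc. Imported lazily so the sketch can be read
    without the driver installed."""
    import redshift_connector

    conn = redshift_connector.connect(**conn_kwargs)
    try:
        cursor = conn.cursor()          # standard PEP 249 cursor
        cursor.execute(build_count_query(table))
        return cursor.fetchone()[0]
    finally:
        conn.close()
```

Any PEP 249-compliant tooling (e.g. generic DB-API wrappers) can drive this connection the same way it would drive `sqlite3` or `psycopg2`.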
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Developed a data pipeline to automate data warehouse ETL by building custom Airflow operators that handle the extraction, transformation, validation, and loading of data from S3 -> Redshift -> S3
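The core of an S3-to-Redshift operator like these is usually a generated COPY statement. A minimal sketch of such a builder; the table, bucket, and IAM role values in the usage comment are placeholders, not the repository's actual configuration:

```python
def copy_statement(table: str, s3_path: str, iam_role: str,
                   json_path: str = "auto") -> str:
    """Build a Redshift COPY statement for JSON data staged in S3.

    iam_role is the ARN of a role that Redshift can assume to read
    the bucket; json_path is either 'auto' or an s3:// JSONPaths file.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS JSON '{json_path}';"
    )

# Example (placeholder names):
# copy_statement("staging_events", "s3://my-bucket/log_data",
#                "arn:aws:iam::123456789012:role/redshift-copy")
```

Inside a custom operator, the `execute` method would render this statement and run it through a Redshift connection supplied by an Airflow hook.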
🔄 🏃 EtLT of my own Strava data using the Strava API, MySQL, Python, S3, Redshift, and Airflow
The project grew out of an interest in Data Engineering and ETL pipelines, and provided a good opportunity to develop skills and experience with a range of tools. As such, it is more complex than strictly required, utilising dbt, Airflow, Docker, and cloud-based storage.
A Covid-19 data pipeline on AWS featuring PySpark/Glue, Docker, Great Expectations, Airflow, and Redshift, templated in CloudFormation and CDK, deployable via Github Actions.
A batch processing data pipeline, using AWS resources (S3, EMR, Redshift, EC2, IAM), provisioned via Terraform, and orchestrated from locally hosted Airflow containers. The end product is a Superset dashboard and a Postgres database, hosted on an EC2 instance at this address (powered down):
Example project for consuming an AWS Kinesis stream and saving the data to Amazon Redshift using Apache Spark
This project provides valuable customer sentiment insights for Zomato by tracking and analyzing tweets related to their brand and services.
The goal of this repository is to provide clear examples of AWS CLI commands together with the AWS CDK to easily create AWS services and resources
Project 3 - Data Engineering Nanodegree
Project 5 - Data Engineering Nanodegree
Redshift script to create a MANIFEST file recursively
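A COPY manifest is just a JSON document listing S3 URLs, so the recursive part is walking the bucket prefix. A minimal sketch, assuming `boto3` for the listing; the bucket and prefix names are placeholders:

```python
import json

def list_keys_recursively(bucket: str, prefix: str) -> list[str]:
    """Recursively list all object keys under an S3 prefix.
    Imported lazily so the sketch can be read without boto3 installed."""
    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys: list[str] = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys

def build_manifest(bucket: str, keys: list[str], mandatory: bool = True) -> str:
    """Render the keys as a Redshift COPY manifest JSON document."""
    entries = [
        {"url": f"s3://{bucket}/{key}", "mandatory": mandatory}
        for key in keys
    ]
    return json.dumps({"entries": entries}, indent=2)
```

With `mandatory: true`, COPY fails if any listed file is missing, which is usually what you want for batch loads.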
Remove duplicate entries from a Redshift cluster
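Since Redshift does not enforce primary keys, deduplication is commonly done by staging a `SELECT DISTINCT` copy and swapping the rows back. A minimal sketch that generates the statements; the staging-table suffix is a hypothetical naming choice:

```python
def dedup_statements(table: str) -> list[str]:
    """Generate SQL to deduplicate a Redshift table via a staging copy.

    Run all four statements inside a single transaction so a failure
    between DELETE and INSERT cannot leave the table empty.
    """
    stage = f"{table}_dedup_stage"  # hypothetical staging-table name
    return [
        f"CREATE TEMP TABLE {stage} AS SELECT DISTINCT * FROM {table};",
        f"DELETE FROM {table};",
        f"INSERT INTO {table} SELECT * FROM {stage};",
        f"DROP TABLE {stage};",
    ]
```

`DELETE FROM` is used rather than `TRUNCATE` because `TRUNCATE` commits immediately in Redshift and would break the transactional safety net.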
This project provides a comprehensive data pipeline solution to extract, transform, and load (ETL) Reddit data into a Redshift data warehouse. The pipeline leverages a combination of tools and services including Apache Airflow, Celery, PostgreSQL, Amazon S3, AWS Glue, Amazon Athena, and Amazon Redshift.
A Data Warehousing project for retail sales using dimension modelling best practices with SCD type 2 on AWS Redshift. Utilizing AWS Lambda, Glue Workflows and Python Shell jobs to create and automate an ELT pipeline where batch data coming into S3 is loaded onto Redshift and necessary transformations are performed to meet requirements.
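SCD Type 2 keeps full attribute history by closing the current row and inserting a new version when a tracked attribute changes. A minimal in-memory sketch of that merge logic, with assumed bookkeeping columns `valid_from`/`valid_to`/`is_current` and business key `id` (in the actual project this would be expressed as SQL in a Glue/Python Shell job):

```python
from datetime import date

def apply_scd2(history: list[dict], incoming: dict, today: date) -> list[dict]:
    """Apply one incoming record to a dimension's history (SCD Type 2).

    Each history row is a dict with the business key 'id', tracked
    attributes, and valid_from/valid_to/is_current bookkeeping columns.
    """
    tracked = [k for k in incoming if k != "id"]
    out, changed, matched = [], False, False
    for row in history:
        if row["is_current"] and row["id"] == incoming["id"]:
            matched = True
            if any(row[k] != incoming[k] for k in tracked):
                # Attribute changed: close the old version as of today.
                out.append(dict(row, valid_to=today, is_current=False))
                changed = True
            else:
                out.append(row)  # no change, keep current row open
        else:
            out.append(row)
    if changed or not matched:
        # Open a new current version for a changed or brand-new key.
        out.append(dict(incoming, valid_from=today, valid_to=None,
                        is_current=True))
    return out
```

An unchanged record leaves the history untouched, so reruns of the same batch are idempotent.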
Udacity Data Engineering Nanodegree Project #3.
Data Pipeline Analytics Platform is an end-to-end generic Big Data pipeline. It involves the following tech stack: AWS S3, AWS Redshift, AWS EMR Cluster, Apache Spark, Apache Airflow.
Building ETL pipelines to migrate music JSON data/metadata files (semi-structured data) into a relational database stored in an AWS Redshift cluster
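The transform step in pipelines like this flattens each JSON record into a tuple matching a table's columns. A minimal sketch over newline-delimited JSON; the field names mirror a typical song-metadata schema and are assumptions, not the repository's actual layout:

```python
import json

def songs_to_rows(json_lines: str) -> list[tuple]:
    """Flatten newline-delimited song JSON into rows for a songs table.

    Assumed fields: song_id, title, artist_id, year, duration.
    Optional fields fall back to None so a sparse record still loads.
    """
    rows = []
    for line in json_lines.strip().splitlines():
        rec = json.loads(line)
        rows.append((
            rec["song_id"],
            rec["title"],
            rec["artist_id"],
            rec.get("year"),
            rec.get("duration"),
        ))
    return rows
```

The resulting tuples can be written to staged CSV/JSON in S3 and loaded into Redshift with COPY, or inserted directly via a DB-API cursor's `executemany`.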