This repo contains my experimental projects in Data Engineering.
Automate Apache Spark in Hadoop with Airflow in Cloud
We build an ETL pipeline using Airflow that accomplishes the following: downloads data from an AWS S3 bucket, runs a Spark/Spark SQL job on the downloaded data to produce a cleaned-up dataset of orders that missed their delivery deadline, and uploads the cleaned-up dataset back to the same S3 bucket in a folder primed for higher-level analytics.
Keywords: Python, Airflow, AWS, S3, Redshift, ETL
Udacity project within the Data Engineer Nanodegree
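A minimal plain-Python sketch of the cleaning step that pipeline describes, assuming hypothetical column names (`order_id`, `promised`, `delivered`); the actual repo expresses this logic as a Spark/Spark SQL job over the S3 data.

```python
from datetime import date

# Hypothetical order records; the real pipeline reads these from S3.
orders = [
    {"order_id": 1, "promised": date(2023, 3, 1), "delivered": date(2023, 3, 3)},
    {"order_id": 2, "promised": date(2023, 3, 2), "delivered": date(2023, 3, 2)},
]

def missed_deadline(order):
    """An order misses its deadline when delivery falls after the promised date."""
    return order["delivered"] > order["promised"]

# Keep only the orders that missed their delivery deadline.
# Roughly equivalent Spark SQL:
#   SELECT * FROM orders WHERE delivered > promised
late_orders = [o for o in orders if missed_deadline(o)]
print([o["order_id"] for o in late_orders])  # → [1]
```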
PySpark Analysis from log files
Constructing a protein fragment database in the context of Lyme disease.
Create an animation from the NYC Taxi dataset
A template to quickly set up the prefect workflow orchestration engine.
ETL pipeline with PySpark on Dataproc for data lake on Google Cloud Storage
This is a data engineering-focused project that used Python, SQL, and Airflow to perform an ETL job of my Spotify listening data and send me an automated email of my weekly listening habits.
Deep feed forward neural network predicting taxi fare prices. Project features data & feature engineering.
Scrape data from websites with Python
Data pipeline that fetches songs played in the past 24 hours using the Spotify API and saves the data in a SQLite database. Scheduled to run daily using Apache Airflow.
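A stdlib sketch of the save step in such a pipeline, assuming the JSON shape of Spotify's recently-played response (`items[].played_at`, `items[].track.name`); the API call itself and the daily Airflow schedule are omitted.

```python
import sqlite3

def save_recently_played(payload: dict, conn: sqlite3.Connection) -> int:
    """Insert tracks from a Spotify recently-played response into SQLite."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS plays (played_at TEXT PRIMARY KEY, track TEXT)"
    )
    rows = [
        (item["played_at"], item["track"]["name"])
        for item in payload.get("items", [])
    ]
    # INSERT OR IGNORE keeps the daily run idempotent if a play is re-fetched.
    conn.executemany("INSERT OR IGNORE INTO plays VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

# Example with a fabricated two-track payload (real data comes from the API).
payload = {"items": [
    {"played_at": "2023-03-06T10:00:00Z", "track": {"name": "Song A"}},
    {"played_at": "2023-03-06T10:04:00Z", "track": {"name": "Song B"}},
]}
conn = sqlite3.connect(":memory:")
print(save_recently_played(payload, conn))  # → 2
```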
Using the Spotify API to create a minimal and basic ETL, while learning in the process.
Creating a data pipeline to extract data from Spotify and save the songs listened to each day into a local SQLite database.
End-to-end ML system - Batch Ingestion