ETL pipeline with PySpark on Dataproc for data lake on Google Cloud Storage
Updated Mar 17, 2021 · Python
Checking the scalability of a data pipeline involving MySQL, Spark, and machine learning models, using latency as the metric.
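The scalability check described above can be sketched as a small per-stage latency harness. The stage names and workloads below are illustrative stand-ins for the repository's MySQL query, Spark transform, and model-scoring steps, not its actual code.

```python
import time
from contextlib import contextmanager


@contextmanager
def latency_timer(results, stage):
    """Record wall-clock latency (seconds) of one pipeline stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[stage] = time.perf_counter() - start


def run_pipeline(n_rows):
    """Run dummy extract/transform/score stages and return per-stage latency."""
    results = {}
    with latency_timer(results, "extract"):    # stand-in for a MySQL query
        rows = [(i, i * 2.0) for i in range(n_rows)]
    with latency_timer(results, "transform"):  # stand-in for a Spark job
        features = [x * 0.5 + 1.0 for _, x in rows]
    with latency_timer(results, "score"):      # stand-in for model inference
        _ = [1.0 if f > 10.0 else 0.0 for f in features]
    return results


# Compare latency as the input size grows to gauge scalability.
for size in (1_000, 10_000):
    timings = run_pipeline(size)
```

Plotting latency against input size then shows whether any stage grows faster than linearly.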
This repository contains code for comparing the performance of three different ELT (Extract, Load, Transform) methods on CSV files of different sizes. The three methods are implemented in Python using different approaches and libraries, and their execution times are compared and plotted for analysis.
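The benchmark idea described there, timing several CSV-loading approaches on inputs of different sizes, can be sketched as follows. The three methods here are generic stand-ins rather than the repository's exact implementations, and the plotting step is left out.

```python
import csv
import io
import time


def make_csv(n_rows):
    """Build an in-memory CSV with a header and n_rows data rows."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "value"])
    for i in range(n_rows):
        writer.writerow([i, i * 1.5])
    return buf.getvalue()


def load_csv_reader(text):
    """Method 1: the csv module's plain reader, skipping the header."""
    return list(csv.reader(io.StringIO(text)))[1:]


def load_dict_reader(text):
    """Method 2: csv.DictReader, yielding one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def load_manual_split(text):
    """Method 3: naive string splitting (no quoting support)."""
    return [line.split(",") for line in text.splitlines()[1:]]


def compare_methods(sizes):
    """Time each loading method on CSVs of each size."""
    methods = {
        "csv.reader": load_csv_reader,
        "csv.DictReader": load_dict_reader,
        "str.split": load_manual_split,
    }
    timings = {}
    for n in sizes:
        text = make_csv(n)
        for name, fn in methods.items():
            start = time.perf_counter()
            fn(text)
            timings[(name, n)] = time.perf_counter() - start
    return timings
```

The resulting `timings` dict maps (method, size) pairs to seconds, ready to feed into a plot.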
A custom Airbyte connector to fetch football data from the Football-Data.org API. It allows users to retrieve match results, league tables, and player statistics for specific leagues, making it a versatile tool for football data analysis.
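A fetch against the Football-Data.org API typically sends an `X-Auth-Token` header and receives JSON containing a `matches` array. The sketch below assumes that response shape and the v4 endpoint path; `token` is a placeholder for a real API key, and the flattening helper is a hypothetical name, not part of the connector.

```python
import json
from urllib.request import Request, urlopen

API_BASE = "https://api.football-data.org/v4"  # API version assumed


def fetch_matches(competition, token):
    """Fetch match results for a competition (network call; not exercised here)."""
    req = Request(
        f"{API_BASE}/competitions/{competition}/matches",
        headers={"X-Auth-Token": token},
    )
    with urlopen(req) as resp:
        return json.load(resp)


def summarise_matches(payload):
    """Flatten the assumed response shape into (home, away, home_goals, away_goals)."""
    out = []
    for m in payload.get("matches", []):
        score = m.get("score", {}).get("fullTime", {})
        out.append(
            (
                m["homeTeam"]["name"],
                m["awayTeam"]["name"],
                score.get("home"),
                score.get("away"),
            )
        )
    return out
```

An Airbyte connector would wrap such a fetch in a stream's `read_records` method and emit one record per match.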
DAGs adapting the MRI preprocessing pipeline to Airflow
Kedro tests
An extension enabling the monitoring of Apache Airflow DAGs directly from Jupyter notebooks. Tailored for developers and data scientists, it simplifies tracking specific DAGs, reduces unnecessary friction, and allows severity levels setup for failed DAGs.
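Monitoring DAG runs from a notebook can go through Airflow's stable REST API. The sketch below assumes Airflow 2's `/api/v1` convention for listing DAG runs; the severity thresholds and function names are illustrative, not the extension's actual interface.

```python
import json
from urllib.request import Request, urlopen


def fetch_dag_runs(base_url, dag_id, auth_header):
    """Query Airflow's REST API for recent runs of one DAG (network call)."""
    req = Request(
        f"{base_url}/api/v1/dags/{dag_id}/dagRuns?limit=25",
        headers={"Authorization": auth_header},
    )
    with urlopen(req) as resp:
        return json.load(resp)["dag_runs"]


def classify_severity(dag_runs, warn_at=1, critical_at=3):
    """Map a list of run dicts to a severity level by counting failed runs."""
    failed = sum(1 for r in dag_runs if r.get("state") == "failed")
    if failed >= critical_at:
        return "critical"
    if failed >= warn_at:
        return "warning"
    return "ok"
```

A notebook cell could then poll `fetch_dag_runs` on a timer and surface `classify_severity`'s result inline.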
Data Pipelines with Airflow
A repository for the Methods of Advanced Data Engineering course at FAU
ETL pipeline with AWS Redshift orchestrated with Airflow
Data pipeline to gather data from chess website APIs using Airflow.
An end-to-end data pipeline deployed on GCP that extracts cryptocurrency data for analytics.
The mini project for the Database Technologies course. The task is to ingest data via a pipeline built with Spark Streaming and Kafka, and to store the processed data in a SQLite database for further manipulation.
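The storage end of such a pipeline, writing processed records into SQLite, can be sketched with the standard `sqlite3` module. The Kafka/Spark Streaming side is only indicated in comments, and the `events` table schema here is an assumption for illustration.

```python
import sqlite3


def init_db(conn):
    """Create an assumed table for processed events."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               event_id TEXT PRIMARY KEY,
               category TEXT,
               amount   REAL
           )"""
    )


def store_batch(conn, records):
    """Upsert one micro-batch of (event_id, category, amount) tuples,
    as a Spark Structured Streaming foreachBatch sink might emit them."""
    conn.executemany(
        "INSERT OR REPLACE INTO events (event_id, category, amount) VALUES (?, ?, ?)",
        records,
    )
    conn.commit()


# In the full pipeline, records would arrive on a Kafka topic, be processed
# by Spark Streaming, and be handed to store_batch batch by batch.
conn = sqlite3.connect(":memory:")
init_db(conn)
store_batch(conn, [("e1", "order", 9.5), ("e2", "refund", -3.0)])
```

Using the primary key with `INSERT OR REPLACE` makes batch replays idempotent, which matters when a streaming job restarts and re-delivers a batch.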
Deployable AWS data platform to process powerlifting data extracted from openpowerlifting.org.
A PySpark-based data cleaning pipeline for Glassdoor reviews.
Package for simple interaction with the Kaggle API, for Brane data pipelines.
Data Pipeline with Airflow Project from Udacity Data Engineer Nanodegree
A toolset for data pipelines in Genomics
The goal of this project is to build a data pipeline for gathering real-time carpark-lot availability and weather datasets from Data.gov.sg. The data are extracted via API and stored in an S3 bucket before being ingested into the data warehouse.
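The extract-and-flatten step against Data.gov.sg's real-time carpark-availability endpoint can be sketched as below. The JSON shape (an `items` list holding `carpark_data` entries) is an assumption based on the public API, and the S3 upload is only indicated in a comment to keep the sketch self-contained.

```python
import json
from urllib.request import urlopen

# Endpoint path assumed from Data.gov.sg's real-time transport APIs.
CARPARK_URL = "https://api.data.gov.sg/v1/transport/carpark-availability"


def fetch_availability():
    """Fetch the live payload (network call; not exercised here)."""
    with urlopen(CARPARK_URL) as resp:
        return json.load(resp)


def to_rows(payload):
    """Flatten the assumed payload into (carpark_number, lot_type, lots_available)."""
    rows = []
    for item in payload.get("items", []):
        for cp in item.get("carpark_data", []):
            for info in cp.get("carpark_info", []):
                rows.append(
                    (
                        cp["carpark_number"],
                        info.get("lot_type"),
                        int(info.get("lots_available", 0)),
                    )
                )
    return rows


# The flattened rows would then be serialised (e.g. to CSV or JSON) and
# uploaded to the S3 bucket before ingestion into the warehouse.
```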