The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Automates the collection, transformation, and presentation of web analytics data from Google Analytics 4 and Google Search Console into Google Sheets for streamlined reporting and analysis.
Leveraging AWS Cloud Services, an ETL pipeline transforms YouTube video statistics data. Data is downloaded from Kaggle, uploaded to an S3 bucket, and cataloged using AWS Glue for querying with Athena. AWS Lambda and Glue convert the data to Parquet format and store it in a cleansed S3 bucket. AWS QuickSight then visualizes the materialized data.
Pythonic tool for orchestrating machine-learning / high-performance / quantum-computing workflows in heterogeneous compute environments.
Code for "Efficient Data Processing in Spark" Course
Aqueduct Core is responsible for the core functionality of Aqueduct, an experiment management system.
An end-to-end, offline pipeline for audio categorization, transcription, and segmentation.
ETL pipeline using Pulumi, AWS services, and Snowflake for automated data flow.
ingestr is a CLI tool that seamlessly copies data between any databases with a single command.
A data engineering project: repository for the backend infrastructure and Streamlit app files of a Premier League dashboard.
A convenience tool for small-scale data pipelines in Python
A repository for the Methods of Advanced Data Engineering course at FAU
This repository hosts materials for the Docker for Data Engineers workshop, offering hands-on exercises and resources tailored for data engineering professionals.
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
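The ingest → process → store flow that such a stack orchestrates can be illustrated with a dependency-free toy sketch; each function here is a hypothetical stand-in for the real component (Kafka consumer, Spark job, Cassandra writer) that an Airflow task would invoke.

```python
def ingest() -> list[dict]:
    # Stand-in for consuming event records from a Kafka topic.
    return [{"user": "a", "clicks": 3}, {"user": "b", "clicks": 5}]


def process(events: list[dict]) -> dict:
    # Stand-in for a Spark aggregation over the ingested events.
    return {"total_clicks": sum(e["clicks"] for e in events)}


def store(result: dict, sink: dict) -> dict:
    # Stand-in for writing the aggregate to a Cassandra table.
    sink.update(result)
    return sink


sink: dict = {}
store(process(ingest()), sink)
```

In the actual pipeline each stage would be a separate Airflow task with retries and scheduling, and Docker Compose would bring up Kafka, Zookeeper, Spark, and Cassandra alongside the scheduler.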
Wikidata and Wikipedia language data extraction
Streamlined Data Pipelining with Dagster: Advanced Solutions for Robust and Observable Data Workflows.
A dbt package to support modelling event data via split tables for use in downstream tools and systems.
Automated data pipeline for extracting and storing weather forecasts for the tourism sector.
This project developed a data pipeline using Airflow and Python. The pipeline ingested trending movie and distributor data from IMDb and box office sources, then cleansed, formatted, combined, and indexed the data in Elasticsearch. A dashboard was also built on the data using Kibana analytics. The tools and libraries used in this p…