data-pipelines

Here are 248 public repositories matching this topic...

apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows

Updated Mar 7, 2025
Python

pathwaycom / pathway

Python ETL framework for stream processing, real-time analytics, LLM pipelines, and RAG.

python rust streaming real-time kafka etl machine-learning-algorithms stream-processing data-analytics dataflow data-processing data-pipelines batch-processing pathway iot-analytics etl-framework time-series-analysis

Updated Mar 7, 2025
Python

apache / dolphinscheduler

Apache DolphinScheduler is a modern data orchestration platform for building high-performance workflows quickly with low code.

workflow airflow job-scheduler orchestration cloud-native task-scheduler data-pipelines azkaban workflow-orchestration workflow-schedule powerful-data-pipelines

Updated Mar 7, 2025
Java

dagster-io / dagster

An orchestration platform for the development, production, and observation of data assets.

python metadata workflow data-science etl analytics scheduler orchestration data-engineering data-integration data-pipelines workflow-automation mlops dagster data-orchestrator

Updated Mar 7, 2025
Python

Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Updated Mar 7, 2025
HTML

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

python data-science data machine-learning sql spark pipeline etl pipelines orchestration artificial-intelligence data-engineering data-integration dbt elt transformation data-pipelines reverse-etl

Updated Mar 5, 2025
Python

infinyon / fluvio

Lean and mean distributed stream processing system written in Rust and WebAssembly. An alternative to Kafka + Flink in one.

rust distributed-systems streaming real-time serverless webassembly data-flow stream-processing data-analytics data-integration cloud-native data-pipelines stateful streaming-data stream-processing-engine event-driven-architecture streaming-analytics streaming-data-processing streaming-data-pipelines

Updated Mar 7, 2025
Rust

orchest / orchest

Build data pipelines, the easy way 🛠️

python docker kubernetes data-science machine-learning airflow cloud deployment jupyter etl ide pipelines self-hosted jupyterlab notebooks data-pipelines dag etl-pipeline orchest

Updated Jun 6, 2023
TypeScript

elementary-data / elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

bigquery snowflake data-warehouse dataops data-analysis redshift dbt data-pipelines data-pipeline lineage data-governance data-lineage analytics-engineer dbt-packages data-observability data-reliability dbt-artifacts

Updated Mar 6, 2025
HTML

meltano / meltano

Meltano: the declarative code-first data integration engine that powers your wildest data and ML-powered product ideas. Say goodbye to writing, maintaining, and scaling your own API integrations.

Updated Mar 6, 2025
Python

ucbepic / docetl

A system for agentic LLM-powered data processing and ETL

python workflow data etl elt data-pipelines agents llm

Updated Mar 7, 2025
Python

🐵 Preswald is a framework for building and deploying interactive data apps, internal tools, and dashboards with Python. With one command, you can launch, share, and deploy locally or in the cloud, turning Python scripts into powerful shareable apps.