The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
A lightweight stream processing library for Go
A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured+unstructured) data, with multi-modal vector embeddings.
Conduit streams data between data stores. Kafka Connect replacement. No JVM required.
[Advanced] - [Python] - Build a data processing pipeline using 100% open-source tools. The idea for this project comes from one sentence: "Turn Your Laptop Into A Personal Analytics Engine". A generic sketch of the pattern follows below.
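As a rough illustration of the extract-transform-load pattern projects like this implement, here is a minimal sketch using only the Python standard library. It is not code from any listed repository; the file name, column names, table name, and cleaning rule are all hypothetical.

```python
# Minimal ETL sketch: CSV in, cleaned rows out, stored in local SQLite.
# File/column/table names and the cleaning rule are hypothetical.
import csv
import sqlite3

def extract(path):
    """Read raw rows from a CSV file as dicts."""
    with open(path, newline="") as f:
        yield from csv.DictReader(f)

def transform(rows):
    """Drop rows with a missing amount and normalize types."""
    for row in rows:
        if row.get("amount"):
            yield (row["date"], float(row["amount"]))

def load(records, db_path="analytics.db"):
    """Store cleaned records in a local SQLite table."""
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS sales (date TEXT, amount REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", records)
    con.commit()
    con.close()

if __name__ == "__main__":
    load(transform(extract("raw_sales.csv")))
```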
Privacy- and security-focused Segment alternative, written in Golang and React
A Data Pipeline for Algo-Trading: Download -> Clean (ETL/ELT) -> Store Data. Supports Various Data Sources. Clean Once and Forget.
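The "clean once and forget" idea amounts to idempotent caching: a re-run skips any dataset whose cleaned output already exists. A minimal sketch of that behavior, with hypothetical paths and a placeholder cleaning step:

```python
# Idempotent "clean once and forget" sketch: skip work whose output exists.
# Directory layout and the clean() logic are hypothetical placeholders.
from pathlib import Path

RAW_DIR = Path("data/raw")
CLEAN_DIR = Path("data/clean")

def clean(text):
    """Placeholder cleaning step: strip blank lines."""
    return "\n".join(line for line in text.splitlines() if line.strip())

def run():
    RAW_DIR.mkdir(parents=True, exist_ok=True)
    CLEAN_DIR.mkdir(parents=True, exist_ok=True)
    for raw in RAW_DIR.glob("*.csv"):
        out = CLEAN_DIR / raw.name
        if out.exists():  # already cleaned once: forget about it
            continue
        out.write_text(clean(raw.read_text()))

if __name__ == "__main__":
    run()
```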
Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
Fast & easy way to replicate databases to lakehouses
Flink CDC is a streaming data integration tool
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
Declarative, text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.
Flexible development framework for building streaming data applications in SQL with Kafka, Flink, Postgres, GraphQL, and more.
Resilient data pipeline framework running on Apache Spark
Probabilistic Timeseries Forecasting Challenge
Processing data in a graph-like flow
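One way to picture the graph-like flow such frameworks describe: nodes are processing steps, edges carry data between them, and the graph is evaluated in dependency order. A hand-rolled sketch using the standard-library `graphlib`; the node names, functions, and wiring are hypothetical, not any listed project's API.

```python
# Tiny dataflow sketch: a DAG of processing nodes run in topological order.
# Node names and functions are hypothetical illustrations.
from graphlib import TopologicalSorter

def source():     return list(range(5))
def double(xs):   return [x * 2 for x in xs]
def only_big(xs): return [x for x in xs if x > 4]
def sink(xs):     print("result:", xs)

# Each node maps to (function, upstream node it consumes from).
nodes = {
    "source": (source, None),
    "double": (double, "source"),
    "filter": (only_big, "double"),
    "sink":   (sink, "filter"),
}

# Express edges as {node: {dependencies}} for the topological sorter.
graph = {name: ({dep} if dep else set()) for name, (_, dep) in nodes.items()}

results = {}
for name in TopologicalSorter(graph).static_order():
    fn, dep = nodes[name]
    results[name] = fn(results[dep]) if dep else fn()
```

Running it prints `result: [6, 8]`: each node sees only its upstream node's output, which is the essence of the graph-like flow.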