Efficiently captures real-time Wikimedia data, like a newsroom for Wikipedia changes. Uses microservices, Kafka, and Spring Boot for reliability and scalability. Ideal for research and analysis.
Updated Oct 12, 2023 - Java
An end-to-end data pipeline with Kafka and Spark Streaming integration
A Kafka-Elasticsearch pipeline for storing and analyzing server health logs
Real-time metrics calculation pipeline using Kafka, Elasticsearch and Kibana.
Cloud server data pipeline built with Apache Kafka and Java
A real-time cryptocurrency data streaming pipeline.
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
Data pipeline using Apache Kafka, Apache Spark and HDFS
LinkedIn's previous generation Kafka to HDFS pipeline.
CS502Capstone
Toolkit for describing data transformation pipelines by compositing simple reusable components.
Data-processing and common libraries used in main project, all available under Apache 2.0
⚡ Data integration | DataLink is a lightweight data integration framework built on top of DataX, Spark and Flink
Real Time Data Streaming Pipeline
Kafka Streams made easy with a YAML file
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Cron replacement to schedule complex data workflows
Compiler for streaming data pipelines and data microservices with configurable engines.