Efficiently captures real-time Wikimedia data, like a newsroom for Wikipedia changes. Uses microservices, Kafka, and Spring Boot for reliability and scalability. Ideal for research and analysis.
Updated Oct 12, 2023 - Java
Real-time metrics calculation pipeline using Kafka, Elasticsearch, and Kibana.
Airbyte is an open-source EL(T) platform that helps you replicate your data in your warehouses, lakes and databases.
A Kafka-Elasticsearch pipeline for storing and analyzing server health logs.
A real-time cryptocurrency data streaming pipeline.
Cloud server data pipeline built with Apache Kafka and Java
Real Time Data Streaming Pipeline
CS502Capstone
⚡ Data integration | DataLink is a lightweight data integration framework built on top of DataX, Spark and Flink
Data-processing and common libraries used in main project, all available under Apache 2.0
An end-to-end data pipeline with Kafka and Spark Streaming integration
LinkedIn's previous generation Kafka to HDFS pipeline.
Toolkit for describing data transformation pipelines by compositing simple reusable components.
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
Data pipeline using Apache Kafka, Apache Spark and HDFS
A cron replacement for scheduling complex data workflows
Kafka Streams made easy with a YAML file
Compiler for streaming data pipelines and data microservices with configurable engines.