An end to end data pipeline with Kafka Spark Streaming Integration
Updated Jun 16, 2022 · Java
Captures real-time Wikimedia data efficiently, like a newsroom for Wikipedia changes. Uses microservices, Kafka, and Spring Boot for reliability and scalability. Ideal for research and analysis.
A Kafka–Elasticsearch pipeline for storing and analyzing server health logs.
Real-time metrics calculation pipeline using Kafka, Elasticsearch, and Kibana.
Airbyte is an open-source EL(T) platform that helps you replicate your data to your warehouses, lakes, and databases.
Data pipeline using Apache Kafka, Apache Spark and HDFS
LinkedIn's previous generation Kafka to HDFS pipeline.
A real-time cryptocurrency data streaming pipeline.
Cloud server data pipeline built with Apache Kafka and Java
CS502Capstone
Toolkit for describing data transformation pipelines by compositing simple reusable components.
Data-processing and common libraries used in main project, all available under Apache 2.0
⚡ Data integration | DataLink is a lightweight data integration framework built on top of DataX, Spark, and Flink.
Real Time Data Streaming Pipeline
Kafka Streams made easy with a YAML file
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
A cron replacement for scheduling complex data workflows.
Flexible development framework for building streaming data applications in SQL with Kafka, Flink, Postgres, GraphQL, and more.