POC in Apache Kafka and Spark Streaming using Avro serialization.
Updated Sep 6, 2018 - Scala
Real-time streaming data pipeline for Twitter Tweets
A big data project to develop a real-time data pipeline for analyzing the popularity and sentiments of trending topics on Twitter.
Model complex data transformation pipelines easily
This project describes how to write a full ETL data pipeline using Spark.
Data Generators -> Kafka -> Spark Streaming -> PostgreSQL -> Grafana
A cutting-edge big data initiative aimed at creating a real-time data pipeline to analyze the popularity and sentiments of trending topics on Twitter.
Data pipeline on Azure for a real-estate database, with a three-layer structure (unbound, silver, gold) and an automatic hourly trigger for consistent updates.
GameTuner Scala Stream Collector is a project for collecting raw events from the tracker
GameTuner Enricher application for processing raw events
Snowplow Enrichment jobs and library
NebulaGraph Exchange is an Apache Spark application to parse data from different sources to NebulaGraph in a distributed environment. It supports both batch and streaming data in various formats and sources including other Graph Databases, RDBMS, Data warehouses, NoSQL, Message Bus, File systems, etc.
OpenSnowcat Relational Database Loader (Apache 2.0 License)
The leader in Next-Generation Customer Data Infrastructure
OpenSnowcat Enricher (Apache 2.0 License)
OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License)
Resilient data pipeline framework running on Apache Spark
Declarative text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.