Batch ETL data pipeline built on HDP 3.0 to process daily sales and business data and produce Power BI reports. Automated the pipelines using Airflow.
Data Tweak is a simplified, lightweight ETL framework based on Apache Spark.
Arrival delay time prediction for commercial flights (UPM's Master in Data Science project for the Big Data course)
Repository for experimenting with Spark
STM data enrichment: Extract, Transform, Load (ETL)
Scala data pipeline for processing Amazon movie reviews using Kafka & Spark Streaming
This project is a template for performing ETL using Kafka, Spark, and Hive.
Big data processing (real-time ETL data pipeline) using Avro Schema Registry, Spark, Kafka, HDFS, Hive, Scala, Docker, and Spark Streaming
Yet Another SPark Framework
Data monitoring tool that monitors the result, not the run
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
SeaTunnel plugin development examples.
EtlFlow is an ecosystem of functional Scala libraries based on ZIO for running complex, auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems, and more.
A simple Spark-powered ETL framework that just works 🍺
A simplified, lightweight ETL Framework based on Apache Spark