Data Tweak is a simplified, lightweight ETL framework based on Apache Spark.
Updated Jan 26, 2021 · Scala
Batch ETL data pipeline built on HDP 3.0 that processes daily sales and business data to produce Power BI reports. Pipelines are automated with Airflow.
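The daily-sales pipelines listed here all follow the same extract-transform-load shape. A minimal sketch of that pattern in plain Scala (no Spark dependency; all names such as `MiniEtl` and `Sale` are illustrative, not taken from any listed project):

```scala
// Minimal sketch of the extract-transform-load pattern these frameworks wrap.
// Illustrative only: real pipelines would read from HDFS/Kafka and write to a
// warehouse or BI sink instead of in-memory sequences.
object MiniEtl {
  case class Sale(region: String, amount: Double)

  // Extract: parse raw CSV-like lines into typed records.
  def extract(lines: Seq[String]): Seq[Sale] =
    lines.map { line =>
      val Array(region, amount) = line.split(",").map(_.trim)
      Sale(region, amount.toDouble)
    }

  // Transform: aggregate daily sales per region.
  def transform(sales: Seq[Sale]): Map[String, Double] =
    sales.groupBy(_.region).map { case (region, rows) =>
      region -> rows.map(_.amount).sum
    }

  // Load: render report rows; a real job would write to a table or file.
  def load(report: Map[String, Double]): Seq[String] =
    report.toSeq.sortBy(_._1).map { case (region, total) => s"$region,$total" }

  def main(args: Array[String]): Unit = {
    val raw = Seq("east, 10.5", "west, 3.0", "east, 4.5")
    load(transform(extract(raw))).foreach(println)
  }
}
```

Frameworks such as those below mainly add distribution (Spark), scheduling (Airflow), and pluggable sources and sinks around this core shape.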
Big data processing (real-time ETL data pipeline) using Avro Schema Registry, Spark, Kafka, HDFS, Hive, Scala, Docker, and Spark Streaming.
STM data enrichment: Extract, Transform, Load (ETL).
Repository for experimenting with Spark.
Scala data pipeline for processing Amazon movie reviews using Kafka and Spark Streaming.
This project is a template for performing ETL using Kafka, Spark, and Hive.
Arrival delay prediction for commercial flights (UPM's Master in Data Science project for the Big Data subject).
Yet Another SPark Framework
Data monitoring tool that monitors the result, not the run.
EtlFlow is an ecosystem of functional Scala libraries, based on ZIO, for running complex auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems, and more.
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
Examples of developing seatunnel plugins.
A simple Spark-powered ETL framework that just works 🍺
A simplified, lightweight ETL Framework based on Apache Spark