Batch ETL data pipeline built on HDP 3.0 to process daily sales and business data and produce Power BI reports. Automated the pipelines using Airflow.
Data Tweak is a simplified, lightweight ETL framework based on Apache Spark.
Arrival delay time prediction for commercial flights (UPM's Master in Data Science project for the Big Data course)
Repository for experimenting with Spark
STM data enrichment: Extract, Transform, Load (ETL)
Scala data pipeline for processing Amazon movie reviews using Kafka & Spark Streaming
This project is a template for performing ETL using Kafka, Spark, and Hive.
Big data processing (real-time ETL data pipeline) using Avro Schema Registry, Spark, Kafka, HDFS, Hive, Scala, Docker, and Spark Streaming
Yet Another SPark Framework
Data monitoring tool that monitors the result, not the run
Apache Spark-based data flow (ETL) framework that supports multiple read and write destinations of different types, as well as multiple categories of transformation rules.
SeaTunnel plugin development examples.
EtlFlow is an ecosystem of functional Scala libraries based on ZIO for running complex, auditable workflows that can interact with Google Cloud Platform, AWS, Kubernetes, databases, SFTP servers, on-prem systems, and more.
A simple Spark-powered ETL framework that just works 🍺
A simplified, lightweight ETL Framework based on Apache Spark