dataengineering

Here are 11 public repositories matching this topic...

BobErgot / Large-Scale-Data-Processing-Design-Patterns

Explore essential MapReduce design patterns for big data processing! This repository includes practical implementations of patterns from the "MapReduce Design Patterns" book, complete with examples across summarization, filtering, organization, joins, and more.

java hadoop bigdata datascience mapreduce designpatterns dataprocessing dataengineering cloudcomputing bigdataanalytics distributedcomputing

Updated Apr 6, 2024
Java

panastasiadis / link-prediction-with-spark

This Maven Java project implements three common measures for link prediction in graphs: Common Neighbors, Jaccard Coefficient, and Adamic-Adar. The project leverages the power of Apache Spark to efficiently process large graphs in a distributed environment.

spark maven bigdata datascience machinelearning link-prediction dataengineering javadevelopment jaccard-coefficient adamic-adar common-neighbors

Updated Feb 26, 2023
Java
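
For readers unfamiliar with the three measures named in the description above, the following is a minimal single-machine sketch in plain Java. It is not taken from the repository and does not use Spark. The class name `LinkPredictionSketch`, the helper `sampleGraph`, and the toy graph are hypothetical, chosen only for illustration. It assumes an undirected graph stored as an adjacency map and uses the usual textbook definitions: the Common Neighbors score is the size of the shared neighborhood, the Jaccard Coefficient divides that by the size of the union of the two neighborhoods, and Adamic-Adar sums 1 / ln(degree) over the shared neighbors.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class LinkPredictionSketch {

    // Adds an undirected edge u-v to the adjacency map.
    public static void addEdge(Map<Integer, Set<Integer>> g, int u, int v) {
        g.computeIfAbsent(u, k -> new HashSet<>()).add(v);
        g.computeIfAbsent(v, k -> new HashSet<>()).add(u);
    }

    // Returns the neighbors of u, or an empty set if u is not in the graph.
    private static Set<Integer> neighbors(Map<Integer, Set<Integer>> g, int u) {
        return g.getOrDefault(u, Collections.<Integer>emptySet());
    }

    // Returns the set of nodes adjacent to both u and v.
    private static Set<Integer> shared(Map<Integer, Set<Integer>> g, int u, int v) {
        Set<Integer> s = new HashSet<>(neighbors(g, u));
        s.retainAll(neighbors(g, v));
        return s;
    }

    // Common Neighbors: number of nodes adjacent to both u and v.
    public static int commonNeighbors(Map<Integer, Set<Integer>> g, int u, int v) {
        return shared(g, u, v).size();
    }

    // Jaccard Coefficient: |N(u) ∩ N(v)| / |N(u) ∪ N(v)|, or 0 if both are isolated.
    public static double jaccard(Map<Integer, Set<Integer>> g, int u, int v) {
        Set<Integer> union = new HashSet<>(neighbors(g, u));
        union.addAll(neighbors(g, v));
        return union.isEmpty() ? 0.0 : (double) shared(g, u, v).size() / union.size();
    }

    // Adamic-Adar: sum of 1 / ln(degree(z)) over every shared neighbor z.
    public static double adamicAdar(Map<Integer, Set<Integer>> g, int u, int v) {
        double score = 0.0;
        for (int z : shared(g, u, v)) {
            int degree = neighbors(g, z).size();
            if (degree > 1) { // skip degree 1, where ln(1) = 0 would divide by zero
                score += 1.0 / Math.log(degree);
            }
        }
        return score;
    }

    // Hypothetical toy graph with edges 1-2, 1-3, 2-3, 2-4, 3-4, 4-5.
    public static Map<Integer, Set<Integer>> sampleGraph() {
        Map<Integer, Set<Integer>> g = new HashMap<>();
        addEdge(g, 1, 2);
        addEdge(g, 1, 3);
        addEdge(g, 2, 3);
        addEdge(g, 2, 4);
        addEdge(g, 3, 4);
        addEdge(g, 4, 5);
        return g;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> g = sampleGraph();
        // Score the candidate link between nodes 1 and 4, which are not yet connected.
        System.out.println("Common Neighbors: " + commonNeighbors(g, 1, 4));
        System.out.println("Jaccard Coefficient: " + jaccard(g, 1, 4));
        System.out.println("Adamic-Adar: " + adamicAdar(g, 1, 4));
    }
}
```

A distributed version would presumably compute the same per-pair quantities after partitioning the adjacency lists across workers; how the repository does this is not stated in its description.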

shaunmcglinchey / tweetpipe-analysis

Spring Boot 2 service that forms the analysis tier of the TweetPipe streaming data pipeline. This application consumes tweets from a Kafka topic, analyses them, and persists the results to a target database.

data streaming kafka spring-boot etl streaming-data dataengineering etl-pipeline

Updated Feb 26, 2022
Java

Jhonnatan7br / SQL_DB-with-Apache-Spark---JDBC-frameworks-

Data Engineering ELT: connecting a SQL database to Apache Spark through Java Database Connectivity (JDBC)

java data sql spark apache elt dataengineering

Updated Jan 24, 2024
Java

TanishkaMarrott / Real-Time-Streaming-Analytics-with-Kinesis-Flink-and-OpenSearch

This project focuses on real-time data streaming with Kinesis, using Flink for advanced processing and OpenSearch for analytics. The architecture covers the complete data lifecycle, from ingestion to actionable insights.

opensearch dataengineering cloudcomputing awslambda kinesisdatastreams apacheflink awsglue realtimeanalytics

Updated May 9, 2024
Java

shaunmcglinchey / tweetpipe-kinesis-producer

Tweetpipe Kinesis Producer Java application

java aws streaming etl maven kinesis stream-processing kinesis-stream amazon-web-services dataengineering etl-pipeline

Updated Dec 17, 2020
Java

Credify / mssql-jdbc

The Microsoft JDBC Driver for SQL Server is a Type 4 JDBC driver that provides database connectivity with SQL Server through the standard JDBC application program interfaces (APIs).

dataengineering jenkins--internal mergebot--backend managed-by-terraform

Updated Apr 25, 2024
Java

raunak-r / kafka-springboot-demo

A demo project showing a basic API that connects to NiFi and receives data in Kafka, built with Spring Boot.

java demo kafka springboot nifi esper dataengineering

Updated Jun 5, 2021
Java

airscholar / SparkingFlow

This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages, using Python, Scala, and Java as examples.

java docker scala spark pyspark dataengineering apache-airflow