Explore essential MapReduce design patterns for big data processing! This repository includes practical implementations of patterns from the "MapReduce Design Patterns" book, complete with examples across summarization, filtering, organization, joins, and more.
This Maven Java project implements three common measures for link prediction in graphs: Common Neighbors, Jaccard Coefficient, and Adamic-Adar. The project leverages the power of Apache Spark to efficiently process large graphs in a distributed environment.
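For reference, the three measures named above can be sketched in a few lines of plain Python. This is a single-machine illustration under stated assumptions, not the project's Spark implementation: the function name `link_scores`, the adjacency-set representation, and the sample graph are all hypothetical. It uses the standard definitions: Common Neighbors counts shared neighbors, the Jaccard Coefficient divides that count by the size of the neighbor union, and Adamic-Adar sums 1 / ln(degree) over the shared neighbors.

```python
import math


def link_scores(adj, u, v):
    """Return (common_neighbors, jaccard, adamic_adar) for the node pair (u, v).

    adj is a hypothetical adjacency map: node -> set of neighboring nodes.
    """
    neighbors_u, neighbors_v = adj[u], adj[v]
    common = neighbors_u & neighbors_v
    union = neighbors_u | neighbors_v

    # Common Neighbors: number of nodes adjacent to both u and v.
    cn = len(common)

    # Jaccard Coefficient: shared neighbors over all distinct neighbors.
    jaccard = cn / len(union) if union else 0.0

    # Adamic-Adar: shared neighbors weighted by 1 / ln(degree).
    # A shared neighbor always has degree >= 2 in an undirected graph;
    # the guard only protects against malformed input (ln(1) == 0).
    aa = sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)

    return cn, jaccard, aa


# Small undirected sample graph (illustrative data only).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}

scores = link_scores(graph, "a", "d")
```

Here nodes `a` and `d` are not directly connected but share both of their neighbors, so all three scores rate the pair as a likely future link.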
Spring Boot 2 service that forms the analysis tier of the TweetPipe streaming data pipeline. This application consumes tweets from a Kafka topic, analyses them, and will persist the result to a target database.
This project focuses on real-time data streaming with Kinesis, using Flink for advanced processing and OpenSearch for analytics. The architecture covers the complete data lifecycle, from ingestion to actionable insights.
The Microsoft JDBC Driver for SQL Server is a Type 4 JDBC driver that provides database connectivity with SQL Server through the standard JDBC application program interfaces (APIs).
This project demonstrates how to use Apache Airflow to submit jobs to an Apache Spark cluster in different programming languages, using Python, Scala, and Java as examples.