Skip to content
#

spark-cluster

Here are 24 public repositories matching this topic...

raspberry-pi4-hadoop-spark-cluster

This is a self-documentation of learning distributed data storage, parallel processing, and Linux OS using Apache Hadoop, Apache Spark and Raspbian OS. In this project, 3-node cluster will be setup using Raspberry Pi 4, install HDFS and run Spark processing jobs via YARN.

  • Updated Jul 13, 2024
  • Shell

In this project, we used both Hadoop / MapReduce and Spark to do distributed computing. The first task was to perform a series of operations using a Mapper and Reduce java file that was implemented on a Hadoop server. The second task was to perform similar operations, but on Spark instead.

  • Updated Oct 31, 2022
  • Java

In this study, we propose to use a distributed storage and computation system in order to track money transfers instantly. In particular, we keep our transaction history in a distributed file system as a graph data structure. We try to detect illegal activities by using Graph Neural Networks (GNN) in distributed manner.

  • Updated Jan 30, 2024
  • Python

Improve this page

Add a description, image, and links to the spark-cluster topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the spark-cluster topic, visit your repo's landing page and select "manage topics."

Learn more