A Spark cluster based on docker-compose.
Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP), and Apache Kafka clusters.
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
An end-to-end data pipeline for building a data lake and supporting reporting using Apache Spark.
A Spark cluster containing multiple Spark masters, based on docker-compose.
Spark standalone and local architectures, and reading Hadoop file formats: Avro, Parquet, and ORC.
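As a concrete illustration of reading those formats, here is a minimal PySpark sketch. The file paths are hypothetical, and Avro support assumes the external spark-avro package (the Parquet and ORC readers ship with Spark itself).

```python
from pyspark.sql import SparkSession

# Build a local session; spark-avro is an external dependency,
# while Parquet and ORC support is built into Spark.
spark = (
    SparkSession.builder
    .appName("hadoop-file-formats")
    .master("local[*]")
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.5.0")
    .getOrCreate()
)

# Hypothetical input paths, for illustration only.
parquet_df = spark.read.parquet("data/events.parquet")
orc_df = spark.read.orc("data/events.orc")
avro_df = spark.read.format("avro").load("data/events.avro")

parquet_df.printSchema()
```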
Docker Spark standalone.
Script to find similarities between movies in the MovieLens data set using Python and Spark clustering.
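A sketch of how such a similarity job might look in PySpark, assuming the MovieLens ratings.csv layout (userId, movieId, rating). The cosine-similarity approach here is one common choice, not necessarily the repo's exact method.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("movie-similarities").getOrCreate()

# Hypothetical MovieLens ratings file with userId, movieId, rating columns.
ratings = spark.read.csv("ml-latest-small/ratings.csv", header=True, inferSchema=True)

# Pair up every two movies rated by the same user.
pairs = (
    ratings.alias("a")
    .join(ratings.alias("b"), "userId")
    .where(F.col("a.movieId") < F.col("b.movieId"))
)

# Cosine similarity over the co-rating vectors of each movie pair.
similarities = (
    pairs.groupBy("a.movieId", "b.movieId")
    .agg(
        F.sum(F.col("a.rating") * F.col("b.rating")).alias("dot"),
        F.sqrt(F.sum(F.col("a.rating") ** 2)).alias("normA"),
        F.sqrt(F.sum(F.col("b.rating") ** 2)).alias("normB"),
    )
    .withColumn("similarity", F.col("dot") / (F.col("normA") * F.col("normB")))
)

similarities.orderBy(F.desc("similarity")).show(10)
```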
A distributed application to identify the top 50 taxi pickup locations in New York by analyzing over 1 billion records using Apache Spark, HDFS, and Scala.
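The repo itself uses Scala; the aggregation it describes might look like this in PySpark, assuming a hypothetical trips dataset with a PULocationID column.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("top-taxi-pickups").getOrCreate()

# Hypothetical NYC taxi trip records on HDFS with a PULocationID column.
trips = spark.read.parquet("hdfs:///data/nyc-taxi/trips")

# Count trips per pickup location and keep the 50 busiest.
top50 = (
    trips.groupBy("PULocationID")
    .agg(F.count("*").alias("pickups"))
    .orderBy(F.desc("pickups"))
    .limit(50)
)

top50.show(50)
```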
I'll walk you through launching a cluster manually in Spark standalone deploy mode, connecting an app to the cluster, launching the app, and where to view monitoring and logging.
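For the "connecting an app" step, a minimal PySpark sketch: spark://host:7077 is the standalone master's default URL, and the master's web UI for monitoring defaults to port 8080. The hostname here is hypothetical.

```python
from pyspark.sql import SparkSession

# Connect to a standalone master; the monitoring web UI is served
# at http://master-host:8080 by default.
spark = (
    SparkSession.builder
    .appName("standalone-demo")
    .master("spark://master-host:7077")  # hypothetical hostname
    .getOrCreate()
)

# A trivial job so something appears in the UI's application pages.
print(spark.range(1_000_000).selectExpr("sum(id)").first()[0])

spark.stop()
```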
Spark submit extension from bde2020/spark-submit for Scala with SBT
KMeans, CURE, and Canopy algorithms are demonstrated using PySpark.
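Of the three, only k-means has a built-in estimator in pyspark.ml; CURE and Canopy would need custom implementations. A minimal sketch with toy data:

```python
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kmeans-demo").getOrCreate()

# Toy 2-D points; real input would come from a file.
df = spark.createDataFrame(
    [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)], ["x", "y"]
)
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)

# Fit k-means with k=2 and inspect the learned centroids.
model = KMeans(k=2, seed=42).fit(features)
print(model.clusterCenters())
```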
👷🌇 Set up and build a big data processing pipeline with Apache Spark, 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), and Terraform to set up the infrastructure 🥊
Start clusters in VirtualBox VMs.
To facilitate the initial setup of Apache Spark, this repository provides a beginner-friendly, step-by-step guide on setting up a master node and two worker nodes.
In this project, we used both Hadoop/MapReduce and Spark for distributed computing. The first task performed a series of operations using Mapper and Reducer Java files run on a Hadoop cluster. The second task performed similar operations, but on Spark instead.
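For flavor, the classic Mapper/Reducer word count collapses to a few lines in Spark. A sketch in PySpark, with a hypothetical HDFS input path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

# The Hadoop Mapper/Reducer word count, expressed as one Spark job.
counts = (
    spark.sparkContext.textFile("hdfs:///data/input.txt")  # hypothetical path
    .flatMap(lambda line: line.split())   # map: emit each word
    .map(lambda word: (word, 1))          # map: (word, 1) pairs
    .reduceByKey(lambda a, b: a + b)      # reduce: sum counts per word
)
print(counts.take(10))
```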