A spark cluster based on docker-compose.
Terraform module to create managed, full-spectrum, open-source analytics service Azure HDInsight. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP) and Apache Kafka clusters.
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
An end-to-end data pipeline for building a Data Lake and supporting reporting using Apache Spark.
A spark cluster containing multiple spark masters based on docker-compose.
docker spark standalone
Script to run and find similarities between movies from a MovieLens dataset using Python and Spark clustering.
Spark standalone architecture, local architecture, and reading Hadoop file formats, i.e. Avro, Parquet, and ORC.
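The core of a movie-similarity job like this is a pairwise similarity measure over user ratings. As a minimal sketch of the idea (in plain Python rather than the repo's Spark code, with hypothetical toy ratings), cosine similarity over co-raters looks like:

```python
from math import sqrt

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two movies, computed over the
    users who rated both (ratings are {user_id: rating} dicts)."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[u] * ratings_b[u] for u in common)
    norm_a = sqrt(sum(ratings_a[u] ** 2 for u in common))
    norm_b = sqrt(sum(ratings_b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

# Hypothetical toy ratings, standing in for MovieLens data
movie_x = {1: 5.0, 2: 4.0, 3: 4.5}
movie_y = {1: 4.5, 2: 4.0, 4: 3.0}

print(cosine_similarity(movie_x, movie_y))
```

In the Spark version the same computation is distributed by keying rating pairs on (movie, movie) and aggregating per key.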
This is a self-documentation of learning distributed data storage, parallel processing, and the Linux OS using Apache Hadoop, Apache Spark, and Raspbian OS. In this project, a 3-node cluster is set up using Raspberry Pi 4 boards, HDFS is installed, and Spark processing jobs run via YARN.
A distributed application to identify the top 50 taxi pickup locations in New York by analyzing over 1 billion records using Apache Spark, the Hadoop file system (HDFS), and Scala.
I'll walk you through launching a cluster manually in Spark standalone deploy mode, connecting an app to the cluster, launching the app, and where to view monitoring and logging.
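The top-N-locations task boils down to a count-per-key aggregation followed by a top-N selection. A minimal single-machine sketch (plain Python with made-up trip records; the repo expresses the same aggregation in Spark/Scala over the full dataset) might look like:

```python
from collections import Counter

# Hypothetical trip records: (pickup_zone, fare); the real dataset has ~1B rows.
trips = [
    ("Midtown", 12.5), ("JFK Airport", 52.0), ("Midtown", 9.0),
    ("Harlem", 7.5), ("JFK Airport", 48.0), ("Midtown", 14.0),
]

# Count trips per pickup zone, then take the N busiest zones --
# the same shape as a Spark map + reduceByKey + top(N) pipeline.
counts = Counter(zone for zone, _ in trips)
top_zones = counts.most_common(2)
print(top_zones)  # [('Midtown', 3), ('JFK Airport', 2)]
```

At billion-record scale the counting step is what Spark parallelizes across the cluster; only the small per-zone totals are collected to the driver.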
Spark submit extension from bde2020/spark-submit for Scala with SBT
KMeans, CURE, and Canopy algorithms are demonstrated using PySpark.
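Of those three, k-means is the simplest to sketch. Below is a plain-Python version of Lloyd's algorithm with hypothetical 2-D points (a sketch of the idea only; the repo runs the distributed PySpark equivalent):

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, centroids, iterations=10):
    """Lloyd's k-means: alternate assignment and centroid-update steps."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster's mean
        # (keep the old centroid if its cluster came up empty).
        centroids = [
            tuple(sum(c) / len(c) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids

pts = [(0.0, 0.0), (0.5, 0.0), (9.0, 9.0), (9.5, 9.5)]
print(kmeans(pts, centroids=[(0.0, 0.0), (9.0, 9.0)]))
```

In the PySpark version the assignment step is a `map` over the points and the update step an aggregation per cluster id, which is what makes the algorithm cluster-friendly.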
👷🌇 Set up and build a big data processing pipeline with Apache Spark and 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), using Terraform to set up the infrastructure and Airflow to automate workflows 🥊
In this project, we used both Hadoop/MapReduce and Spark to do distributed computing. The first task was to perform a series of operations using Mapper and Reducer Java files implemented on a Hadoop server. The second task was to perform similar operations, but on Spark instead.
This is my contribution to the Diastema project.
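The Mapper/Reducer pattern the first task follows can be sketched in a few lines. This toy word count (plain Python, with made-up input lines; the repo's actual tasks are Java on a Hadoop server) shows the three phases — map, shuffle/group, reduce:

```python
from itertools import groupby

def mapper(line):
    """Map phase: emit (word, 1) pairs, as a Hadoop Mapper would."""
    return [(word.lower(), 1) for word in line.split()]

def reducer(word, counts):
    """Reduce phase: combine all counts for one key."""
    return (word, sum(counts))

lines = ["Spark and Hadoop", "Hadoop MapReduce", "Spark"]

# Shuffle phase: sort the intermediate pairs so equal keys are adjacent,
# then hand each key's group to the reducer.
pairs = sorted(kv for line in lines for kv in mapper(line))
result = dict(reducer(word, [c for _, c in group])
              for word, group in groupby(pairs, key=lambda kv: kv[0]))
print(result)
```

Spark covers the second task with the same logical steps, typically as `flatMap` + `reduceByKey`, without writing intermediate results to disk between phases.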