A Spark cluster based on docker-compose.
Terraform module to create Azure HDInsight, a managed, full-spectrum, open-source analytics service. This module creates Apache Hadoop, Apache Spark, Apache HBase, Interactive Query (Apache Hive LLAP), and Apache Kafka clusters.
Repository containing Docker images for creating a Spark cluster on Hadoop YARN.
An end-to-end data pipeline for building a data lake and supporting reporting using Apache Spark.
A Spark cluster containing multiple Spark masters, based on docker-compose.
Spark standalone and local architectures, and reading Hadoop file formats: Avro, Parquet, and ORC.
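As a concrete illustration of reading those formats, here is a minimal PySpark sketch. The file paths are hypothetical, and Avro support assumes the external spark-avro package (the Parquet and ORC readers ship with Spark itself).

```python
from pyspark.sql import SparkSession

# Build a local session; spark-avro is an external dependency,
# while Parquet and ORC support is built into Spark.
spark = (
    SparkSession.builder
    .appName("hadoop-file-formats")
    .master("local[*]")
    .config("spark.jars.packages", "org.apache.spark:spark-avro_2.12:3.5.0")
    .getOrCreate()
)

# Hypothetical input paths, for illustration only.
parquet_df = spark.read.parquet("data/events.parquet")
orc_df = spark.read.orc("data/events.orc")
avro_df = spark.read.format("avro").load("data/events.avro")

parquet_df.printSchema()
```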
Docker Spark standalone.
Script to find similarities between movies in the MovieLens data set using Python and Spark clustering.
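A sketch of how such a similarity job might look in PySpark, assuming the MovieLens ratings.csv layout (userId, movieId, rating). The cosine-similarity approach here is one common choice, not necessarily the repo's exact method.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("movie-similarities").getOrCreate()

# Hypothetical MovieLens ratings file with userId, movieId, rating columns.
ratings = spark.read.csv("ml-latest-small/ratings.csv", header=True, inferSchema=True)

# Pair up every two movies rated by the same user.
pairs = (
    ratings.alias("a")
    .join(ratings.alias("b"), "userId")
    .where(F.col("a.movieId") < F.col("b.movieId"))
)

# Cosine similarity over the co-rating vectors of each movie pair.
similarities = (
    pairs.groupBy("a.movieId", "b.movieId")
    .agg(
        F.sum(F.col("a.rating") * F.col("b.rating")).alias("dot"),
        F.sqrt(F.sum(F.col("a.rating") ** 2)).alias("normA"),
        F.sqrt(F.sum(F.col("b.rating") ** 2)).alias("normB"),
    )
    .withColumn("similarity", F.col("dot") / (F.col("normA") * F.col("normB")))
)

similarities.orderBy(F.desc("similarity")).show(10)
```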
A distributed application to identify the top 50 taxi pickup locations in New York by analyzing over 1 billion records using Apache Spark, HDFS, and Scala.
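The repo itself uses Scala; the aggregation it describes might look like this in PySpark, assuming a hypothetical trips dataset with a PULocationID column.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("top-taxi-pickups").getOrCreate()

# Hypothetical NYC taxi trip records on HDFS with a PULocationID column.
trips = spark.read.parquet("hdfs:///data/nyc-taxi/trips")

# Count trips per pickup location and keep the 50 busiest.
top50 = (
    trips.groupBy("PULocationID")
    .agg(F.count("*").alias("pickups"))
    .orderBy(F.desc("pickups"))
    .limit(50)
)

top50.show(50)
```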
I'll walk you through launching a cluster manually in Spark standalone deploy mode, connecting an app to the cluster, launching the app, and where to view monitoring and logging.
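For the "connecting an app" step, a minimal PySpark sketch: spark://host:7077 is the standalone master's default URL, and the master's web UI for monitoring defaults to port 8080. The hostname here is hypothetical.

```python
from pyspark.sql import SparkSession

# Connect to a standalone master; the monitoring web UI is served
# at http://master-host:8080 by default.
spark = (
    SparkSession.builder
    .appName("standalone-demo")
    .master("spark://master-host:7077")  # hypothetical hostname
    .getOrCreate()
)

# A trivial job so something appears in the UI's application pages.
print(spark.range(1_000_000).selectExpr("sum(id)").first()[0])

spark.stop()
```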
Spark submit extension from bde2020/spark-submit for Scala with SBT
KMeans, CURE, and Canopy algorithms are demonstrated using PySpark.
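Of the three, only k-means has a built-in estimator in pyspark.ml; CURE and Canopy would need custom implementations. A minimal sketch with toy data:

```python
from pyspark.ml.clustering import KMeans
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kmeans-demo").getOrCreate()

# Toy 2-D points; real input would come from a file.
df = spark.createDataFrame(
    [(0.0, 0.0), (1.0, 1.0), (9.0, 8.0), (8.0, 9.0)], ["x", "y"]
)
features = VectorAssembler(inputCols=["x", "y"], outputCol="features").transform(df)

# Fit k-means with k=2 and inspect the learned centroids.
model = KMeans(k=2, seed=42).fit(features)
print(model.clusterCenters())
```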
👷🌇 Set up and build a big data processing pipeline with Apache Spark, 📦 AWS services (S3, EMR, EC2, IAM, VPC, Redshift), and Terraform to set up the infrastructure 🥊
Start clusters in VirtualBox VMs.
To facilitate the initial setup of Apache Spark, this repository provides a beginner-friendly, step-by-step guide on setting up a master node and two worker nodes.
In this project, we used both Hadoop/MapReduce and Spark for distributed computing. The first task performed a series of operations using Mapper and Reducer Java files run on a Hadoop cluster. The second task performed similar operations, but on Spark instead.
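For flavor, the classic Mapper/Reducer word count collapses to a few lines in Spark. A sketch in PySpark, with a hypothetical HDFS input path:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("wordcount").getOrCreate()

# The Hadoop Mapper/Reducer word count, expressed as one Spark job.
counts = (
    spark.sparkContext.textFile("hdfs:///data/input.txt")  # hypothetical path
    .flatMap(lambda line: line.split())   # map: emit each word
    .map(lambda word: (word, 1))          # map: (word, 1) pairs
    .reduceByKey(lambda a, b: a + b)      # reduce: sum counts per word
)
print(counts.take(10))
```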