Apache Spark cluster in Docker - https://hub.docker.com/r/giabar/gb-spark/
Updated Nov 1, 2018 - Shell
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
GCP Dataproc MapReduce sample with PySpark
Raspberry Pi 4-based Hadoop cluster with Spark
CentOS-based container with a standalone Spark installation for working with larger-than-RAM datasets.
Ubuntu base image provisioned mainly with Docker and Java
Exploring details of Motor Vehicle Collisions in New York City provided by the Police Department (NYPD).
This repository contains a simple Spark application for beginners
Welcome to my self-learning journey!
Workshop material for the Near Real-Time Predictive Analytics with Apache Spark Structured Streaming workshop at the Open Data Science Conference WEST 2019
Deploy Apache Spark in client mode on a Kubernetes cluster, integrated with Jupyter Notebook through a JupyterHub server.
Script and tools to build with Apache Bigtop
First basic Big Data approach
📘 FIWARE 306: Real-time Processing of Context Data using Apache Spark
This project builds a data pipeline to populate the user_behavior_metric table, an OLAP table meant for use by analysts and dashboards.
Examples of using Apache Spark MLlib Pipelines and Structured Streaming on version 2.4.0
An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker
Created by Matei Zaharia
Released May 26, 2014