GCP Dataproc mapreduce sample with PySpark
Updated Aug 9, 2018 - Shell
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
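The MapReduce-style word count that samples like the Dataproc PySpark example typically implement can be sketched as follows. Plain Python stands in for Spark here so the sketch is self-contained; the comments note the assumed PySpark RDD equivalents, and the `word_count` name is illustrative, not taken from any repo above.

```python
# Minimal MapReduce word-count sketch (plain Python; assumptions noted).
from collections import defaultdict

def word_count(lines):
    # Map phase: emit a (word, 1) pair for every word.
    # Assumed PySpark equivalent: lines.flatMap(str.split).map(lambda w: (w, 1))
    pairs = [(word, 1) for line in lines for word in line.split()]

    # Shuffle + reduce phase: sum the counts per key.
    # Assumed PySpark equivalent: .reduceByKey(lambda a, b: a + b)
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

if __name__ == "__main__":
    sample = ["to be or not to be"]
    print(word_count(sample))  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

On a real cluster, Spark parallelizes the map and reduce phases across executors and handles the shuffle between them, which is what the "implicit data parallelism" above refers to.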
Raspberry Pi 4-based Hadoop cluster with Spark
Apache Spark cluster in Docker - https://hub.docker.com/r/giabar/gb-spark/
CentOS-based container with a standalone Spark installation for working with larger-than-RAM datasets.
Ubuntu base image provisioned mainly with Docker and Java
Exploring details of Motor Vehicle Collisions in New York City, using data provided by the New York City Police Department (NYPD).
This repository contains a simple Spark application for beginners.
Scripts and tools to build with Apache Bigtop
Workshop material for the Near Real-Time Predictive Analytics with Apache Spark Structured Streaming workshop at the Open Data Science Conference WEST 2019
Deploy Apache Spark in client mode on a Kubernetes cluster and integrate it with Jupyter notebooks through a JupyterHub server.
A first basic Big Data approach
Examples of using Apache Spark MLlib Pipelines and Structured Streaming on version 2.4.0
📘 FIWARE 306: Real-time Processing of Context Data using Apache Spark
This project builds a data pipeline to populate the user_behavior_metric table, an OLAP table meant to be used by analysts and dashboards.
Edge2AI Workshop
Created by Matei Zaharia
Released May 26, 2014