Welcome to my self-learning journey!
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
A curated list of awesome Apache Spark packages and resources.
📘 FIWARE 306: Real-time Processing of Context Data using Apache Spark
A rudimentary command line utility for contrasting Apache Spark event logs.
Apache Spark docker image
Host files and procedure for running Fink on Kubernetes
Driver/Executor images for spark-operator
Deploy Apache Spark in client mode on a Kubernetes cluster and integrate it with Jupyter Notebook through a JupyterHub server.
This project builds a data pipeline to populate the user_behavior_metric table. The user_behavior_metric table is an OLAP table, meant to be used by analysts and for dashboarding.
A .NET for Apache Spark docker image (3rdman/dotnet-spark)
An implementation of Apache Spark (combined with PySpark and Jupyter Notebook) on top of a Hadoop cluster using Docker
[PROJECT IS NO LONGER MAINTAINED] Wirbelsturm is a Vagrant and Puppet based tool to perform 1-click local and remote deployments, with a focus on big data tech like Kafka.
CentOS-based container with a standalone Spark installation to work with larger-than-RAM datasets.
🤠
Production run of Apache Spark on Kubernetes
Demo of running Apache Spark jobs using Tekton and s2i workflows
Apache Spark cluster with Docker Swarm, Prometheus, and cAdvisor
Created by Matei Zaharia
Released May 26, 2014