A curated list of awesome Apache Spark packages and resources.
The goal of this project is to build a Docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is intended solely for development environments; do not use it to run production workloads.
Backbone for the MorphL-Community-Edition platform.
Local integration-test setup for PySpark against AWS services emulated by LocalStack
Serverless PySpark
Vagrant Box with Python 3.6.1, Apache Spark 2.1.1 with Scala 2.11.8 and PySpark (2.1.1).
Proof of concept: Spark on Kubernetes
A data pipeline using Google Cloud Dataproc, Cloud Storage and BigQuery
Installation instructions for PySpark and a Jupyter kernel
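A common way to pair PySpark with Jupyter (a minimal sketch, assuming PySpark was installed via pip alongside Jupyter) is to point the PySpark launcher at Jupyter through Spark's documented driver environment variables:

```shell
# Assumes: pip install pyspark jupyter  (already done)
# Tell the pyspark launcher to use Jupyter Notebook as the driver front end.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHONOPTS='notebook'
pyspark   # opens a notebook server with a SparkContext-ready Python kernel
```

An alternative is installing a dedicated kernel spec so notebooks can select PySpark directly; the environment-variable route above is the simpler of the two.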
Hands-on workshop with Iceberg, Redpanda, Debezium and Kafka-Connect
Scripts for provisioning data science tools
Scalable Spark Docker image that works on Docker Compose and Kubernetes
Project for the course "Criando um Ecossistema Hadoop Totalmente Gerenciado com Google Cloud Dataproc" from the Digital Innovation One Data Engineer Bootcamp
🐳 Docker container for Spark coursework at college (HHS).
GCP Dataproc mapreduce sample with PySpark
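A MapReduce job like the Dataproc sample above boils down to three phases: map each input record to key/value pairs, shuffle pairs by key, and reduce each group. A minimal plain-Python sketch of the classic word-count pattern (function names are illustrative, not from the sample repo):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["Spark on Dataproc", "spark word count"]
print(reduce_phase(map_phase(lines)))
# → {'spark': 2, 'on': 1, 'dataproc': 1, 'word': 1, 'count': 1}
```

In PySpark the same pattern is typically expressed as `rdd.flatMap(...).map(lambda w: (w, 1)).reduceByKey(add)`, with Spark handling the shuffle across the cluster.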