A curated list of awesome Apache Spark packages and resources.
The goal of this project is to build a Docker cluster that gives access to Hadoop, HDFS, Hive, PySpark, Sqoop, Airflow, Kafka, Flume, Postgres, Cassandra, Hue, Zeppelin, Kadmin, Kafka Control Center and pgAdmin. This cluster is intended solely for development environments; do not use it to run production workloads.
Backbone for the MorphL-Community-Edition platform.
Local integration-test setup for PySpark against AWS services emulated by LocalStack
Serverless PySpark
Vagrant Box with Python 3.6.1, Apache Spark 2.1.1 with Scala 2.11.8 and PySpark (2.1.1).
Proof of concept: Spark on Kubernetes
A data pipeline using Google Cloud Dataproc, Cloud Storage and BigQuery
Installation instructions for PySpark and a Jupyter kernel
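A common way to pair PySpark with Jupyter (a minimal sketch, assuming PySpark was installed via pip alongside Jupyter) is to point the PySpark launcher at Jupyter through Spark's documented driver environment variables:

```shell
# Assumes: pip install pyspark jupyter  (already done)
# Tell the pyspark launcher to use Jupyter Notebook as the driver front end.
export PYSPARK_DRIVER_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHONOPTS='notebook'
pyspark   # opens a notebook server with a SparkContext-ready Python kernel
```

An alternative is installing a dedicated kernel spec so notebooks can select PySpark directly; the environment-variable route above is the simpler of the two.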
Hands-on workshop with Iceberg, Redpanda, Debezium and Kafka-Connect
Scripts for provisioning data science tools
Scalable Spark Docker image that works on Docker Compose and Kubernetes
Project for the course "Criando um Ecossistema Hadoop Totalmente Gerenciado com Google Cloud Dataproc" from the Digital Innovation One Data Engineer Bootcamp
🐳 Docker container for Spark coursework at college (HHS).
GCP Dataproc mapreduce sample with PySpark
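A MapReduce job like the Dataproc sample above boils down to three phases: map each input record to key/value pairs, shuffle pairs by key, and reduce each group. A minimal plain-Python sketch of the classic word-count pattern (function names are illustrative, not from the sample repo):

```python
from collections import defaultdict

def map_phase(lines):
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    # Shuffle + reduce: group pairs by key and sum the counts.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

lines = ["Spark on Dataproc", "spark word count"]
print(reduce_phase(map_phase(lines)))
# → {'spark': 2, 'on': 1, 'dataproc': 1, 'word': 1, 'count': 1}
```

In PySpark the same pattern is typically expressed as `rdd.flatMap(...).map(lambda w: (w, 1)).reduceByKey(add)`, with Spark handling the shuffle across the cluster.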