# pyspark

Here are 35 public repositories matching this topic...

jonathanAmancioSales / Hadoop_Dataproc_Google_Cloud_Platform_DIO

Project for the course "Building a Fully Managed Hadoop Ecosystem with Google Cloud Dataproc" from the Digital Innovation One Data Engineer Bootcamp

hadoop google-cloud pyspark dataproc-cluster google-cloud-dataproc

Updated Aug 21, 2021
Shell

timvisee / hhs-p7-spark-docker

🐳 Docker container for Spark at college (HHS).

docker spark docker-container jupyter-notebook pyspark college

Updated May 1, 2017
Shell

redvg / dataproc-pyspark-mapreduce

GCP Dataproc MapReduce sample with PySpark

python spark apache-spark hadoop pyspark dataproc-clusters

Updated Aug 9, 2018
Shell

LSSTDESC / desc-spark

Apache Spark for the LSST DESC

apache-spark tutorials pyspark desc

Updated Oct 12, 2021
Shell

divya-anand21 / customer_demographic_hive_analysis

SQL Hive analysis of the Adventure Works dataset

sql hive python3 pyspark sqoop

Updated Mar 22, 2022
Shell

jcchac / spark-standalone-cluster-on-docker

Building a Spark standalone cluster with Docker

python docker r scala spark docker-compose pyspark jupyterlab

Updated Apr 1, 2021
Shell

chimera-suite / jupyter-notebook

Chimera Jupyter Notebook is an extension of the jupyter/pyspark-notebook image that integrates PySPARQL

python docker sparql jupyter-notebook pyspark chimera pysparql

Updated Dec 22, 2020
Shell

seunggihong / hadoop-install-guide

Guide to installing Hadoop and Spark on an Oracle VirtualBox virtual machine.

spark hadoop virtualbox pyspark hadoop-cluster

Updated Mar 20, 2024
Shell

andgineer / spark-aws-rdkit

Docker image with Apache Spark / Hadoop 3 (compatible with AWS services such as S3) and RDKit installed in an Anaconda environment

aws spark anaconda pyspark rdkit

Updated Mar 19, 2024
Shell

slatawa / Forex-Currency-Processing-Airflow-Hdfs-Hive-Spark

We build a forex currency-rate pipeline that fetches rates from an external API and loads the data into HDFS, where a PySpark job cleans it and inserts it into a Hive table. The goal of this pipeline is to have the data ready for any downstream machine-learning pipeline.

docker airflow hive pyspark hdfs hiveql

Updated Jul 30, 2021
Shell

mohammadzainabbas / data-warehouse-project

Data Warehouse Project - TPC-DS benchmarking on Spark SQL 👨🏻‍💻

spark python3 pyspark bash-script tpc-ds-benchmark tpc-ds-queries tpc-ds

Updated Dec 15, 2023
Shell

akhlakm / dcs-cluster

On-premise Distributed Computing and Storage cluster deployment using Ansible and Docker.

docker ansible pyspark moosefs

Updated Sep 25, 2022
Shell

debugger24 / spark-on-k8s-images

Driver/Executor images for spark-operator

kubernetes spark apache-spark executor driver pyspark spark-operator

Updated Nov 29, 2022
Shell

tlepple / iceberg-intro-workshop

Hands-on workshop with Apache Iceberg

linux big-data spark dell pyspark minio spark-streaming object-storage spark-sql apache-iceberg spark-sql-s3 dell-object-storage

Updated Mar 13, 2024
Shell

aaa121 / Big-Data-Analytics

python r scala sql hive pyspark pig sparkr hadoop-filesystem hadoop-mapreduce

Updated Jul 22, 2017
Shell

Anusha-GK / AWS-Data-Pipeline

Setting up a data pipeline in AWS using AWS Data Pipeline, S3 and EMR

aws-emr pyspark emractivity pipeline-definition

Updated Sep 30, 2022
Shell

anjijava16 / GCP_Data_Enginner_Utils

GCP_Data_Enginner

python bigquery scala notebook gcp pubsub pyspark dataflow shell-script dataproc-cluster dataproc gcp-storage big-data-processing

Updated Sep 4, 2021
Shell

vishnudxb / gcloud-dataproc-creation

Creating a gcloud Dataproc cluster with this GitHub Action

testing big-data google-cloud pyspark spark-streaming dataproc-cluster

Updated Oct 18, 2020
Shell

kadnan / vagrant-spark2

Vagrant Box with Python 3.6.1, Apache Spark 2.1.1 with Scala 2.11.8 and PySpark (2.1.1).

vagrant spark vagrant-boxes python3 pyspark

Updated Jun 11, 2017
Shell

Thelin90 / deiteo

POC: Spark on Kubernetes

docker kubernetes spark kubernetes-cluster python3 pyspark minikube minikube-cluster kubectl pipenv spark-structured-streaming

Updated Feb 18, 2021
Shell
