Project for the course "Creating a Fully Managed Hadoop Ecosystem with Google Cloud Dataproc" from Digital Innovation One's Data Engineer Bootcamp
🐳 Docker container for Spark at college (HHS).
GCP Dataproc MapReduce sample with PySpark
Building a Spark standalone cluster with Docker
Guide to installing Hadoop and Spark on an Oracle virtual machine.
We build a forex currency-rate pipeline that fetches currency rates from an external API and loads the data into HDFS, from which a PySpark job cleans and transforms the data and inserts it into a Hive table. The goal of this pipeline is to have the data ready for any downstream machine learning pipeline.
Data Warehouse Project - TPC-DS benchmarking on Spark SQL 👨🏻‍💻
Driver/Executor images for spark-operator
Hands-on workshop with Apache Iceberg
Setting up a data pipeline on AWS using AWS Data Pipeline, S3, and EMR
GCP_Data_Enginner
Creating a gcloud Dataproc cluster with this GitHub Action
Vagrant Box with Python 3.6.1, Apache Spark 2.1.1 with Scala 2.11.8 and PySpark (2.1.1).
POC: Spark on Kubernetes