Dataproc customisable HA cluster (Debian 9) with ZooKeeper, Kafka, BigQuery, and other tools/jobs, provisioned with Terraform
Updated Feb 29, 2020 - HCL
E-commerce GCP streaming pipeline ― Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery, and Tableau; GCP batch pipeline ― Cloud Storage, Dataproc, PySpark, Cloud Spanner, and Tableau
Monte Carlo stock simulation using Apache Spark.
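A minimal pure-Python sketch of what such a simulation computes (terminal prices under geometric Brownian motion); the repository presumably parallelises the path loop with Spark, and every parameter value here is illustrative, not taken from that project:

```python
import math
import random

def simulate_paths(s0, mu, sigma, days, n_paths, seed=42):
    """Monte Carlo simulation of stock prices under geometric Brownian motion."""
    rng = random.Random(seed)
    dt = 1 / 252  # one trading day as a fraction of a year
    finals = []
    for _ in range(n_paths):
        price = s0
        for _ in range(days):
            z = rng.gauss(0, 1)  # standard normal shock
            price *= math.exp((mu - 0.5 * sigma ** 2) * dt + sigma * math.sqrt(dt) * z)
        finals.append(price)
    return finals

# One year of daily steps for 1000 paths; the sample mean should sit
# near s0 * exp(mu) ~= 105 for these parameters.
prices = simulate_paths(s0=100.0, mu=0.05, sigma=0.2, days=252, n_paths=1000)
mean_price = sum(prices) / len(prices)
```

In a Spark version, each path (or batch of paths) would typically become one task, e.g. via `sc.parallelize(range(n_paths)).map(...)`, since the paths are independent.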
Implements a work queue for Dataproc Workflow Template executions
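A hedged sketch of the work-queue pattern such a project might use: worker threads drain a thread-safe queue of template IDs and hand each to a pluggable `submit` callable. The real project would presumably pass a wrapper around the Dataproc client's workflow-template instantiation call; here `submit` is injected so the queue logic stands alone:

```python
import queue
import threading

def run_work_queue(template_ids, submit, num_workers=4):
    """Drain a queue of workflow-template IDs, calling `submit` for each.

    `submit` is a caller-supplied function (hypothetical here; in practice
    it could wrap a Dataproc workflow-template instantiation request).
    """
    q = queue.Queue()
    for tid in template_ids:
        q.put(tid)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                tid = q.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            res = submit(tid)
            with lock:
                results.append(res)
            q.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Stub submit function standing in for the real API call.
executed = run_work_queue(["t1", "t2", "t3"], submit=lambda tid: f"ran:{tid}")
```

Bounding `num_workers` caps how many template executions run concurrently, which is the usual reason to queue them rather than fire them all at once.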
Project for Cloud Computing course (A.Y. 2018/2019)
Data is fetched from StackExchange, transformed with Pig, then stored and queried in Hive. Additionally, the TF-IDF scores of the top 10 users are computed in Hive.
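For reference, TF-IDF weights a term by how often it appears in one document relative to how many documents contain it. A compact Python sketch of the computation the Hive query would express (the sample "documents" are illustrative, not from the StackExchange dataset):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF scores for a dict of {doc_id: list_of_terms}."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for terms in docs.values():
        df.update(set(terms))
    scores = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        total = len(terms)
        scores[doc_id] = {
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        }
    return scores

docs = {
    "user1": ["spark", "hive", "spark"],
    "user2": ["hive", "pig"],
}
scores = tf_idf(docs)
```

Note that a term appearing in every document (here `hive`) gets an IDF of log(1) = 0, so its TF-IDF score vanishes regardless of its frequency.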
Apache Spark sandbox on GCP and Amazon EMR.
First project for Big Data course held at Roma Tre University
Hadoop on Google Dataproc, a DIO study project
Project for the Scalable and Cloud Programming course (2018/19) at UNIBO.
Process large amounts of data and implement complex data analyses using Spark. The dataset, made available by Google, covers a cluster of 12,500 machines and the activity on that cluster over 29 days.
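A toy sketch of the groupBy-style aggregation such a trace analysis typically starts with, counting events per machine. In Spark this would be a `map`/`reduceByKey` (or a DataFrame `groupBy().count()`) over the real trace; the event tuples and field names below are illustrative only:

```python
from collections import defaultdict

# Toy records shaped loosely like cluster-trace task events:
# (machine_id, day, event_type) -- fields are hypothetical stand-ins.
events = [
    (1, 0, "SUBMIT"),
    (1, 0, "FINISH"),
    (2, 1, "SUBMIT"),
    (1, 2, "FAIL"),
]

def events_per_machine(events):
    """Count how many events each machine produced across the trace."""
    counts = defaultdict(int)
    for machine_id, _day, _event_type in events:
        counts[machine_id] += 1
    return dict(counts)

counts = events_per_machine(events)
```

At the scale described (12,500 machines, 29 days), the point of Spark is that this same per-key aggregation runs partitioned across the cluster instead of in one process.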