
dataproc

Here are 38 public repositories matching this topic...

E-commerce GCP streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau. GCP batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner and Tableau.

  • Updated Mar 9, 2022
  • Python

A robust, scalable data pipeline on Google Cloud Platform (GCP) for monitoring and analyzing stock performance. It uses GCP's data processing and storage services to efficiently collect, process, and visualize stock data.

  • Updated Sep 7, 2023
  • Python

Collected data on the same topic or key phrase, covering similar time periods, from three sources: opinion-based social media (Twitter), research data from the New York Times, and Common Crawl data. Processed each of the three data sets individually using classical big data methods such as MapReduce in Google Dataproc Clust…

  • Updated Oct 25, 2019
  • Python
