
spark-streaming

Here are 271 public repositories matching this topic...

ApacheSpark

This repository helps you learn Databricks concepts through examples. It covers the important topics a data engineer needs in real-world work, using PySpark and Spark SQL for development. The course ends with a few case studies.

  • Updated Dec 28, 2023
  • Python

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) can generate large simulated or synthetic data sets for testing, proofs of concept (POCs), and other uses in Databricks environments, including Delta Live Tables pipelines.

  • Updated Jul 25, 2024
  • Python

Python script demonstrating Spark Streaming and Kafka through a product recommendation engine for an e-commerce-style website, based on item-based collaborative filtering. 🐍 💥

  • Updated Oct 21, 2021
  • Python

This is a data processing pipeline that implements an End-to-End Real-Time Geospatial Analytics and Visualization multi-component full-stack solution, using Apache Spark Structured Streaming, Apache Kafka, MongoDB Change Streams, Node.js, React, Uber's Deck.gl and React-Vis, and using the Massachusetts Bay Transportation Authority's (MBTA) APIs …

  • Updated Dec 11, 2022
  • Python
